
Re: Multithreaded Script CPU Usage

by Zenshai (Sexton)
on Aug 26, 2008 at 13:52 UTC

in reply to Multithreaded Script CPU Usage

Thanks, everyone.

I think what I am going to do is split this script in two: the threaded script will only launch the scans, and another script on a second box will monitor the directory where the temp files are placed, process them, import them, and delete them.
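Roughly, I'm picturing the watcher on the second box working like this (a sketch only; the directory, file extension, and import routine are all placeholders for whatever I end up using):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Poll the drop directory, hand each finished temp file to the
# processing code, then delete it. Returns how many files were
# handled on this pass.
sub process_ready_files {
    my ($dir, $process) = @_;
    my $handled = 0;
    for my $file ( glob File::Spec->catfile($dir, '*.txt') ) {
        $process->($file);    # fix_file_4mysql + LOAD DATA would go here
        unlink $file or warn "unlink $file: $!";
        $handled++;
    }
    return $handled;
}

# The import box would then just loop, e.g.:
# while (1) { process_ready_files('C:/scans/tmp', \&import_one); sleep 30; }
```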

Hopefully the memory leak BrowserUk described won't get bad enough to kill my Perl process. Otherwise, expect another long-winded post from me about forking or some such. :)

Re^2: Multithreaded Script CPU Usage
by Illuminatus (Curate) on Aug 26, 2008 at 15:48 UTC
    Just a few comments:
    1) You should not need 2 scripts. If your initial script did not create threads, but instead began scans using 'system' and putting the process in background, then the same script can monitor the output files and start new FileList executions as previous ones end.
    2) There is still the issue of what the 2 mysql functions really do. You might post the 'fix-the-file' function and the 'load data' function code if they are not too large; either of these could also be taking lots of CPU. Does your mysql_load_data create bulk inserts or individual inserts? Bulk inserts are much faster for both the client and the server.
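To illustrate the bulk-insert point: one multi-row INSERT is far cheaper than one statement per row. A hypothetical builder (the table and column names are made up, and `$dbh->do($sql)` would execute the result):

```perl
use strict;
use warnings;

# Assemble a single multi-row INSERT from an arrayref of rows.
# Values are single-quoted with embedded quotes doubled; real code
# would more likely use placeholders or $dbh->quote.
sub build_bulk_insert {
    my ($table, $cols, $rows) = @_;
    my $col_list = join ',', @$cols;
    my @tuples = map {
        '(' . join(',', map { (my $v = $_) =~ s/'/''/g; "'$v'" } @$_) . ')'
    } @$rows;
    return "INSERT INTO $table ($col_list) VALUES " . join(',', @tuples);
}

my $sql = build_bulk_insert(
    'files',
    [qw(file_name file_size)],
    [ ['a.txt', 10], ['b.txt', 20] ],
);
# $sql is now one statement inserting both rows at once.
```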
      From what I can see, the functions you listed should be fine. My point about the single script went back to the initial problem. Your initial problem was that the main perl script was taking 75% CPU. As was pointed out, this was likely caused by the thread module itself. If you modified your script as I suggested, the perl script should end up using very little CPU, and could therefore run fine together on a single system. Of course, your mileage may vary...
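Something along these lines is what I had in mind (a sketch only; the scan commands are placeholders for your FileList invocations, and I'm using fork/exec, which on Win32 would become the `system(1, @cmd)` form that returns a process ID without waiting):

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';

# Launch each scan as a background OS process (no threads), cap the
# number running at once, and reap finished scans in the same loop
# that could then import their output files.
sub run_scans {
    my ($max_running, @jobs) = @_;    # @jobs: arrayrefs of command + args
    my %running;                      # pid => command line
    while (@jobs or %running) {
        # Top up to the concurrency cap
        while (@jobs and keys(%running) < $max_running) {
            my @cmd = @{ shift @jobs };
            defined(my $pid = fork()) or die "fork: $!";
            if ($pid == 0) { exec @cmd or die "exec @cmd: $!" }
            $running{$pid} = "@cmd";
        }
        # Reap any scans that finished; their output files are now
        # safe to fix up and LOAD DATA into mysql
        while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
            delete $running{$pid};
        }
        sleep 1 if %running;
    }
}
```

The main perl process spends almost all its time in `sleep`, so it should use next to no CPU between scans.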
        You were absolutely right. I apologize for not taking your advice sooner.
      The reason I am going to try splitting the script across 2 machines is that I was thinking the same thing as Perlbotics above: there is just too much load on the resources if I try to do everything at the same time. By splitting the global task into two parts I can either run both concurrently on two machines or in sequence on one, depending on my time constraints.

      But just in case, here is the rest of the code you were asking for. Am I doing something horribly wrong there that is causing needless load on the CPU?

      The load data function just executes this SQL:

      my $sql = qq{ LOAD DATA LOCAL INFILE \"$mysql_tmp_filename\"
          INTO TABLE `$tblname`
          FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
          ESCAPED BY '\\\\'
          LINES TERMINATED BY '\\r\\n'
          IGNORE 3 LINES
          (file_name, \@file_p, file_size, \@file_la, \@file_lc, \@file_c, file_extension)
          SET file_last_access = STR_TO_DATE(\@file_la, '%c/%e/%Y %l:%i %p'),
              file_path        = \@file_p,
              file_share_name  = substring_index(substring_index(\@file_p,'\\\\',3),'\\\\',-1),
              file_last_change = STR_TO_DATE(\@file_lc, '%c/%e/%Y %l:%i %p'),
              file_creation    = STR_TO_DATE(\@file_c, '%c/%e/%Y %l:%i %p')
      };
      my $sth = $dbh->prepare($sql);
      $sth->execute();
      and here is fix_file_4mysql:
      sub fix_file_4mysql {
          # arguments: 1) path to file
          # returns:   1) the filename fixed for use in a mysql query

          # Fix the contents of the temp file for use in mysql by
          # doubling every backslash
          my $tmp_filename = shift;
          open( IN, "+<$tmp_filename" ) or die "Can't open $tmp_filename: $!";
          my @file = <IN>;
          seek IN, 0, 0;
          foreach my $file (@file) {
              $file =~ s/\\/\\\\/g;
              print IN $file;
          }
          close IN;

          # Fix temp_filename itself for use in mysql
          my $mysql_tmp_filename = $tmp_filename;
          $mysql_tmp_filename =~ s/\\/\\\\/g;
          return $mysql_tmp_filename;
      }
Re^2: Multithreaded Script CPU Usage
by Zenshai (Sexton) on Aug 27, 2008 at 01:42 UTC
    Well, there goes that theory.

    I remade the script as I specified above, and while it's working exactly as intended, the original problem still remains: Perl CPU usage is still too high when it should be idle.

    I've found that I can give the FileList processes some more CPU room by setting the Perl process's priority to Below Normal or Low, but that's not really solving the problem. It's still needlessly using resources that could be better spent doing other things, or even being idle.

    I'm going to think about some other way to do this, without threads.
