
Re: Multithreaded Script CPU Usage

by Illuminatus (Curate)
on Aug 25, 2008 at 15:36 UTC

in reply to Multithreaded Script CPU Usage

I think I need a little more info about your program structure. It looks like new threads invoke 'do_handler', but it is not clear how this, in turn, invokes and manages FileList. I would not so much worry about perl's memory usage -- 125M is not all that much. Much more troubling is the 75% CPU. If FileList is supposed to be doing most of the work, and the perl part is simply 'glue' taking output from FileList and inserting it into MySQL, then it should not be doing much. If it is not too large, could you post the do_handler, and what you are doing to put stuff into MySQL?

Re^2: Multithreaded Script CPU Usage
by Zenshai (Sexton) on Aug 25, 2008 at 17:11 UTC
    Illuminatus, yes, that is what I am mostly worried about. I think the memory usage here is due to one of the threads doing a regex replace, adding backslashes to prepare a file to be imported into MySQL.

    Edit: Just to clarify, I know that the CPU use should also be high when doing a regex on a large file with many matches, however, the CPU usage of the perl.exe process is always high, even when no regex operations are being performed.
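If the backslash step reads the whole file into memory before running the regex, that alone could explain the memory figure. The body of fix_file_4mysql() is not shown in this thread, so the following is only a minimal sketch of a line-by-line alternative. The sub name escape_file_streaming and the ".escaped" suffix are hypothetical, and it assumes the only change needed is doubling each backslash.

```perl
use strict;
use warnings;

# Hypothetical stand-in for fix_file_4mysql(): doubles every backslash,
# reading and writing one line at a time so memory use does not grow
# with the size of the file.
sub escape_file_streaming {
    my ($in_path) = @_;
    my $out_path = "$in_path.escaped";    # assumed naming, not from the post

    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";

    while ( my $line = <$in> ) {
        $line =~ s/\\/\\\\/g;             # one backslash becomes two
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return $out_path;
}
```

This still costs CPU for every match, but it keeps only one line in memory at a time.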

    My do_handler function just routes the work to a function in the separate Worker module (I couldn't figure out the syntax to call it directly)

    Here's the relevant code from the Worker module:
    package File::FileListerWorker;

    sub main_run {
        # arguments: 1) job id from Pooler 2) path to scan 3) Path to FileList.exe 4) table name
        # returns: 1) return 1;

        #### - START: MAIN EXECUTABLE CODEBLOCK - ####
        my $jobid = shift;
        # read scanpaths from the ARGs
        my $scanpaths_line = shift;    # get path to scan
        my $fl_path        = shift;    # define path to FileList.exe
        my $arg_tbl_name   = shift;    # table to load data into

        my $dbh = &mysql_conn();       # connect to mysql
        my $mysql_tablename = &mysql_create_table( $arg_tbl_name, $dbh );    # create table (IF NOT EXISTS)

        my @ta = localtime(time);      # get time
        my $time_string =
          $ta[4] . $ta[3] . ( $ta[5] + 1900 ) . "_" . $ta[2] . $ta[1] . $ta[0];    # time string
        my $temp_filename =            # make unique filename
          "\\\\directoryToPlaceTempFileIn\\" . $jobid . "_" . $time_string . ".csv";

        &do_scan( $fl_path, $scanpaths_line, $temp_filename );    # call to sub that does the scan
        my $mysql_temp_filename = &fix_file_4mysql($temp_filename);    # add backslashes
        &mysql_load_data( $mysql_temp_filename, $mysql_tablename, $dbh );    # mysql import file
        &cleanup($temp_filename);      # delete temp file to preserve space
        &mysql_disc($dbh);             # disconnect mysql
        #### - END: MAIN EXECUTABLE CODEBLOCK - ####

        return 1;
    }

    sub do_scan {
        # arguments: 1) location of FileList.exe 2) path to scan 3) output file path
        # returns: none
        my $fl_path = shift;
        my $pth     = shift;
        chomp($pth);
        my $temp_filename = shift;
        system(qq{$fl_path $pth >$temp_filename});
    }

      Just an idea... what if everything is perfectly right that way? I am not sure how costly fix_file_4mysql() and mysql_load_data() are, but if do_scan() runs much faster than loading the DB, perl.exe is still digesting the output of previous runs of FileList.exe. So you might observe only four FileList.exe jobs currently running while the output of several dozen previous runs is still being processed.
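One way to check this would be to time each stage of main_run separately. Here is a minimal sketch using Time::HiRes. The helper name time_stage is hypothetical, and the commented usage lines assume the subs from the Worker module posted above.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run a code reference, report its wall-clock duration on STDERR,
# and pass its return value(s) back to the caller.
sub time_stage {
    my ( $label, $code ) = @_;
    my $start   = time;
    my @result  = $code->();
    my $elapsed = time - $start;
    printf STDERR "%s took %.3f s\n", $label, $elapsed;
    return wantarray ? @result : $result[0];
}

# Possible use inside main_run (assumed, not tested against the real module):
#   time_stage( 'scan', sub { do_scan( $fl_path, $scanpaths_line, $temp_filename ) } );
#   my $mysql_temp_filename =
#     time_stage( 'fix', sub { fix_file_4mysql($temp_filename) } );
#   time_stage( 'load', sub { mysql_load_data( $mysql_temp_filename, $mysql_tablename, $dbh ) } );
```

If the 'fix' and 'load' numbers dwarf the 'scan' number, the constant CPU load in perl.exe would be the backlog described above rather than a bug.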

      How big is max => $total_threads? The number of $temp_filename files currently existing might provide a rough estimate of how many workers are currently busy.
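Counting those files could look roughly like this. It is a sketch only: the sub name count_temp_files is hypothetical, and it assumes the temp files are the .csv files in the directory that main_run writes to.

```perl
use strict;
use warnings;

# Count the plain files ending in .csv that currently exist in $dir.
sub count_temp_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    # grep in scalar context returns the number of matching entries
    my $count = grep { /\.csv\z/i && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return $count;
}
```

Any other .csv left in that directory (for example an output file from the backslash step that is never cleaned up) would inflate the number, so treat it as an upper bound.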
