require suggestions for writing a perl script

by lazydev (Initiate)
on Mar 06, 2013 at 11:00 UTC ( [id://1021991] )

lazydev has asked for the wisdom of the Perl Monks concerning the following question:

I require suggestions for writing a Perl script which reduces CPU and memory utilization during execution of a program.

I have a Perl script, validation.pl, which executes fine, but it has to validate around 1000 files every 5 minutes, all day. So I wrote a simple shell script with a for loop:

for i in 1 2 3 4 5 ... 1000
do
    validation.pl $i
done

Since each run of validation.pl takes about 10 seconds, validating 1000 files takes far more than 5 minutes when they are executed sequentially.

So I changed the loop to execute the validation script in the background so the runs happen in parallel (1000 validation.pl processes execute at the same time):

for i in 1 2 3 4 5 ... 1000
do
    nohup validation.pl $i &
done

Now it executes all the processes in parallel, but it causes high CPU and memory utilization. In short, the server crashes under the load: one process takes around 0.1% of CPU and memory, so 1000 processes use close to 100%.

Is there a better way, perhaps rewriting the shell script in Perl, to execute validation.pl in parallel for all the inputs with less CPU and memory utilization during execution?

Please let me know what would be the best way to start working on it.

Replies are listed 'Best First'.
Re: require suggestions for writing a perl script
by bart (Canon) on Mar 06, 2013 at 11:44 UTC
    So what is the cause of the slowness of each individual run? Is it CPU, disk I/O, or waiting for external resources (for example, waiting for a file download) that takes the most time?

    If it's CPU, hard disk access, or database access that causes the slowness, then I would recommend against doing them in parallel. Two simultaneous accesses to the same hard disk will actually be slower than doing them one at a time, because the disk head has to constantly seek between the positions of the two files. Likewise, running two CPU-intensive processes in parallel on the same CPU will not be faster than running them one at a time; it will only use more RAM.

    If you're waiting for a file download to complete, you could have it do a few at a time. Also, if it's a combination of the above factors, you could get a speed gain doing them in parallel, for example one process could be accessing the disk while another is doing a computation.

    Thus: do a benchmark test, limit the number of parallel processes, and see if it is actually faster, or not.

Re: require suggestions for writing a perl script
by roboticus (Chancellor) on Mar 06, 2013 at 11:12 UTC

    lazydev:

    I've not tried it, but you could probably use Parallel::ForkManager to control the number of jobs running at once. As the anonymous monk mentioned earlier, you don't want to run all 1000 at once. Instead, keep a pool of XX running until you finish them all. Play around with the value of XX until you find the best time/resource tradeoff. (A sketch follows at the end of this thread.)

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Good suggestion. I've used Parallel::ForkManager, and it should indeed work very well for this case.

      Christopher Cashell
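
      A minimal sketch of that Parallel::ForkManager approach (the numeric arguments to validation.pl follow the OP's loop; the cap of 4 concurrent jobs is an assumption to tune as roboticus suggests, and validation.pl is assumed to be executable in the current directory):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Parallel::ForkManager;

      my $MAX_PROCS = 4;    # the "XX" above; tune to your machine
      my $pm = Parallel::ForkManager->new($MAX_PROCS);

      for my $i (1 .. 1000) {
          $pm->start and next;              # parent: fork a child, continue loop
          system('./validation.pl', $i);    # child: run one validation
          $pm->finish;                      # child exits
      }
      $pm->wait_all_children;
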
Re: require suggestions for writing a perl script
by topher (Scribe) on Mar 06, 2013 at 15:56 UTC
    I require suggestions for writing a Perl script which reduces CPU and memory utilization during execution of a program.

    If you want suggestions for reducing the CPU or memory usage of your program, you're going to have to provide a lot more details (and code). Right now, all we can do is make vague and general guesses and suggestions about the process.

    As for those suggestions we can provide . . . there are a lot of ways this can potentially be improved. First of all, you should review your validation script and see if there are ways it can be made faster. There are lots of ways to profile your script, the most popular current tool being Devel::NYTProf. There are articles and posts out there that offer suggestions, too (such as http://stackoverflow.com/questions/4371714/how-do-i-profile-my-perl-programs).
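
    For instance (assuming validation.pl takes a file argument), a single run can be profiled like this; nytprofhtml then turns the nytprof.out file it writes into an HTML report:

        perl -d:NYTProf validation.pl some_file
        nytprofhtml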

    Another thing that could help would be to move your file finding/handling into your Perl script. Running it from the shell script means that you're starting the Perl interpreter 1000 separate times. Depending on what and how many modules you're loading, that can start to add up.
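
    As a minimal sketch of that idea (the incoming/ directory and the validate() routine are hypothetical stand-ins for whatever validation.pl actually does), a single interpreter can walk all the files itself:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # One interpreter start-up, however many files there are.
    for my $file (glob 'incoming/*') {    # hypothetical location of the files
        validate($file);
    }

    sub validate {
        my ($file) = @_;
        # ... the existing per-file validation logic goes here ...
    }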

    I know Parallel::ForkManager has already been suggested, and you'd be wise to investigate it. It will allow you to easily process multiple files simultaneously in a more controlled manner. Finding the right number of concurrent processes for maximum performance may require some testing, though.

    That's about all I can think of off the top of my head without more information on what validation.pl is doing, how big the files it's validating are, what format the files are in, etc.

    Christopher Cashell
Re: require suggestions for writing a perl script
by Laurent_R (Canon) on Mar 06, 2013 at 19:01 UTC

    I would definitely recommend that you launch one Perl process that will scan for the files and process them (forking or not forking, that's not the issue I have in mind here), rather than having a shell script launching Perl 1,000 times, meaning that you have to start the interpreter and compile your program each time.

    I had such a case at my job 8 or 9 years ago. I wrote a rather simple Perl script to reprocess a very large amount of input data. I thought the data would be coming as one big file or possibly a few big files, so my script was initially designed to process just one file. My colleague who was using my script came to me and told me: "it is awfully slow, it takes hours and hours, we can't use your script." I was surprised, because I knew my script was perfectly able to process the expected amount of data in a dozen minutes or so.

    After having looked at the problem with him, I figured out that the data was in fact coming in the form of tens (or possibly hundreds) of thousands of small files, so my colleague (who did not know Perl) had decided to write a shell script to launch my Perl script again and again for each incoming file. Now, the Perl script first had to load into memory a large parameter file (about 250,000 telephone numbers) before processing the input files. The result was that, each time, the script had to load this large parameter file just to process one small data file (each on the order of perhaps 1,000 to 2,000 lines), which was of course utterly inefficient.

    I changed my script so that it would be launched only once, load the parameter file once, and then process all the data files in the relevant directory. That worked perfectly well: we no longer had a performance problem, and the processing of the data ran 60 or 70 times faster, if I remember the figures right.

    I am telling you this story for 2 reasons:

    1. It is not efficient to launch Perl 1,000 times if you can do it differently.

    2. Knowing exactly what the program is doing is of paramount importance. The time spent loading the parameter file was almost totally irrelevant when it was done only once or a couple of times, but became a major bottleneck when it had to be done tens or hundreds of thousands of times. So tell us what your validation script is doing.
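
    A minimal sketch of that change, with hypothetical file names: the parameter file is loaded into a hash exactly once, and only the per-file loop repeats:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Load the large parameter file once (hypothetical name).
    my %is_known;
    open my $params, '<', 'parameters.txt' or die "parameters.txt: $!";
    while (my $line = <$params>) {
        chomp $line;
        $is_known{$line} = 1;
    }
    close $params;

    # Then process every data file with the table already in memory.
    for my $file (glob 'incoming/*.dat') {    # hypothetical location
        process_file($file, \%is_known);
    }

    sub process_file {
        my ($file, $known) = @_;
        # ... per-file processing against the in-memory table ...
    }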

Re: require suggestions for writing a perl script
by clueless newbie (Curate) on Mar 06, 2013 at 12:42 UTC

    Perhaps you could ignore the files that haven't changed?

Re: require suggestions for writing a perl script
by Anonymous Monk on Mar 06, 2013 at 11:01 UTC
    well, don't start a thousand, limit it to 4 or something
Re: require suggestions for writing a perl script
by space_monk (Chaplain) on Mar 06, 2013 at 17:01 UTC

    Other people have suggested ways to do your tasks in parallel without overloading the processor(s), but perhaps you are asking the wrong question.

    One might also ask why it takes 10 seconds to validate a file. You can do a lot in 10 seconds nowadays! 1000 files really doesn't sound like a lot.

    Why doesn't validation.pl handle a list of files anyway, i.e. so that the command line accepts a list of files to be processed?

    I'll also go out on a limb and suggest that you're probably better off not using bash at all, but modifying validation.pl so it forks itself (a rough sketch follows). I'm sure someone will say otherwise if this is a bad idea! :-)
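
    A rough sketch of that idea, assuming validation.pl takes its file list on the command line and its existing per-file logic lives in a validate() routine (both assumptions):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $MAX_KIDS = 4;    # cap on simultaneous children; tune it
    my $running  = 0;

    for my $file (@ARGV) {
        if ($running >= $MAX_KIDS) {
            wait();      # block until one child exits before forking another
            $running--;
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {           # child: validate one file, then exit
            validate($file);
            exit 0;
        }
        $running++;
    }
    1 while wait() != -1;          # reap the remaining children

    sub validate {
        my ($file) = @_;
        # ... the existing validation logic would go here ...
    }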

    A Monk aims to give answers to those who have none, and to learn from those who know more.
Re: require suggestions for writing a perl script
by Anonymous Monk on Mar 06, 2013 at 16:08 UTC

    Why not run 50 processes, then sleep for 20 seconds, then run 50 more, sleep 20 seconds, and so on:

    for i in 1 2 3 4 5 ... 1000
    do
        nohup validation.pl $i &
        if [ $i%50=0 ]; then
            sleep 20
        fi
    done

    Regards, Pavel Petrov

      My mistake in the comparison (bash is tricky). Correction:

      for i in 1 2 3 4 5 ... 1000
      do
          nohup validation.pl $i &
          let j=($i%50)
          if [ $j -eq 0 ]; then
              sleep 20
          fi
      done

      Regards, Pavel Petrov

Re: require suggestions for writing a perl script
by pvaldes (Chaplain) on Mar 06, 2013 at 19:16 UTC
    Define "to validate a file" please...
Re: require suggestions for writing a perl script
by sam_bakki (Pilgrim) on Mar 07, 2013 at 12:23 UTC

    Hi

    You can have a look at the threads and Thread::Queue modules for parallel processing; a minimal sketch follows. But unless people know what exactly validation.pl is doing, it is difficult to suggest more.
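
    A minimal sketch of that approach (the worker count of 4 is an assumption to tune, and it needs a Thread::Queue recent enough to have the end() method):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $q = Thread::Queue->new(1 .. 1000);    # queue up the file numbers
    $q->end();                                # nothing more will be added

    my @workers = map {
        threads->create(sub {
            while (defined(my $i = $q->dequeue())) {
                system('./validation.pl', $i);    # one job per dequeued item
            }
        });
    } 1 .. 4;                                 # 4 workers; tune to taste

    $_->join() for @workers;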

    Thanks & Regards,
    Bakkiaraj M
    My Perl Gtk2 technology demo project - http://code.google.com/p/saaral-soft-search-spider/ , contributions are welcome.
