Re: require suggestions for writing a perl script
by bart (Canon) on Mar 06, 2013 at 11:44 UTC
So what is the cause of the slowness of each individual script? Is it CPU, disk I/O, or waiting for external resources (for example, waiting for a file download) that takes up most of the time?
If it's CPU, hard disk access, or database access that causes the slowness, then I would recommend against running them in parallel. Two simultaneous accesses to the same hard disk will actually be slower than doing them one at a time, because the disk head has to keep seeking back and forth between the positions of the two files. Likewise, running two CPU-intensive processes in parallel on the same CPU will not be faster than running them one at a time; it will only use more RAM.
If you're waiting for file downloads to complete, you could have it do a few at a time. Also, if it's a combination of the above factors, you could get a speed gain by running them in parallel; for example, one process could be accessing the disk while another is doing a computation.
Thus: run a benchmark with a limited number of parallel processes and see whether it is actually faster or not.
Re: require suggestions for writing a perl script
by roboticus (Chancellor) on Mar 06, 2013 at 11:12 UTC
lazydev:
I've not tried it, but you could probably use Parallel::ForkManager to control the number of jobs running at once. As the Anonymous Monk mentioned earlier, you don't want to run all 1000 at once. Instead, keep a pool of XX running until you finish them all. Play around with the value of XX until you find the best time/resource tradeoff.
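A minimal sketch of that pool idea (the file names are hypothetical stand-ins, and the real work would be the call into validation.pl shown in the comment):

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical list of files to validate; in real use this might
# come from glob() or readdir().
my @files = ('file1.txt', 'file2.txt', 'file3.txt');

# The pool size is the "XX" above; 4 is just a starting point.
my $pm      = Parallel::ForkManager->new(4);
my $started = 0;

for my $file (@files) {
    $started++;            # runs in the parent for every file
    $pm->start and next;   # parent: skip ahead; child: fall through

    # Child process: the real work would go here, e.g.
    # system('perl', 'validation.pl', $file);

    $pm->finish;           # child exits
}
$pm->wait_all_children;    # parent blocks until the pool drains
```

The parent never has more than four children alive at once; `start` blocks when the pool is full.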
...roboticus
When your only tool is a hammer, all problems look like your thumb.
Re: require suggestions for writing a perl script
by topher (Scribe) on Mar 06, 2013 at 15:56 UTC
I require suggestions for writing a perl script which reduces or uses less CPU & Memory utilization during execution of a program.
If you want suggestions for reducing the CPU or memory usage of your program, you're going to have to provide a lot more details (and code). Right now, all we can do is make vague, general guesses and suggestions.
As for the suggestions we can provide . . . there are a lot of ways this can potentially be improved. First of all, review your validation script and see whether it can be made faster. There are lots of ways to profile your script, the most popular current tool being Devel::NYTProf. There are articles and posts out there that offer suggestions, too (such as http://stackoverflow.com/questions/4371714/how-do-i-profile-my-perl-programs).
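For example, profiling one run with Devel::NYTProf (assuming the module is installed, and using a hypothetical input file name) is just two commands:

```shell
# Profile a single validation run; writes ./nytprof.out
perl -d:NYTProf validation.pl somefile.txt

# Turn the profile into browsable HTML (opens from nytprof/index.html)
nytprofhtml
```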
Another thing that could help would be to move your file finding/handling into your Perl script. Running it from the shell script means that you're starting the Perl interpreter 1000 separate times. Depending on what and how many modules you're loading, that can start to add up.
I know Parallel::ForkManager has already been suggested, and you'd be wise to investigate it. It will allow you to easily process multiple files simultaneously in a more controlled manner. Finding the right number of concurrent processes for maximum performance may require some testing, though.
That's about all I can think of off the top of my head without more information on what validate.pl is doing, how big the files it's validating are, what format the files are, etc.
Re: require suggestions for writing a perl script
by Laurent_R (Canon) on Mar 06, 2013 at 19:01 UTC
I would definitely recommend that you launch one Perl process that will scan for the files and process them (forking or not forking, that's not the issue I have in mind here), rather than having a shell script launching Perl 1,000 times, meaning that you have to start the interpreter and compile your program each time.
I had such a case at my job 8 or 9 years ago. I wrote a rather simple Perl script to reprocess a very large amount of input data. I thought the data would come as one big file, or possibly a few big files, so my script was initially designed to process just one file.
My colleague who was using my script came to me and said: "It is awfully slow, it takes hours and hours, we can't use your script." I was surprised, because I knew my script was perfectly able to process the expected amount of data in a dozen minutes or so. After looking at the problem with him, I figured out that the data was in fact coming in the form of tens (or possibly hundreds) of thousands of small files, so my colleague (who did not know Perl) had decided to write a shell script that launched my Perl script again and again, once for each incoming file.
Now, the Perl script first had to load a large parameter file (about 250,000 telephone numbers) into memory before processing the input files. The result was that, each time, the script had to load this large parameter file just to process one small data file (each on the order of perhaps 1,000 to 2,000 lines), which was of course utterly inefficient. I changed my script so that it would be launched only once, load the parameter file once, and then process all the data files in the relevant directory. That worked perfectly well: we no longer had a performance problem, and the processing ran 60 or 70 times faster, if I remember the figures right.
I am telling you this story for two reasons:
1. It is not efficient to launch Perl 1,000 times if you can do it differently.
2. Knowing exactly what the program is doing is of paramount importance. The time spent loading the parameter file was almost completely irrelevant when it was done only once or a couple of times, but it became a major bottleneck when it had to be done tens or hundreds of thousands of times. So tell us what your validation script is doing.
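The restructuring described above can be sketched roughly like this (the file paths and the membership check are hypothetical stand-ins for the real parameter file and validation logic):

```perl
use strict;
use warnings;

# Load the large parameter file ONCE and return it as a hash ref.
sub load_params {
    my ($path) = @_;
    my %params;
    open my $fh, '<', $path or die "$path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $params{$line} = 1;
    }
    close $fh;
    return \%params;
}

# Validate one small data file against the already-loaded parameters;
# here "invalid" just means "line not found in the parameter set".
sub validate_file {
    my ($path, $params) = @_;
    my $bad = 0;
    open my $fh, '<', $path or die "$path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $bad++ unless exists $params->{$line};
    }
    close $fh;
    return $bad;
}

# One process, one load, many files (paths are hypothetical):
# my $params = load_params('params.txt');
# print "$_: ", validate_file($_, $params), " bad lines\n" for glob 'data/*';
```

The expensive load happens once, outside the per-file loop, which is the whole point of the story.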
Re: require suggestions for writing a perl script
by clueless newbie (Curate) on Mar 06, 2013 at 12:42 UTC
Re: require suggestions for writing a perl script
by Anonymous Monk on Mar 06, 2013 at 11:01 UTC
Well, don't start a thousand; limit it to 4 or so.
Re: require suggestions for writing a perl script
by space_monk (Chaplain) on Mar 06, 2013 at 17:01 UTC
Other people have suggested ways to do your tasks in parallel without overloading the processor(s), but perhaps you are asking the wrong question.
One might also ask why it takes 10 seconds to validate a file. You can do a lot in 10 seconds nowadays! And 1,000 files doesn't really sound like a lot.
Why doesn't validation.pl handle a list of files anyway, i.e. so that the command line accepts a list of files to be processed?
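A rough sketch of what that version of validation.pl might look like (the validate() body is a hypothetical stand-in that just counts lines):

```perl
use strict;
use warnings;

# Accept any number of files on the command line, so the interpreter
# starts only once:
#   perl validation.pl data/*.txt

# Stand-in for the real validation: returns the file's line count.
sub validate {
    my ($path) = @_;
    open my $fh, '<', $path or die "$path: $!";
    my $lines = 0;
    $lines++ while <$fh>;
    close $fh;
    return $lines;
}

for my $file (@ARGV) {
    printf "%s: %d lines\n", $file, validate($file);
}
```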
I'll also go out on a limb and suggest that you're probably better off not using bash at all; instead, modify validation.pl so that it forks itself. I'm sure someone will say otherwise if this is a bad idea! :-)
A Monk aims to give answers to those who have none, and to learn from those who know more.
Re: require suggestions for writing a perl script
by Anonymous Monk on Mar 06, 2013 at 16:08 UTC
Why not run 50 processes, then sleep for 20 seconds, run the next 50, sleep 20 seconds, and so on?
for i in 1 2 3 4 5 . . . . . . 1000
do
    nohup validation.pl $i &
    if [ $i%50=0 ]; then
        sleep 20
    fi
done
Regards, Pavel Petrov
for i in 1 2 3 4 5 . . . . . . 1000
do
    nohup validation.pl $i &
    let j=($i%50)
    if [ $j -eq 0 ]; then
        sleep 20
    fi
done
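For comparison, the same throttling can be done without any fixed sleep by letting xargs keep a pool of workers busy (assuming an xargs that supports -P, as GNU and BSD xargs do). The demo below uses echo as a stand-in for `perl validation.pl`:

```shell
# Run at most 4 jobs at a time over 10 items; xargs refills the pool
# as each job finishes, instead of sleeping a fixed 20 seconds.
# Real use: seq 1 1000 | xargs -n 1 -P 4 perl validation.pl
seq 1 10 | xargs -n 1 -P 4 echo > /tmp/xargs_demo.out
wc -l < /tmp/xargs_demo.out
```

Unlike the sleep approach, this never leaves the machine idle waiting out the timer when a batch finishes early.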
Regards, Pavel Petrov
Re: require suggestions for writing a perl script
by pvaldes (Chaplain) on Mar 06, 2013 at 19:16 UTC
Define "to validate a file", please...
Re: require suggestions for writing a perl script
by sam_bakki (Pilgrim) on Mar 07, 2013 at 12:23 UTC
Hi
You can have a look at the threads and Thread::Queue modules for parallel processing. But unless people know what exactly validation.pl is doing, it is difficult to suggest anything.
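A minimal worker-pool sketch with those two modules (requires a threads-enabled perl; the file names are hypothetical stand-ins, and the real validation would replace the counter):

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;

# Four workers pull names off the shared queue until they see the
# undef "no more work" marker; each returns its own count.
my @workers = map {
    threads->create(sub {
        my $done = 0;
        while (defined(my $file = $queue->dequeue)) {
            # the real validation of $file would go here
            $done++;
        }
        return $done;
    });
} 1 .. 4;

# Feed the queue, then push one terminator per worker.
$queue->enqueue("file$_.txt") for 1 .. 10;
$queue->enqueue(undef) for @workers;

my $total = 0;
$total += $_->join for @workers;
print "validated $total files\n";
```

Threads share the queue in memory, so unlike the fork-based approaches there is no per-job process startup cost; whether that wins in practice depends on what validation.pl actually does.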