
Writing to a file

by jalebie (Acolyte)
on Aug 20, 2001 at 23:26 UTC

jalebie has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am launching a bunch of scripts in parallel as background processes using a system command, and trying to capture their output in a log file. Since the script is launched multiple times in parallel, the output in the log file is getting overwritten. Is there a better way to do this? Please help.
for ($i = 0; $i <=10; $i++) { system("myscript.prl $id >> $tmp_file &"); }


Replies are listed 'Best First'.
(Ovid) Re: Writing to a file
by Ovid (Cardinal) on Aug 20, 2001 at 23:35 UTC

    Well, it's tough to know exactly how to do that since we don't know much about the scripts and how they're writing to the file, but how about having them write to separate files and then cat them together when you're done?

    If you time and date stamp the log entries, you could write a perl program to sort and combine them for you.
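A minimal sketch of that sort-and-combine step, assuming every log line starts with a timestamp that sorts correctly as plain text (for example "2001-08-20 23:26:01"). The sub name merge_logs and the file layout are hypothetical, and everything is held in memory, so a very large set of logs would need a streaming merge instead:

```perl
use strict;
use warnings;

# Merge the lines of several log files into one file, ordered by the
# leading timestamp. Assumes a text-sortable stamp at the start of each line.
sub merge_logs {
    my ($out_path, @in_paths) = @_;
    my @lines;
    for my $path (@in_paths) {
        open(my $in, '<', $path) or die "Cannot open $path: $!";
        while (my $line = <$in>) {
            chomp $line;
            push @lines, $line;
        }
        close($in);
    }
    my @sorted = sort @lines;    # string sort == time order for this stamp format
    open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";
    print {$out} "$_\n" for @sorted;
    close($out) or die "Cannot close $out_path: $!";
    return scalar @sorted;       # number of lines written
}
```

It could be called as merge_logs('combined.log', glob('logs/*.log')), where both paths are placeholders.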



      The problem with all these solutions is that they want me to write to multiple files and then combine them. However, myscript.prl is actually being called locally on different workstations by
      system("rshto $wks myscript.prl >> $tmp_file");
      and we have over 200,000 workstations here where the script is supposed to run. I thought about writing to different files too, but the sheer number of temp log files generated makes this impractical, and it also requires extra code to merge the files by date/time stamp and then unlink("$tmp_file"). I was wondering if there is a way in Perl to know whether the file is currently being written to and, if it is, to wait until no other process is writing to it.
        Within "myscript.prl", instead of just printing and capturing STDOUT in $tmp_file, open the file yourself and write to it. You will want to check out flock, which may help prevent the overwriting problem. Still, if you have hundreds of thousands of processes/machines all trying to write to the same file, you are creating a huge bottleneck. What about running the command on each machine as you appear to want to do, but writing the output to a local temporary accumulation file? Then either retrieve each one, or send them to a common queue (on a periodic basis) where a second process can collate them into the one behemoth file you desire. Just a thought.
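A minimal sketch of the open-and-flock idea, assuming every writer cooperates by taking the same lock and that the log lives on a filesystem where flock works (it may not be reliable across network mounts). The sub name append_locked is hypothetical:

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);

# Append one line to a shared log while holding an exclusive lock.
# flock blocks until no other cooperating process holds the lock.
sub append_locked {
    my ($path, $line) = @_;
    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";
    seek($fh, 0, SEEK_END)    or die "Cannot seek in $path: $!";
    print {$fh} $line, "\n";
    # close flushes the buffer and releases the lock
    close($fh)                or die "Cannot close $path: $!";
    return 1;
}
```

Inside the script this would replace plain printing to STDOUT, e.g. append_locked($log_path, "pid $$ finished"), where $log_path is whatever shared file you pick. The lock is advisory, so a process that writes without calling flock can still interleave its output.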

        There is flock, which would lock the file. But each process has to request the lock itself, and I'm not very conversant with how that works.

        Now, what you're saying is that you're going to run this script on separate workstations. Why not just run it, store the logfile locally, then have another script which gathers together all the data?


        Couldn't you use the users' home directories as the location for the temp files, and then run one script to comb the home directories and conglomerate them all into a master file?


Re: Writing to a file
by Dragonfly (Priest) on Aug 21, 2001 at 06:15 UTC
    Instead of using a flat log file and trying to deal with the locking/overwriting problems that entails, have you considered logging these into a free, solid database engine that supports row-level locking, such as PostgreSQL?

    This approach might also have the side benefit of letting you find a way around the possibility of running out of space in your process table. And, you could write simple modules that could then index the log files and sort them by date or machine or what-have-you afterwards.

    Probably not exactly what you're looking for, but it's a thought. =}

Re: Writing to a file
by Cine (Friar) on Aug 20, 2001 at 23:30 UTC
    Output to $tmp_file$$ instead and then join the temp files afterwards if necessary.
    $$ is the current pid, if you were unsure.

    Oops... $$ in this case is always the same :(
    system("perl -e 'myscript.prl $id >> $tmp_file\$\$' &");

    T I M T O W T D I
      This wouldn't work because $$ would be the same for every instance of the system call. Since the code is using a for loop, try using the counter variable to uniquely name the files:
      system("myscript.prl $id > $tmp_file$i");

      perl -l -e "eval pack('h*','072796e6470272f2c5f2c5166756279636b672');"

      The problem with that is that I am planning to run the for loop well over 100,000 times, which would generate over 100,000 temp files.
        Can't you change your called script to use Sys::Syslog? That should solve your problem...
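A minimal sketch of what logging through Sys::Syslog could look like inside the called script, assuming a syslog daemon is reachable on the machine. The sub name log_run, the 'myscript' ident, and the local0 facility are illustrative choices, not something the thread specifies:

```perl
use strict;
use warnings;
use Sys::Syslog qw(openlog syslog closelog);

# Send one message to syslog, tagged with the script name and pid.
# The syslog daemon serializes writes, so callers need no file locking.
sub log_run {
    my ($id, $message) = @_;
    openlog('myscript', 'pid', 'local0');
    syslog('info', 'run %s: %s', $id, $message);
    closelog();
    return 1;
}
```

A call such as log_run($id, 'finished') would then replace the redirect into $tmp_file. Collecting messages from many workstations in one place would additionally need syslog forwarding to be configured, which is outside this sketch.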

        You are going to get a problem with open filehandles and other resources if you are planning on starting 100k processes at once...
        Perhaps you should just open 10-100 at a time and wait for them to finish and then continue...
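A minimal sketch of that throttling idea, using fork and wait to keep at most a fixed number of children alive. The sub name run_throttled is hypothetical, and commands are assumed to be shell command strings run via /bin/sh:

```perl
use strict;
use warnings;
use POSIX ();

# Run every command in @$commands, with at most $max running at once.
# Returns the number of commands that exited with a non-zero status.
sub run_throttled {
    my ($max, $commands) = @_;
    my ($running, $failed) = (0, 0);
    for my $cmd (@$commands) {
        if ($running >= $max) {      # pool is full: wait for one child to finish
            wait();
            $failed++ if $? != 0;
            $running--;
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {             # child: replace ourselves with the command
            exec('/bin/sh', '-c', $cmd) or warn "exec failed: $!";
            POSIX::_exit(127);       # only reached if exec itself failed
        }
        $running++;
    }
    while ($running > 0) {           # drain the remaining children
        wait();
        $failed++ if $? != 0;
        $running--;
    }
    return $failed;
}
```

For the loop in the original question, the command list could be built first and then passed in, e.g. run_throttled(50, \@cmds), with 50 being an arbitrary limit to tune.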

