
running Perl scripts in the background

by Anonymous Monk
on Jul 12, 2007 at 06:30 UTC ( #626143=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Good morning, fellow monks.
I have a Perl script that searches large files for entries that interest me. Because the files are so large, the script runs for about 5-6 days. I start it in the background by appending '&' to the command line, and I write my results to an output file.
I was wondering if there is a way of keeping track of the program's progress, so that if my PC shuts down for some reason (e.g. a power failure), the program can continue where it left off. Thank you.

Replies are listed 'Best First'.
Re: running Perl scripts in the background
by BrowserUk (Pope) on Jul 12, 2007 at 07:51 UTC

    If you put the list of files in a file, read that list backwards, and truncate it after you finish processing each file you read from it, then when you restart the script it will pick up with the same file you were processing when you crashed.

    If you truncate the file just after reading a filename and remember the position of the new EOF, then you can write your current position in the data file at that position in the list file after each read of the data file. A small tweak to check for a position when you read the list file, and you can seek to that position when you re-open the data file after a restart.

    Something like (untested):

    use File::ReadBackwards;

    my $listname = shift;

    ## Not sure whether F::RB allows you to specify an open mode?
    tie *LIST, 'File::ReadBackwards', $listname or die $!;

    while( <LIST> ) {
        my( $datafile, $position ) = split ' ';
        my $writePos = tell LIST;
        truncate LIST, $writePos;

        open DATA, '<', $datafile or die $!;
        seek DATA, $position, 0 if defined $position;

        while( <DATA> ) {
            seek LIST, $writePos, 0;
            print LIST ' ', tell( DATA ), "\n";
            ## Do stuff with the line from DATA in $_
        }
        close DATA;

        seek LIST, $writePos, 0;
        truncate LIST, $writePos;
    }
    close LIST; ## Should be empty by the time we get here.
    unlink $listname;

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: running Perl scripts in the background
by sgt (Deacon) on Jul 12, 2007 at 07:16 UTC

    Well, if you process things in some order, you could organize the work in traceable (log, dbm, whatever) stages or steps, so that you know exactly which step you are at. Each of these checkpoints should have a switch, so that after a power failure which has left you at the end of stage 11, for example, you can call '--checkpoint=12'.

    Of course, if the power failure has left you with, say, 1/3 of stage 12 completed, then you would need some kind of rollback capability, or simply a smaller granularity for your steps. I would start with a first approximation, maybe even cutting the script into a few pieces; you see the picture. It does not need to be perfect to be useful: if you cut the work down to pieces that each last half a day, then you've got something! ;)
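    A minimal sketch of this staged approach, under assumptions of my own: the stage names, the `run_stage` helper, and the `progress.log` file are all illustrative, not from the post. Only the `--checkpoint` switch is what sgt describes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

## Hypothetical stage list; in practice each stage would be a chunk
## of the real search job small enough to redo after a crash.
my @stages = map { "stage_$_" } 1 .. 5;

## --checkpoint=N resumes at stage N (1-based), skipping finished work.
my $checkpoint = 1;
GetOptions( 'checkpoint=i' => \$checkpoint ) or die "bad options\n";

for my $n ( $checkpoint .. scalar @stages ) {
    run_stage( $stages[ $n - 1 ] );

    ## Record the last completed stage so a restart knows where it is.
    open my $log, '>', 'progress.log' or die $!;
    print $log "$n\n";
    close $log;
}

sub run_stage {
    my ($stage) = @_;
    print "running $stage\n";    # placeholder for the real work
}
```

    After a crash, `progress.log` holds the last finished stage number, so the operator restarts with `--checkpoint=<that number + 1>`.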

    cheers --stephan
Re: running Perl scripts in the background
by hawtin (Prior) on Jul 12, 2007 at 08:51 UTC

    I have run a number of these 'monster' processes; for example, last year I converted my complete MP3 collection to a different bitrate by resampling my FLAC files, and the process took 3 weeks. My assumption is always that there will be a power cut at the worst moment, so I normally control the process with a loop like this (from memory):

    use IO::File;

    my $results_dir = "results";
    mkdir $results_dir, 0755 if !-d $results_dir;

    foreach my $next_step (list_steps()) {
        ## Skip any step whose marker file already exists.
        next if -r "$results_dir/$next_step.done";

        my $res = do_step($next_step);

        ## Write the marker file only after the step succeeds.
        my $fh = IO::File->new(">$results_dir/$next_step.done");
        print $fh $res;
        $fh->close();
    }
    print "All steps done\n";

    Of course, I normally add the date and some other interesting details to the results files, and a lot more checking.

Re: running Perl scripts in the background
by swampyankee (Parson) on Jul 12, 2007 at 12:16 UTC

    This is a real problem on machines executing long-running programs. You will have to keep some state information, such as which file the program was processing and the value of the file pointer.

    If this comes up often, it may be worthwhile to (a) reconsider your search algorithm, or (b) add hooks to the program or programs generating the files so they write those interesting entries into an "interesting entry" file as they go.
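    A hedged sketch of the state-keeping this implies, with assumptions of my own: the `state.txt` name, its "filename offset" record format, and the sample data file are illustrative, not from the post. The idea is just filename plus `tell` offset, restored with `seek`.

```perl
use strict;
use warnings;

my $statefile = 'state.txt';
my $datafile  = 'big.log';

## Create a small sample data file so the sketch is self-contained.
open my $mk, '>', $datafile or die $!;
print $mk "line $_\n" for 1 .. 10;
close $mk;

## On restart, read the saved byte offset (if any) so we can seek
## past the lines already processed.
my $offset = 0;
if ( open my $st, '<', $statefile ) {
    chomp( my $rec = <$st> // '' );
    ( my $file, $offset ) = split ' ', $rec;
    $offset = 0 unless defined $offset && $file eq $datafile;
    close $st;
}

open my $in, '<', $datafile or die $!;
seek $in, $offset, 0;

while ( my $line = <$in> ) {
    ## ... process $line, write interesting entries elsewhere ...

    ## Save state after each line; a crash loses at most one line.
    open my $st, '>', $statefile or die $!;
    print $st "$datafile ", tell($in), "\n";
    close $st;
}
close $in;
```

    Rewriting the state file after every line is the simplest scheme; for a 5-6 day job, saving every N lines would cut the overhead at the cost of redoing up to N lines after a crash.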


    Any New York City or Connecticut area jobs? I'm currently unemployed.

    There are some enterprises in which a careful disorderliness is the true method.

    —Herman Melville
Re: running Perl scripts in the background
by Anonymous Monk on Jul 12, 2007 at 06:46 UTC
    If your OS has a "hibernate" feature, runs off a battery backup (like a laptop), and goes into hibernation when the battery is about to fail, you're done.
      Unfortunately, there is no hibernate function... I run my script on my school's server, which runs Linux...
Re: running Perl scripts in the background
by misc (Pilgrim) on Jul 12, 2007 at 11:36 UTC
    I could imagine Storable is your friend.
    If you keep your script's state in one object, it should be quite easy to save that object to disk and restore it later.

    Since your PC could die while writing the file, it would also be wise to use some rotation scheme for storing the object.
    Perhaps it would be enough to alternate between two different files, syncing the file buffers after each write.
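    A minimal sketch of that two-file rotation with Storable, under assumptions of my own: the `state.0`/`state.1` filenames, the hash layout, and the three-iteration stand-in loop are illustrative. `nstore` writes each snapshot, and `retrieve` inside `eval` skips a slot that was corrupted mid-write.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

my @slots = ( 'state.0', 'state.1' );

## Restore: try the newest readable slot first; fall back to the
## other, or to a fresh state if neither exists or both are corrupt.
my $state;
for my $file ( sort { -M $a <=> -M $b } grep { -e $_ } @slots ) {
    $state = eval { retrieve($file) };
    last if $state;
}
$state ||= { offset => 0, count => 0 };

my $slot = 0;
for my $i ( 1 .. 3 ) {    # stand-in for the real processing loop
    $state->{count}++;

    ## Alternate between the two files so a crash during nstore()
    ## always leaves the other, older copy intact.
    nstore( $state, $slots[$slot] );
    $slot = 1 - $slot;
}

print "processed $state->{count} items\n";
```

    Restarting the script after a kill mid-loop picks up the count from whichever snapshot survived, which is the point of keeping two copies.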

Node Type: perlquestion [id://626143]
Approved by Corion