PerlMonks

Fast Recall

by sans-clue (Beadle)
on Sep 03, 2010 at 01:31 UTC ( [id://858655] )

sans-clue has asked for the wisdom of the Perl Monks concerning the following question:

I have a Perl script, which gracious monks helped me create, that 'watches' a flat file without locking it and performs an action on each <CRLF>-terminated line written to the file. It has served me well.

Now a new file is being written at about 6 lines per second, which the script can handle, but the new requirement is to remember past 'lines'. The file is populated with the status of Big Iron jobs, mainly successfully completed jobs.

Successful jobs can be ignored IF the job hasn't previously failed. If a job fails, the job name must be plucked out and stored somehow. Each successful job needs to be checked against the failed-job store before it can be discarded. Successful jobs that are found in the 'store' are acted upon to clear failed statuses previously acted upon, and thus need to be purged from the 'store'.

Since I have limited access to this box (I had to beg to put the Perl code on it), I can't use a db for this, so I was wondering what my most efficient option would be to 'store' failed job names. Failed jobs would need to be purged after 8 hours (if success never occurs), since job names are reused (daily). Thanks

Replies are listed 'Best First'.
Re: Fast Recall
by CountZero (Bishop) on Sep 03, 2010 at 06:02 UTC
    I can't use a db to do this
    Yes of course you can!

    Perl has this wonderful built-in database system, called ... hashes.

    And you can persist the datastore by binding your hash to a file with dbmopen.

    use strict;
    use warnings;
    use 5.012;

    # Add some data to the datastore
    my %failed_jobs;
    dbmopen %failed_jobs, './failed_jobs', 0666;
    $failed_jobs{'job_1234'} = 1;
    dbmclose %failed_jobs;

    # Now see if the data persisted
    my %check_failures;
    dbmopen %check_failures, './failed_jobs', 0666;
    for my $job (keys %check_failures) {
        say "$job = $check_failures{$job}";
    }
    dbmclose %check_failures;
    The beauty of this is that it is totally transparent: once you bind the hash to the database with dbmopen, you keep using the hash as usual, and Perl takes care of storing and fetching the data to and from disk.

    For a more sophisticated version of this mechanism, check out the tie function.
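
    For instance, a minimal sketch of the tied-hash version, using the core SDBM_File module (chosen here only because it ships with Perl; any DBM implementation available on the box would do):

        use strict;
        use warnings;
        use Fcntl;        # for the O_RDWR and O_CREAT flags
        use SDBM_File;    # ships with core Perl; no CPAN install needed

        # Bind the hash to an on-disk DBM file; writes persist across runs.
        tie my %failed_jobs, 'SDBM_File', './failed_jobs', O_RDWR | O_CREAT, 0666
            or die "Cannot tie: $!";
        $failed_jobs{'job_1234'} = time;    # remember when the job failed
        untie %failed_jobs;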

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Fast Recall
by BrowserUk (Patriarch) on Sep 03, 2010 at 03:26 UTC

    Perhaps the very simplest mechanism would be to write an empty file, named for the failing ID, into a local directory each time you read a failure.

    Each time you read a successful ID, you can use -e to see if it has had a previous failure. If it has, you can now unlink that file.

    Later, you can then use the creation timestamps on the files to delete any that are more than 8 hours old. This could even be done by a separate process that scans the directory on a regular basis via cron.

    It's simple, persistent, requires nothing beyond Perl itself, and should easily be fast enough to cater for 6 lookups per second.
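
    A minimal sketch of that scheme, assuming a local directory ./failed_jobs for the marker files, and using the modification time via -M as a stand-in for creation time (which not every filesystem records):

        use strict;
        use warnings;

        my $store = './failed_jobs';    # hypothetical marker directory

        # On a failure line: create an empty file named after the job.
        sub mark_failed {
            my ($job) = @_;
            open my $fh, '>', "$store/$job" or die "Cannot create marker: $!";
            close $fh;
        }

        # On a success line: -e tells us whether the job failed earlier.
        sub clear_if_failed {
            my ($job) = @_;
            return 0 unless -e "$store/$job";    # plain success; ignore it
            unlink "$store/$job" or warn "Cannot unlink marker: $!";
            return 1;                            # previously failed; act on it
        }

        # Periodic sweep (e.g. from cron): purge markers older than 8 hours.
        sub purge_stale {
            for my $file ( glob "$store/*" ) {
                unlink $file if -M $file > 8 / 24;    # -M is measured in days
            }
        }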


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      This should indeed keep up just fine, and it has the advantage of probably being more understandable to the Big Iron folks, as it's very similar to the kind of thing I used to use when writing TSO scripts back in the day. It was faster to alloc and delete files than it was to allocate, open, write, and close repeatedly.
Re: Fast Recall
by GrandFather (Saint) on Sep 03, 2010 at 02:01 UTC

    "limited access ... can't use ..." probably isn't a reason. See Yes, even you can use CPAN.

    Otherwise you could write a simple storage system using fixed size blobs in one file and possibly an index file. The fixed size blobs allow for efficient reuse of blocks and the separate index file allows a mapping between a key and the actual data - sort of like a hash.
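
    A rough sketch of the fixed-size-record idea, assuming job names fit in 32 bytes; the in-memory %index below stands in for the separate index file, which would map each key to its record slot:

        use strict;
        use warnings;
        use Fcntl qw(O_RDWR O_CREAT);

        # Fixed-size records: a 32-byte NUL-padded name plus a 32-bit timestamp.
        my $RECLEN = 36;
        sysopen my $fh, './failed.dat', O_RDWR | O_CREAT, 0666 or die $!;
        binmode $fh;

        my %index;    # job name => record slot; persist or rebuild as needed

        sub write_record {
            my ($slot, $job) = @_;
            seek $fh, $slot * $RECLEN, 0 or die $!;
            print {$fh} pack 'a32 N', $job, time;
            $index{$job} = $slot;
        }

        sub read_record {
            my ($slot) = @_;
            seek $fh, $slot * $RECLEN, 0 or die $!;
            read( $fh, my $buf, $RECLEN ) == $RECLEN or return;
            my ($job, $when) = unpack 'a32 N', $buf;
            $job =~ s/\0+\z//;    # strip the NUL padding
            return ($job, $when);
        }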

    True laziness is hard work
Re: Fast Recall
by ikegami (Patriarch) on Sep 03, 2010 at 04:17 UTC
    I'm having a hard time finding the question. Caching 6 lines?
    my @cache = (undef) x 6;
    while (<>) {
        ...
        shift @cache;
        push @cache, $_;
    }
Re: Fast Recall
by sundialsvc4 (Abbot) on Sep 03, 2010 at 14:25 UTC

    You can also tie() a hash to a persistent data store such as a Berkeley DB file. You use hash syntax in your program, but the data store is persistent. BDB is a fast and efficient VSAM-like implementation.

    Various operating system interfaces exist to notify you when a file has changed. But as I read this, I find myself thinking... “isn’t this the perfect spot to use a pipe?” (Yes, even Win32 has them.)
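
    A hypothetical sketch of the pipe idea, assuming the producer could be pointed at a named pipe (FIFO) rather than the flat file; the path /tmp/jobstatus is made up:

        use strict;
        use warnings;

        my $fifo = '/tmp/jobstatus';    # hypothetical FIFO path

        # Create the named pipe once (assumes a POSIX mkfifo is available).
        unless ( -p $fifo ) {
            system( 'mkfifo', $fifo ) == 0 or die "mkfifo failed: $?";
        }

        # Opening for read blocks until a writer connects; lines then
        # arrive as they are written, with no polling required.
        open my $in, '<', $fifo or die "Cannot open $fifo: $!";
        while ( my $line = <$in> ) {
            chomp $line;
            # ... dispatch on failed/succeeded, as with the flat file ...
        }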

    Ahh, yes... TSO... I remember it, but not “well.”

Re: Fast Recall
by stevieb (Canon) on Sep 03, 2010 at 01:54 UTC

    ...no help to be found in this post... just formatting to prevent eye bleeding: the question above, re-paragraphed for readability.

    OP: README

      So it sounds like you just need some "lightweight" persistent storage for these failed jobs, where you can find some failed job name in the failed job list easily (and I guess probably containing some small amount of other info). Sounds like a hash structure.

      One idea is to implement this with Config::Tiny, a module for fiddling with .ini files. It's a simple critter with no installation dependencies, so no fancy install is required; you could just put it in the same directory as your new, improved script.

      Anyway, this thing implements a hash of hashes (HoH) for you to use as you see fit. Some very simple app code is attached.

      Update: Oops for posting at the wrong level (I replied to the tidied-up version instead of the original node). Using a hash table may be overkill for just keeping track of "name => date/time"; the idea BrowserUk suggested would serve that purpose fine. There are other tied-hash modules, but this was the simplest one I could think of, and it avoids any potential installation hassles.
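
      For reference, a minimal sketch of the Config::Tiny approach (this is not the attached code; the file name failed.ini and the section name failed are assumptions):

          use strict;
          use warnings;
          use Config::Tiny;

          my $file = 'failed.ini';    # hypothetical store
          my $cfg  = -e $file ? Config::Tiny->read($file) : Config::Tiny->new;

          $cfg->{failed}{job_1234} = time;    # record a failure
          delete $cfg->{failed}{job_4321};    # clear one on a later success
          $cfg->write($file);                 # persist between runs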

Re: Fast Recall
by TomDLux (Vicar) on Sep 03, 2010 at 14:30 UTC

    You have limited access to this box, so presumably you have more access to other boxes. Run a lightweight database on another box, and connect to it from the Big Iron box.

    Although the immediate need is for only a few hours' worth of data, you may find it useful to accumulate an archive of performance results, assuming it does not consume too much space. That would allow you to graph performance trends, detect patterns, and avoid problems.

    As Occam said: Entia non sunt multiplicanda praeter necessitatem.

      Thanks for the advice; a simple hash seems to scale fine so far.
Re: Fast Recall
by mr_mischief (Monsignor) on Sep 04, 2010 at 04:24 UTC

    Is this a long-running process or something run on a scheduler?

    If it's a long-running process and the stores fit in memory, I'd have a failure hash keyed on job number. I'd check for existence in that hash whenever I got a success to see if I needed to clear it. Every 100 lines or so I'd check to see which failed jobs need to be cleared. You don't really say which parts of this need to be output, but I'd guess the most important would be which job IDs are being cleared after eight hours of repeated failure.

    If your process gets run repeatedly by a scheduler like cron, then you need to store your data keyed on job ID outside your program. This can be done using SQL (connected to a remote SQL RDBMS or using something like DBD::SQLite or DBD::CSV if you really can't have a database on the box itself). It can be done using an XML store. It can be done using Storable. It could be done using a directory full of files, one for each failed job. It could be done using gdbm, ndbm, or Berkeley db.
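
    For the scheduled-run case, a minimal persistence sketch with the core Storable module (the file name failed_jobs.stor is made up):

        use strict;
        use warnings;
        use Storable qw(retrieve store);

        my $file = './failed_jobs.stor';    # hypothetical store

        # Load the failure hash left behind by the previous run, if any.
        my $failed = -e $file ? retrieve($file) : {};

        $failed->{job_1234} = time;    # record a new failure
        delete $failed->{job_4321};    # clear one that later succeeded

        store $failed, $file;          # persist for the next scheduled run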

    For a long-running process, I'd probably approach it in a way similar to this and see how it scales (untested):

    my %failed;    # job ID => 1 while the job is in a failed state
    my @hours;     # job IDs bucketed by the hour they failed, for the purge

    while ( <$file> ) {
        my $hour = ( localtime )[ 2 ];
        if ( /(job-id-format) failed/ ) {
            if ( !exists $failed{ $1 } ) {
                push @{ $hours[ $hour ] }, $1;
                $failed{ $1 } = 1;
            }
            report_temporary_job_failure( $1 );
        }
        elsif ( /(job-id-format) succeeded/ ) {
            if ( exists $failed{ $1 } ) {    # only successes clearing a failure matter
                delete $failed{ $1 };
                report_previously_failed_job_success( $1 );
            }
        }
        if ( 0 == $. % 100 ) {
            my $purge = ( $hour - 8 >= 0 ) ? $hour - 8 : $hour + 24 - 8;
            while ( my $job = shift @{ $hours[ $purge ] } ) {
                report_final_job_failure( $job );
                delete $failed{ $job };
            }
        }
    }

    I'm pretty sure you can get six lines a second or far more from that. I've put PC-class machines through data gathering tasks on logs that grow multiple orders of magnitude faster than that.
