PerlMonks  

Trying to identify unique lines in a log file

by aditya1977 (Novice)
on Mar 05, 2015 at 10:21 UTC ( #1118867=perlquestion )

aditya1977 has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a program that will parse a log file and write data to a MySQL database based on the content of the log file.

The script will run via cron and I want to make sure I skip lines that have been parsed before.

One idea I had was to Base64-encode the line and make this a unique key in the database. Then the next time the script runs, if there is a matching key, skip the line.

What I don't like about this is that the encoded line is quite a long string, which would make the database difficult to read. Is there a way to encode a long string as a relatively short string?

Or perhaps there is a better way to do this?


Replies are listed 'Best First'.
Re: Trying to identify unique lines in a log file
by choroba (Cardinal) on Mar 05, 2015 at 10:39 UTC
    For shorter strings, see Digest::MD5 and similar. Also, if the log grows on one end only, it should be enough to just remember the last line.
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Thanks! You made me realise that there's no reason not to use hashing rather than encoding.

      And MD5 hashes are significantly shorter.

      Timestamp entries are present, but not unique as some events take place milliseconds apart.
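
A minimal sketch of the Digest::MD5 suggestion above. The helper name `line_key`, the sample log lines, and the `%seen` hash are made up for illustration; `%seen` only stands in for a UNIQUE column in the MySQL table. It assumes lines are read as raw bytes, since `md5_hex` works on bytes.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: turn one log line into a short, fixed-length key.
sub line_key {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;      # ignore the line ending so it does not affect the key
    return md5_hex($line);     # always 32 hex characters, whatever the line length
}

# %seen stands in for the unique key column in the database (an assumption).
my %seen;
my @sample = (
    "2015-03-05 10:21:00.001 login ok\n",
    "2015-03-05 10:21:00.001 login ok\n",   # same line again, e.g. from a second run
    "2015-03-05 10:21:00.002 logout\n",
);

for my $line (@sample) {
    my $key = line_key($line);
    next if $seen{$key}++;     # key already stored: skip the line
    print "$key  $line";
}
```

One caveat with this approach: two log entries that are genuinely identical, down to the timestamp, produce the same key and would be stored only once.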

Re: Trying to identify unique lines in a log file
by Theodore (Friar) on Mar 05, 2015 at 10:37 UTC
    Is there any timestamp in the log? If not, can you add one? Or even, why not cycle the log first (copy and truncate) and then parse the "static" recycled log?
Re: Trying to identify unique lines in a log file
by GotToBTru (Prior) on Mar 05, 2015 at 19:45 UTC

    To reprocess the log file each time seems a terrible waste of time and resources, not to mention storing your ever-growing index of lines you have already logged! Use seek() and tell(). You need only store a bookmark and you will know where to start from next time. This will need to be coordinated with whatever process truncates your log file, of course.

    Dum Spiro Spero
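
A rough sketch of the seek()/tell() bookmark idea, under some assumptions: the helper name `read_new_lines` and the bookmark file are hypothetical, the bookmark is stored as a plain byte offset in a small text file, and a log that has become shorter than the bookmark is treated as truncated and read from the start.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: return the complete lines added to $log since the
# byte offset saved in $bookmark, then save the new offset.
sub read_new_lines {
    my ($log, $bookmark) = @_;

    # Load the saved offset, if a bookmark file exists.
    my $offset = 0;
    if (open my $bm, '<', $bookmark) {
        my $saved = <$bm>;
        $offset = $1 if defined $saved && $saved =~ /^(\d+)/;
        close $bm;
    }

    open my $fh, '<', $log or die "Cannot open $log: $!";

    # Assumption: a log shorter than the bookmark was truncated, so start over.
    my $size = -s $fh;
    $offset = 0 if $offset > $size;

    seek($fh, $offset, SEEK_SET) or die "Cannot seek in $log: $!";

    my @lines;
    while (my $line = <$fh>) {
        last unless $line =~ /\n\z/;   # half-written last line: leave it for the next run
        push @lines, $line;
        $offset = tell($fh);           # position just after the last complete line
    }
    close $fh;

    open my $out, '>', $bookmark or die "Cannot write $bookmark: $!";
    print {$out} "$offset\n";
    close $out or die "Cannot close $bookmark: $!";

    return @lines;
}

# Example call (both paths are placeholders):
# my @new = read_new_lines('/var/log/app.log', '/var/tmp/app.log.pos');
```

The size check only catches a truncated log that is still shorter than the bookmark at the next run, so it does not replace coordinating with whatever process truncates the log.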

Node Type: perlquestion [id://1118867]
Approved by Corion