PerlMonks  

compare most recent file with second most recent file and output difference

by jjoseph8008 (Initiate)
on Nov 02, 2012 at 19:06 UTC ( #1002028=perlquestion )
jjoseph8008 has asked for the wisdom of the Perl Monks concerning the following question:

I have a directory of log files generated daily on a Windows box: audit_20121101_010000.csv (Nov 1st), audit_20121102_010000.csv (Nov 2nd), audit_20121103_010000.csv (Nov 3rd), audit_20121104_010000.csv (Nov 4th), etc. I am very new to Perl and would like to compare the most recent file with the second most recent file each day, and save the difference to a third file. I am only interested in comparing the two most recent files; I can delete older files if necessary. I was wondering if someone could help me out, please.

Example: the most recent file, audit_20121104_010000.csv (Nov 4th), contains:

    nov1 bobjoe mary
    nov2 maryty
    nov3 joe doe

The second most recent file, audit_20121103_010000.csv (Nov 3rd), contains:

    nov1 bobjoe mary
    nov2 maryty

The difference would be:

    nov3 joe doe

Re: compare most recent file with second most recent file and output difference
by bart (Canon) on Nov 02, 2012 at 19:41 UTC
    You can get a list of the files this way:
        chdir $logdir;
        @files = glob 'audit_*_010000.csv';
    They should already come out in sorted order, but in case that occasionally isn't so, you can do:
    @files = sort glob 'audit_*_010000.csv';
    Now $files[-1] will be the name of the most recent file and $files[-2] that of the previous one, assuming @files contains at least two entries. So now you can compare them.
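Putting those pieces together, a minimal sketch (the hash-based comparison step and the $logdir path are assumptions added here, not part of the reply above):

```perl
use strict;
use warnings;

my $logdir = 'C:/logs';    # assumed location of the audit files
chdir $logdir or die "Cannot chdir to $logdir: $!";

# Lexical sort works because the file names embed the date as YYYYMMDD.
my @files = sort glob 'audit_*_010000.csv';
die "Need at least two files\n" if @files < 2;

# Collect the lines of the second most recent file...
my %seen;
open my $old, '<', $files[-2] or die "Cannot open $files[-2]: $!";
while (<$old>) { chomp; $seen{$_}++ }
close $old;

# ...then print every line of the most recent file not present in it.
open my $new, '<', $files[-1] or die "Cannot open $files[-1]: $!";
while (<$new>) {
    chomp;
    print "$_\n" unless $seen{$_};
}
close $new;
```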
Re: compare most recent file with second most recent file and output difference
by Kenosis (Priest) on Nov 02, 2012 at 19:46 UTC

    Here's one option:

    use strict;
    use warnings;
    use File::Slurp qw/read_file/;

    my ( %file1Hash, %file2Hash, %mergedHash );

    my @files = sort { -M $a <=> -M $b } <"*.txt">;

    do { chomp; $file1Hash{$_}++ } for read_file $files[0];
    $mergedHash{$_}++ for keys %file1Hash;

    do { chomp; $file2Hash{$_}++ } for read_file $files[1];
    $mergedHash{$_}++ for keys %file2Hash;

    print "$_\n" for grep $mergedHash{$_} == 1, keys %mergedHash;

    What it does:

    do { chomp; $file1Hash{$_}++ } for read_file $files[0];
         ^      ^                  ^   ^
         |      |                  |   |
         |      |                  |   + - Read the file, returning a list
         |      |                  + - Do this for each line
         |      + - Make the line a hash key and increment the associated value
         + - Remove the newline

    $mergedHash{$_}++ for keys %file1Hash;
    ^                 ^
    |                 |
    |                 + - Do this for each key
    + - Make the line a hash key and increment the associated value

    print "$_\n" for grep $mergedHash{$_} == 1, keys %mergedHash;
    ^            ^
    |            |
    |            + - If it only appears once (in either file, but not both)
    + - Print the line

    This uses a hash to tally identical lines, then shows only those keys (lines) whose value is 1, i.e., lines which only appear once in either file.

    Hope this helps!

    Update: Used three hashes: one for each file, and one merging the two hashes, in case the same line is repeated twice within a file--and those lines are only in that one file.
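To see why the per-file hashes in that update matter: if a line appears twice within the newest file and not at all in the older one, a single tally across both files counts it twice, so the == 1 filter wrongly drops it. Keying each file's hash first collapses in-file duplicates before the merge. A small sketch of the failure mode (the sample lines are taken from the question; the variable names are illustrative):

```perl
use strict;
use warnings;

my @newer = ( 'nov3 joe doe', 'nov3 joe doe' );    # duplicated within one file
my @older = ();

# Naive single tally: the in-file duplicate pushes the count to 2,
# so grep { $_ == 1 } misses the line entirely.
my %naive;
$naive{$_}++ for @newer, @older;
my @naive_diff = grep { $naive{$_} == 1 } keys %naive;    # empty

# Per-file hashes collapse in-file duplicates first,
# so the merged count reflects "present in how many files".
my ( %h1, %h2, %merged );
$h1{$_}++ for @newer;
$h2{$_}++ for @older;
$merged{$_}++ for keys %h1;
$merged{$_}++ for keys %h2;
my @diff = grep { $merged{$_} == 1 } keys %merged;        # 'nov3 joe doe'

print "naive: @naive_diff\n";
print "fixed: @diff\n";
```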

      Thanks...I was trying out the solution Kenosis provided. I installed File-Slurp in the package manager and have my two files in c:\test\. Slurp.pm is in c:\perl64\site\lib\file. I ran the Perl script from c:\test\ and got the errors below:

          Use of uninitialized value $file_name in -e at C:/Perl64/site/lib/File/Slurp.pm line 116.
          Use of uninitialized value $file_name in sysopen at C:/Perl64/site/lib/File/Slurp.pm line 193.
          Use of uninitialized value $file_name in concatenation (.) or string at C:/Perl64/site/lib/File/Slurp.pm line 194.
          read_file '' - sysopen: No such file or directory at c:\test\test_perl.pl line 9.

      I am more of a batch script person and not much into Perl. Please advise.

        Hi, jjoseph8008

        I'm not sure what's causing the problem within File::Slurp, so please try the following, which doesn't use that module:

        use strict;
        use warnings;

        my ( %file1Hash, %file2Hash, %mergedHash );

        my @files = sort { -M $a <=> -M $b } <"*.txt">;

        do { chomp; $file1Hash{$_}++ } for getFileLines( $files[0] );
        $mergedHash{$_}++ for keys %file1Hash;

        do { chomp; $file2Hash{$_}++ } for getFileLines( $files[1] );
        $mergedHash{$_}++ for keys %file2Hash;

        print "$_\n" for grep $mergedHash{$_} == 1, keys %mergedHash;

        sub getFileLines {
            open my $fh, '<', $_[0] or die $!;
            return <$fh>;
        }
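One likely cause of the "uninitialized value $file_name" errors reported above: both scripts glob <"*.txt">, while the audit files end in .csv, so @files may come back empty and read_file is handed undef. A sketch that globs the audit pattern from the question directly and saves the result to a third file, as originally asked (the output file name daily_diff.txt is an assumption):

```perl
use strict;
use warnings;

# Lexical sort is chronological here, since the names embed YYYYMMDD.
my @files = sort glob 'audit_*_010000.csv';
die "Need at least two files\n" if @files < 2;

# Tally each line once per file in which it appears.
my %count;
for my $file ( @files[ -2, -1 ] ) {
    my %lines;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (<$fh>) { chomp; $lines{$_}++ }
    close $fh;
    $count{$_}++ for keys %lines;
}

# Lines with a count of 1 appear in exactly one of the two files.
my $outfile = 'daily_diff.txt';    # assumed name
open my $out, '>', $outfile or die "Cannot open $outfile: $!";
print $out "$_\n" for grep { $count{$_} == 1 } keys %count;
close $out;
```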

Node Type: perlquestion [id://1002028]
Approved by Kenosis