PerlMonks  

compare most recent file with second most recent file and output difference

by jjoseph8008 (Initiate)
on Nov 02, 2012 at 19:06 UTC ( #1002028=perlquestion )
jjoseph8008 has asked for the wisdom of the Perl Monks concerning the following question:

I have a directory of log files generated daily on a Windows box: audit_20121101_010000.csv (Nov 1st), audit_20121102_010000.csv (Nov 2nd), audit_20121103_010000.csv (Nov 3rd), audit_20121104_010000.csv (Nov 4th), etc. I am very new to Perl and would like to compare the most recent file with the second most recent file each day and save the difference to a third file. I am only interested in comparing the two most recent files; I can delete older files if necessary. I was wondering if someone can help me out, please.

Example. The most recent file, audit_20121104_010000.csv (Nov 4th), contains:

    nov1 bobjoe mary
    nov2 maryty
    nov3 joe doe

The second most recent file, audit_20121103_010000.csv (Nov 3rd), contains:

    nov1 bobjoe mary
    nov2 maryty

The difference would be:

    nov3 joe doe

Re: compare most recent file with second most recent file and output difference
by bart (Canon) on Nov 02, 2012 at 19:41 UTC
    You can get a list of the files this way:

        chdir $logdir;
        my @files = glob 'audit_*_010000.csv';

    They should already come back in sorted order, but if occasionally that's not the case, you can do:

        my @files = sort glob 'audit_*_010000.csv';

    Now $files[-1] will be the name of the most recent file and $files[-2] that of the previous one, assuming @files >= 2. So now you can compare them.
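    Putting that together, a minimal sketch (a hard-coded list stands in for the real glob here, so the example can run anywhere; the file names are just the OP's examples):

    ```perl
    use strict;
    use warnings;

    # Sketch only: in real use this would be
    #   my @files = glob 'audit_*_010000.csv';
    # Because the date is embedded as YYYYMMDD, a plain lexical sort
    # is also chronological.
    my @files = sort qw(
        audit_20121103_010000.csv
        audit_20121101_010000.csv
        audit_20121104_010000.csv
        audit_20121102_010000.csv
    );

    die "need at least two files\n" unless @files >= 2;

    my ( $previous, $newest ) = @files[ -2, -1 ];
    print "newest:   $newest\n";
    print "previous: $previous\n";
    ```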
Re: compare most recent file with second most recent file and output difference
by Kenosis (Priest) on Nov 02, 2012 at 19:46 UTC

    Here's one option:

        use strict;
        use warnings;
        use File::Slurp qw/read_file/;

        my ( %file1Hash, %file2Hash, %mergedHash );

        my @files = sort { -M $a <=> -M $b } <"*.txt">;

        do { chomp; $file1Hash{$_}++ } for read_file $files[0];
        $mergedHash{$_}++ for keys %file1Hash;

        do { chomp; $file2Hash{$_}++ } for read_file $files[1];
        $mergedHash{$_}++ for keys %file2Hash;

        print "$_\n" for grep $mergedHash{$_} == 1, keys %mergedHash;

    What it does:

        do { chomp; $file1Hash{$_}++ } for read_file $files[0];
             ^      ^                  ^   ^
             |      |                  |   +- Read the file, returning a list
             |      |                  +- Do this for each line
             |      +- Make the line a hash key and increment the associated value
             +- Remove the newline

        $mergedHash{$_}++ for keys %file1Hash;
        ^                 ^
        |                 +- Do this for each key
        +- Make the line a hash key and increment the associated value

        print "$_\n" for grep $mergedHash{$_} == 1, keys %mergedHash;
        ^                ^
        |                +- If it only appears once (in either file, but not both)
        +- Print the line

    This uses a hash to tally identical lines, then shows only those keys (lines) whose value is 1, i.e., lines which only appear once in either file.
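    The tally idea in isolation, with the OP's sample lines inlined as arrays instead of read from disk (a sketch of the technique, not the full script):

    ```perl
    use strict;
    use warnings;

    # The OP's example lines, inlined as data.
    my @nov4 = ( 'nov1 bobjoe mary', 'nov2 maryty', 'nov3 joe doe' );
    my @nov3 = ( 'nov1 bobjoe mary', 'nov2 maryty' );

    my ( %file1Hash, %file2Hash, %mergedHash );
    $file1Hash{$_}++ for @nov4;
    $file2Hash{$_}++ for @nov3;
    $mergedHash{$_}++ for keys %file1Hash;
    $mergedHash{$_}++ for keys %file2Hash;

    # Keys with a tally of 1 appear in only one of the two files.
    my @diff = grep { $mergedHash{$_} == 1 } keys %mergedHash;
    print "$_\n" for @diff;    # prints: nov3 joe doe
    ```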

    Hope this helps!

    Update: Used three hashes: one for each file, and one merging the two hashes, in case the same line is repeated twice within a file--and those lines are only in that one file.
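    A small illustration of why the update matters (hypothetical data, not from the thread): if a line occurs twice within one file and a single tally were kept over raw lines, that line would tally 2 and be wrongly suppressed; collapsing each file's lines into its own hash first keeps the merged tally at 1 per file.

    ```perl
    use strict;
    use warnings;

    # Hypothetical data: "dup line" appears twice, but only in file A.
    my @fileA = ( 'dup line', 'dup line', 'unique A' );
    my @fileB = ( 'unique B' );

    # Naive single tally over raw lines: "dup line" tallies 2, so it is
    # wrongly treated as common and dropped from the difference.
    my %naive;
    $naive{$_}++ for @fileA, @fileB;
    my @naive_diff = sort grep { $naive{$_} == 1 } keys %naive;

    # Per-file hashes first: duplicates within a file collapse to one key,
    # so "dup line" tallies 1 in the merged hash and is reported.
    my ( %seenA, %seenB, %merged );
    $seenA{$_}++ for @fileA;
    $seenB{$_}++ for @fileB;
    $merged{$_}++ for keys %seenA;
    $merged{$_}++ for keys %seenB;
    my @diff = sort grep { $merged{$_} == 1 } keys %merged;

    print "naive: @naive_diff\n";
    print "fixed: @diff\n";
    ```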

        Thanks... I was trying out the solution Kenosis provided. I installed File-Slurp in the package manager and have my two files in c:\test\. Slurp.pm is in c:\perl64\site\lib\File. I ran the Perl script from c:\test\ and got the errors below:

            Use of uninitialized value $file_name in -e at C:/Perl64/site/lib/File/Slurp.pm line 116.
            Use of uninitialized value $file_name in sysopen at C:/Perl64/site/lib/File/Slurp.pm line 193.
            Use of uninitialized value $file_name in concatenation (.) or string at C:/Perl64/site/lib/File/Slurp.pm line 194.
            read_file '' - sysopen: No such file or directory at c:\test\test_perl.pl line 9.

        I am more of a batch-script person and not much into Perl. Please advise.

        Hi, jjoseph8008

        I'm not sure what's causing the problem within File::Slurp, so please try the following, which doesn't use that module:

        use strict;
        use warnings;

        my ( %file1Hash, %file2Hash, %mergedHash );

        my @files = sort { -M $a <=> -M $b } <"*.txt">;

        do { chomp; $file1Hash{$_}++ } for getFileLines( $files[0] );
        $mergedHash{$_}++ for keys %file1Hash;

        do { chomp; $file2Hash{$_}++ } for getFileLines( $files[1] );
        $mergedHash{$_}++ for keys %file2Hash;

        print "$_\n" for grep $mergedHash{$_} == 1, keys %mergedHash;

        sub getFileLines {
            open my $fh, '<', $_[0] or die $!;
            return <$fh>;
        }
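        For reference, here is a self-contained variant of the same approach that needs no modules: it creates two sample audit files in a temporary directory (only so the example can run anywhere; the real script would just chdir to the log directory), globs the actual audit_*.csv pattern rather than *.txt, and guards against the glob matching fewer than two files. A glob that matches nothing leaves $files[0] undefined, which is one way read_file can end up being called with an empty name as in the warnings above.

        ```perl
        use strict;
        use warnings;
        use File::Temp qw(tempdir);

        # Demo setup only: create two sample audit files in a temp directory.
        # In real use you would instead:  chdir 'c:/test' or die $!;
        my $dir = tempdir( CLEANUP => 1 );
        chdir $dir or die $!;

        open my $fh, '>', 'audit_20121103_010000.csv' or die $!;
        print $fh "nov1 bobjoe mary\nnov2 maryty\n";
        close $fh;
        open $fh, '>', 'audit_20121104_010000.csv' or die $!;
        print $fh "nov1 bobjoe mary\nnov2 maryty\nnov3 joe doe\n";
        close $fh;

        # The YYYYMMDD stamp in the name makes a lexical sort chronological.
        my @files = sort glob 'audit_*_010000.csv';
        die "need at least two audit files, found " . @files . "\n" if @files < 2;

        my ( %older, %newer, %merged );
        open $fh, '<', $files[-2] or die $!;
        while (<$fh>) { chomp; $older{$_}++ }
        close $fh;
        open $fh, '<', $files[-1] or die $!;
        while (<$fh>) { chomp; $newer{$_}++ }
        close $fh;
        $merged{$_}++ for keys %older;
        $merged{$_}++ for keys %newer;

        my @diff = grep { $merged{$_} == 1 } keys %merged;
        print "$_\n" for @diff;
        ```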

Node Type: perlquestion [id://1002028]
Approved by Kenosis