PerlMonks  

Best way to compare two files

by sparkel (Acolyte)
on Nov 01, 2004 at 13:27 UTC

sparkel has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

As per your reply: the file comparison should not go line by line. It should instead report the sentences in File 2 that do not appear anywhere in File 1.

So even though "A big bad witch lived inside" is on different lines in the two files, it does not appear in the output.

I am also assuming that if a line is repeated, it is ignored after its first appearance and is not printed again.

Please let me know what you suggest.

Thank you,
Lily

Thank you for your reply. As per your request, here are some samples:

File 1:
On October 31st 2004,
Hansel and Gratel went into the woods
They knocked on a big wooden door
A big bad witch lived inside

File 2:
This is November 1st, 2004
Red riding hood knocked on a big wooden door
A big bad witch lived inside
Oh wait, I think that was a big bad wolf who ate her grandmother

The file comparison (finding what is different in File 2 relative to File 1) should ignore the lines that contain dates. The output would therefore be:

Red riding hood knocked on a big wooden door
Oh wait, I think that was a big bad wolf who ate her grandmother

Please suggest,
Thank you

I am a newbie to Perl, please be kind :)

I would like to know what you think is the best way to compare two files.

Apart from changes in dates (i.e. ignoring lines that contain dates such as October 10th or October 31st), I want to pick out every line that differs between the two files.

Please suggest,

Thanks,
Lily

Replies are listed 'Best First'.
Re: Best way to compare two files
by sunadmn (Curate) on Nov 01, 2004 at 13:40 UTC
    I would first suggest taking a look at CPAN, and from there taking a look at Array-Compare; if that doesn't suit your needs, take a look at File-Compare. You may also want to use the Super Search feature of this site, as there are many posts on the matter.
    SUNADMN
    USE PERL
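As a rough illustration of the File::Compare suggestion: the core module's compare_text accepts an optional line-comparison callback that returns 0 when two lines should count as equal. The sketch below treats any pair of date lines as equal. The date regex and the names $DATE_RE, lines_differ and files_differ_ignoring_dates are assumptions made for illustration. Note that this only reports whether the files differ when read line by line, not which lines differ.

```perl
use strict;
use warnings;
use File::Compare qw(compare_text);

# Assumed date pattern: a month name followed by a day number, e.g. "October 31st".
my $DATE_RE = qr/\b(?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2}/;

# Hypothetical callback: returns 0 when two lines count as equal, 1 otherwise.
sub lines_differ {
    my ($x, $y) = @_;
    return 0 if $x =~ $DATE_RE && $y =~ $DATE_RE;    # both are date lines: ignore
    return $x eq $y ? 0 : 1;
}

# compare_text returns 0 if the files are equal, 1 if they differ, -1 on error.
sub files_differ_ignoring_dates {
    my ($file1, $file2) = @_;
    return compare_text($file1, $file2, \&lines_differ);
}
```

Because the comparison is still positional, it would not handle the case where the same sentence sits on different lines in the two files.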
Re: Best way to compare two files
by Grygonos (Chaplain) on Nov 01, 2004 at 13:51 UTC

    We need a sample of the data, since there are conditions under which differences should not be reported. However, I would suggest something like the following.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @months = qw(January February March April May June July
                    August September October November December);

    open(FIRST,  "<first.file")  or die "$!";
    open(SECOND, "<second.file") or die "$!";

    # if the files aren't huge you can slurp and compare arrays
    my @first  = <FIRST>;
    my @second = <SECOND>;
    close(FIRST);
    close(SECOND);
    chomp(@first, @second);

    open(DIFF, ">diff.file") or die "$!";

    for (my $i = 0; $i <= $#first; $i++) {
        # skip lines of the first file that contain a date such as "October 31"
        my $is_date = 0;
        foreach (@months) {
            if ($first[$i] =~ m{$_\s*\d{1,2}}) {
                $is_date = 1;
                last;
            }
        }
        if (!$is_date) {
            # the second file may be shorter than the first
            my $other = defined $second[$i] ? $second[$i] : '';
            if ($first[$i] ne $other) {
                print DIFF "Difference found @ line: ".$i."\n";
                print DIFF "File 1: ".$first[$i]."\n";
                print DIFF "File 2: ".$other."\n";
                print DIFF "----------------------------\n";
            }
        }
    }
    close(DIFF);
Re: Best way to compare two files
by zentara (Archbishop) on Nov 01, 2004 at 13:42 UTC
    You can use backticks to send the files to the separate GNU utility diff and capture the results, or you can use a Perl module like Text::Diff.

    But you should be more specific in your question, for example by showing us some sample code or sample files.


    I'm not really a human, but I play one on earth. flash japh

      Text::Diff (and its model, GNU diff) won't selectively ignore things; the poster needs to compare, but ignore differences in date. Unfortunately, a simple diff won't accomplish this.

      One approach would be to load columnar data (date in one column) and ignore the date column while comparing the rest of the data using something like Array::Compare.

      radiantmatrix
      require General::Disclaimer;
      "Users are evil. All users are evil. Do not trust them. Perl specifically offers the -T switch because it knows users are evil." - japhy
        You are right, but I was thinking that the poster could do some post-processing of the diff output to discard the dates.

        I'm not really a human, but I play one on earth. flash japh
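A minimal sketch of that post-processing idea, assuming diff's default ("normal") output format, in which changed lines are prefixed with "< " or "> ". The helper name drop_date_lines and the date regex are assumptions made for illustration. The filter leaves change headers and "---" separators in place, so the result is meant for reading rather than for feeding to patch.

```perl
use strict;
use warnings;

# Assumed date pattern: a month name followed by a day number.
my $MONTHS = join '|', qw(January February March April May June July
                          August September October November December);

# Hypothetical helper: drop "<" / ">" lines of diff output that contain a date.
sub drop_date_lines {
    my @diff_lines = @_;
    return grep { !( /^[<>] / && /\b(?:$MONTHS)\s+\d{1,2}/ ) } @diff_lines;
}

# Usage (assumes a diff executable on the PATH and files named file1 and file2):
# my @kept = drop_date_lines(`diff file1 file2`);
# print @kept;
```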
Re: Best way to compare two files
by Grygonos (Chaplain) on Nov 01, 2004 at 15:32 UTC

    Your example is actually inconsistent. You expect "A big bad witch lived inside" to be treated as the same, but it is on different lines in the two files, so a line-by-line comparison sees it as different. Your output can only be correct if no line is repeated and you scan each file for each line in the other, i.e.

    # read TWO once, then scan it for each line of ONE
    my @two = <TWO>;
    while (my $line = <ONE>) {
        print "different: $line" unless grep { $_ eq $line } @two;
    }
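One way to get the output described in the question without rescanning a file for every line is a hash lookup: remember every non-date line of File 1, then keep each non-date line of File 2 that has not been seen. This is a sketch under assumptions: the names lines_only_in_second and read_lines, the date regex, and the choice to compare whole chomped lines exactly are illustrative and not something the thread specifies.

```perl
use strict;
use warnings;

# Assumed date pattern: a month name followed by a day number, e.g. "October 31st".
my $DATE_RE = qr/\b(?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2}/;

# Hypothetical helper: takes two array refs of lines and returns the lines of
# the second that are neither date lines nor present anywhere in the first.
sub lines_only_in_second {
    my ($first, $second) = @_;
    my %seen;
    for my $line (@$first) {
        next if $line =~ $DATE_RE;
        $seen{$line} = 1;
    }
    my @only;
    for my $line (@$second) {
        next if $line =~ $DATE_RE;
        next if $seen{$line}++;    # in File 1, or already reported once
        push @only, $line;
    }
    return @only;
}

# Hypothetical wrapper: read a file into a list of chomped lines.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close($fh);
    return @lines;
}

# Usage (file names are assumptions):
# print "$_\n" for lines_only_in_second([read_lines('file1.txt')],
#                                       [read_lines('file2.txt')]);
```

Given the two sample files exactly as posted, this keeps the "Red riding hood" and "Oh wait" lines and drops the date lines and the repeated witch line. Trailing whitespace or differences in case would make otherwise identical lines count as different.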
Re: Best way to compare two files
by Hena (Friar) on Nov 01, 2004 at 14:11 UTC
    If you are on *nix, I suggest trying 'diff -u file1 file2', although this will show the date changes as well. Also, depending on how much you care about whitespace, -B (ignore blank lines) or -b (ignore changes in the amount of whitespace) will reduce noise, and -i will make diff case-insensitive.
