PerlMonks  

compare two files and print the differences of file 2 in a third file

by hopper (Novice)
on Jun 13, 2017 at 22:10 UTC ( [id://1192744]=perlquestion )

hopper has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Perl and am trying to compare two files and print out the parts of file 2 that differ from file 1. In other words, I want to keep file 1 and drop whatever in file 2 duplicates it. Each file has sections, and each section contains different information. I want to diff the files section by section. Here are my files and my code:

File1.txt

SECTION, ONE, 1, 4, YELLOW, HIGH, THIS IS COMMENT
This is my line
This is my page
SECTION, THREE, 9, 4, RED, HIGH, THIS IS COMMENT
This is a dog
This is a cat

File2.txt

SECTION, ONE, 1, 4, YELLOW, HIGH, THIS IS COMMENT
This is a cat
This is not a cat
SECTION, TWO, 2, 4, BLUE, HIGH, THIS IS COMMENT
This is not a book
This is a notebook

my output result:
SECTION, TWO, 2, 4, BLUE, HIGH, THIS IS COMMENT
Output should be:
SECTION, TWO, 2, 4, BLUE, HIGH, THIS IS COMMENT
This is not a book
This is a notebook

Here is my code:
#!/bin/perl -w
use strict;
use warnings;
use File::Copy;
use Cwd;

my $dir = cwd;
main();

sub main {
    printf "\nStarting script\n";
    printf "\nEnter the file 1: ";
    my $fh1 = <STDIN>;
    chomp $fh1;
    printf "\n";
    printf "Enter the file 2: ";
    my $fh2 = <STDIN>;
    chomp $fh2;
    my $tempFile = "temp.txt";
    my $nonMatch = "nonMatch.txt";
    if(-e $fh1 and -e $fh2) {
        my %results = ();
        open (FILE1, "<$fh1") or die "Input file $fh1 not found.\n";
        while(my $line = < FILE1>) {
            if($line =~ /^Section/) {
                my ($sec, $first, $second, $third, $color, $mode, $description_comments) = split(',', $line, 7);
                $results{$line}=1;
            }
        }
        close(FILE1);
        open (FILE, "<$fh2") or die "Input file $fh2 not found.\n";
        while(my $line = <>) {
            if($line =~ /^Section/) {
                my ($sec, $first, $second, $third, $color, $mode, $description_comments) = split(',', $line, 7);
                $results{$line}++;
            }
        }
        close(FILE2);
        open (NONMATCH, ">$nonMatch") or die "Cannot open $nonMatch for writing \n";
        foreach my $line (keys %results) {
            print NONMATCH " $results{$line} - $line" if $results{$line} ==2;
        }
        close NONMATCH;
    }
    close FILE2;
}

Replies are listed 'Best First'.
Re: compare two files and print the differences of file 2 in a third file
by robby_dobby (Hermit) on Jun 14, 2017 at 05:17 UTC
    Hello Lonnie,

    Welcome to the monastery! You're doing a few things that are generally considered bad practice:

    • Don't use bareword file handles. Prefer lexical file handles instead: open(my $fh, "<", "/path/here/") or die "Can't open foo: ", $!;
    • Use 3-arg open. Please take a look at perlopentut and the example I quoted above. Using 2-arg open leaves you vulnerable to security issues, because the mode and the file name are parsed from the same string.
    • This is a minor issue: have you considered passing arguments to your subroutine instead of reading them from STDIN? Passing arguments to the subroutine makes it clear what it's doing, besides keeping the whole logic self-contained.
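
The three points above can be combined in one short sketch. The helper name read_section_headers and the /^SECTION/ test are assumptions made for illustration; they are not taken from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns every line of
# $path that starts with "SECTION", using a lexical filehandle and
# three-argument open.
sub read_section_headers {
    my ($path) = @_;

    # The mode ("<") is a separate argument, so characters in $path
    # cannot change how the file is opened.
    open(my $fh, '<', $path) or die "Can't open '$path': $!";

    my @headers;
    while (my $line = <$fh>) {
        chomp $line;
        push @headers, $line if $line =~ /^SECTION/;
    }
    close($fh) or die "Can't close '$path': $!";

    return @headers;
}

# File names come from the command line instead of STDIN prompts.
if (@ARGV) {
    print "$_\n" for map { read_section_headers($_) } @ARGV;
}
```

Because the file name is a parameter, the same subroutine can be called once per input file, and the lexical handle is closed automatically if the subroutine exits early.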

    Good on you for posting the code. I can see that you're doing a lot of work that could just as easily be achieved with an available module such as File::Compare. File::Compare is bundled with standard perl distributions and should be available in your local installation. Sure, it doesn't print the exact differences, which is what you wanted to do, but you can steal ideas from it, surely? (For example, instead of reading line by line, you can read directly into a buffer of some configurable size using the read or sysread functions.)
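
For reference, a minimal sketch of what File::Compare offers. The wrapper name describe_comparison is hypothetical; compare() itself only reports whether two files are equal (0), different (1), or unreadable (-1), so it answers "are these files the same?" rather than "which sections differ?".

```perl
use strict;
use warnings;
use File::Compare;    # core module; exports compare()

# Hypothetical wrapper: turns compare()'s numeric result into a label.
sub describe_comparison {
    my ($file1, $file2) = @_;
    my $result = compare($file1, $file2);
    return 'identical' if $result == 0;
    return 'different' if $result == 1;
    return 'error';    # compare() returns -1 when a file cannot be read
}

# Usage: perl script.pl File1.txt File2.txt
if (@ARGV == 2) {
    print describe_comparison(@ARGV), "\n";
}
```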

    As always, have fun! :-)

Re: compare two files and print the differences of file 2 in a third file
by kevbot (Vicar) on Jun 14, 2017 at 06:04 UTC
    Hello lonnie,

    I'm not sure that I completely understand your example. The data in GoodFile.txt and BadFile.txt differ on line 2, so I would expect that difference to show up in Output.txt. However, in your example it does not.

    Here is some code that uses Algorithm::Diff to calculate the diff, with some help from Path::Tiny to get the file contents.

    This code,
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Path::Tiny;
    use Algorithm::Diff qw/diff/;

    my @lines1 = path('GoodFile.txt')->lines;
    my @lines2 = path('BadFile.txt')->lines;
    my $diff = Algorithm::Diff->new( \@lines1, \@lines2 );
    while( $diff->Next() ){
        next if $diff->Same();
        print $diff->Items(2), "\n";
    }
    exit;
    gives the following output:
    This section is the description of the animal bla bla lbas.....bla bla lbas.....
    Amimal, cat, 3, 4, YELLOW LEG 3, HIGH 'this is a cattt 1
    This section is the description of the animal bla bla lbas.....bla bla lbas blaaaal........

    UPDATE: The OP made edits to their node, and removed their input data. Here is the input data that I used for my code example (which the OP had provided in the first version of their post).

    GoodFile.txt
    Amimal, cat, 1, 4, YELLOW HAIR 3, HIGH 'this is a cattt 1
    This section is the description of the animal bla bla lbas.....
    Amimal, dog, 2, 4, BLACK HEAD 1, HIGHf'this is a doggg 2
    This section is the description of the animal bla bla lbas.....
    BadFile.txt
    Amimal, cat, 1, 4, YELLOW HAIR 3, HIGH 'this is a cattt 1
    This section is the description of the animal bla bla lbas.....bla bla lbas.....
    Amimal, cat, 3, 4, YELLOW LEG 3, HIGH 'this is a cattt 1
    This section is the description of the animal bla bla lbas.....bla bla lbas blaaaal........

      Thanks for taking the time to look at my code. I think you misunderstood my issue. I have two files; each file has multiple sections, and each section contains its own description. What I want to do is compare the two files and print only the sections of file 2, together with the subsections inside them, that are not in file 1, and save them to a third file. Your edited code compares all the lines, but I want to compare the section lines and ignore the lines within each section. For example, in file 2 the line (Section, eighteen, 3, 4, YELLOW HAIR 3, HIGH, 'this is a cattt 1) has 7 strings, and if that line (7 strings) does not match a line in file 1, then I want to print out the non-matching information (e.g. Section, eighteen 3, 4, YELLOW HAIR 3, HIGH, 'this is a cattt 1, cccccccccccccc kkkkkkkkkkkkkk...). So far, my code prints just (Section, eighteen 3, 4, YELLOW HAIR 3, HIGH, 'this is a cattt 1, ); it does not print the subsections that belong to the section, like cccccccccccccc.... Please help me figure out what the issue is with my code. Thanks so much in advance.
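
The behaviour described in the reply above can be sketched as follows. This is one possible approach under stated assumptions, not the poster's code: a section header is assumed to be any line starting with SECTION (matched case-insensitively, because the sample data says SECTION while the posted code tests /^Section/), every following line belongs to that section until the next header, and two sections count as the same when their header lines are identical. The names read_sections and sections_only_in_second are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $path and returns its sections in file order.
# Each section is an array ref: [ $header_line, @body_lines ].
sub read_sections {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";

    my @sections;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^SECTION/i) {
            push @sections, [$line];           # start a new section
        }
        elsif (@sections) {
            push @{ $sections[-1] }, $line;    # body line of current section
        }
        # Lines before the first header are ignored.
    }
    close($fh);
    return @sections;
}

# Returns the sections of $file2 whose header line does not occur in $file1.
sub sections_only_in_second {
    my ($file1, $file2) = @_;
    my %seen = map { ($_->[0] => 1) } read_sections($file1);
    return grep { !$seen{ $_->[0] } } read_sections($file2);
}

# Usage: perl script.pl File1.txt File2.txt nonMatch.txt
if (@ARGV == 3) {
    my ($file1, $file2, $out_path) = @ARGV;
    open(my $out, '>', $out_path) or die "Can't write '$out_path': $!";
    for my $section (sections_only_in_second($file1, $file2)) {
        print {$out} "$_\n" for @$section;
    }
    close($out) or die "Can't close '$out_path': $!";
}
```

The key difference from storing only header lines in a hash is that each header keeps its body lines attached, so a non-matching section can be written out whole.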
        Hello lonnie,

        It appears that you edited your original post and removed your original input data. This removed important context for some of the replies that you previously received. If you make an update to a post, it's less confusing if you add new information and keep the original information intact. You can do that by adding an update to the bottom of your original post. It makes it easier to identify an update if you annotate it in some way. For example,

        UPDATE: This is an update.

        Also, your original post now lacks proper formatting. Please use code tags like you did in your original post. If the format of your post is easier to read, you will likely receive more help.

Re: compare two files and print the differences of file 2 in a third file
by dbander (Scribe) on Jun 14, 2017 at 07:37 UTC

    Your problem is in this line:

    while ( my $line = < GOODFILE> ) {

    Try using print to see what you're getting in $line:

    while ( my $line = < GOODFILE> ) {
        chomp $line;
        print "GOODFILE line: '$line'\n";

    Your error should be clear to you if you do this.
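
One likely reading of this hint, assuming the space inside the angle brackets is really present in the code: Perl only treats <NAME> as a read from a filehandle when the brackets contain a plain handle name or a simple scalar. With anything else inside, including a leading space, the brackets are parsed as a glob() file name pattern, so the loop never reads the file. The scratch file name below is hypothetical, and the bareword handle is kept only to mirror the quoted line.

```perl
use strict;
use warnings;

# Write three sample lines to a scratch file (hypothetical name).
my $path = "goodfile_demo_$$.txt";
open(my $out, '>', $path) or die "Can't write '$path': $!";
print {$out} "line one\nline two\nline three\n";
close($out) or die "Can't close '$path': $!";

# No space inside the angle brackets: readline on the filehandle.
open(GOODFILE, '<', $path) or die "Can't open '$path': $!";
my @read_lines;
while (my $line = <GOODFILE>) {
    chomp $line;
    push @read_lines, $line;
}
close(GOODFILE);

# Leading space inside the angle brackets: the contents are no longer a
# plain filehandle name, so Perl parses this as a glob() pattern instead
# of reading from the file.
open(GOODFILE, '<', $path) or die "Can't open '$path': $!";
my @glob_results;
while (my $line = < GOODFILE>) {
    chomp $line;
    push @glob_results, $line;
}
close(GOODFILE);
unlink $path;

print "readline: '$_'\n" for @read_lines;
print "glob:     '$_'\n" for @glob_results;
```

Running this and comparing the two groups of output shows the same kind of evidence the suggested debugging print would produce.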

      Thanks for taking the time to look at my code.
      I think you misunderstood my issue. I have two files; each file has multiple sections, and each section contains its own description. What I want to do is compare the two files and print only the sections of file 2, together with the subsections inside them, that are not in file 1, and save them to a third file. So far, my code prints the sections of file 2 that are not in file 1, but it does not print the subsections that belong to each section.

      Please help me figure out what the issue is with my code.
      Thanks so much in advance.

        Your issue is that you are not reading GOODFILE properly.

        You thanked me for taking the time to look at your code.
        I will now thank you to take the time to actually try the answer I gave you.

        Click on the spoiler link below to see what the problem is.
