PerlMonks  

ignore duplicates and show unique values between 2 text files

by perlnoobster (Sexton)
on Apr 29, 2013 at 14:56 UTC (#1031226=perlquestion)
perlnoobster has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I really hope someone can help me with my issue. I have two text files that I am attempting to compare; they have this structure:

FILE 1
'abc'
'def'
'121'
'xyx'

FILE 2
'def'
'121'

FILE 3 (the output file, ideally):
'abc'
'xyx'

However, it is showing:
'121'
'abc'
'121'
'xyx'
Here is the code I've written so far; however, the results are not what I want:
    use strict;
    use warnings;

    my $f2      = 'cat_mapping_in_A.txt';
    my $f1      = 'cat_mapping_in_B.txt';
    my $outfile = '1.txt';
    my %results = ();

    open FILE1, "$f1" or die "Could not open file: $! \n";
    while (my $line = <FILE1>) {
        $results{$line} = 1;
    }
    close(FILE1);

    open FILE2, "$f2" or die "Could not open file: $! \n";
    while (my $line = <FILE2>) {
        $results{$line}++;
    }
    close(FILE2);

    open(OUTFILE, ">$outfile") or die "Cannot open $outfile for writing\n";
    foreach my $line (keys %results) {
        print OUTFILE "$line\n" if $results{$line} == 1;
    }
    close OUTFILE;
Can someone please help me? I really don't know how to fix this. Thank you.

Re: ignore duplicates and show unique values between 2 text files
by choroba (Abbot) on Apr 29, 2013 at 15:04 UTC
    The cause of your problem is the newline character. It is part of each line you read in, but it is missing from the last line of each file. Therefore, '121' followed by a newline is not the same string as '121' without one.

    Use chomp to get rid of newlines:

        while (my $line = <FILE>) {
            chomp $line;
            # ...
        }
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
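    A minimal sketch of the point above, using two made-up string values in place of lines read from a file (the variable names are illustrative only): before chomp the two values compare as different, after chomp they are equal.

```perl
use strict;
use warnings;

# Hypothetical values: a line read from the middle of a file keeps its
# newline, while the last line of a file may have none.
my $middle_line = "'121'\n";
my $last_line   = "'121'";

print "before chomp: ", ($middle_line eq $last_line ? "equal" : "different"), "\n";

chomp $middle_line;    # removes the trailing "\n" (the value of $/)
chomp $last_line;      # nothing to remove, so the value is unchanged

print "after chomp:  ", ($middle_line eq $last_line ? "equal" : "different"), "\n";
```

    Note that chomp has to be applied to every filehandle you read from, otherwise the lines from one file still carry their newlines.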
      Hi Choroba, I have updated the script, but my output still does not show the desired results. It shows:
      'def'
      'xyx'
      'def'
      'abc'
        You should use chomp for the second file handle, too.
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: ignore duplicates and show unique values between 2 text files
by kennethk (Monsignor) on Apr 29, 2013 at 15:05 UTC
    Your issue appears to be that "'121'\n" and "'121'" are different strings. If you'd like to be newline-insensitive (which would also remove the extra blank lines in your output), use chomp:
        use strict;
        use warnings;

        my $f2      = 'cat_mapping_in_A.txt';
        my $f1      = 'cat_mapping_in_B.txt';
        my $outfile = '1.txt';
        my %results = ();

        open FILE1, "$f1" or die "Could not open file: $! \n";
        while (my $line = <FILE1>) {
            chomp $line;
            $results{$line} = 1;
        }
        close(FILE1);

        open FILE2, "$f2" or die "Could not open file: $! \n";
        while (my $line = <FILE2>) {
            chomp $line;
            $results{$line}++;
        }
        close(FILE2);

        open(OUTFILE, ">$outfile") or die "Cannot open $outfile for writing\n";
        foreach my $line (keys %results) {
            print OUTFILE "$line\n" if $results{$line} == 1;
        }
        close OUTFILE;

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
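    The same counting logic can be tried without any files on disk. The sketch below is an assumption-laden variant, not kennethk's exact code: it swaps in lexical filehandles and three-argument open, and reads the sample data from the question out of in-memory strings (opening a reference to a scalar) instead of cat_mapping_in_A.txt and cat_mapping_in_B.txt. The result is sorted because hash key order is arbitrary.

```perl
use strict;
use warnings;

# Sample data from the question, held in strings so the sketch runs without
# real files. The last line of each "file" has no trailing newline.
my $data1 = "'abc'\n'def'\n'121'\n'xyx'";
my $data2 = "'def'\n'121'";

my %results;

# Three-argument open on a lexical filehandle; \$data1 opens an in-memory
# file, but a file name string works the same way.
open my $fh1, '<', \$data1 or die "Could not open file 1: $!";
while (my $line = <$fh1>) {
    chomp $line;            # so "'121'\n" and "'121'" become the same key
    $results{$line} = 1;
}
close $fh1;

open my $fh2, '<', \$data2 or die "Could not open file 2: $!";
while (my $line = <$fh2>) {
    chomp $line;
    $results{$line}++;
}
close $fh2;

# Keep the lines that were counted exactly once.
my @unique = sort { $a cmp $b } grep { $results{$_} == 1 } keys %results;
print "$_\n" for @unique;
```

    With the question's data this prints 'abc' and 'xyx'. Like the original, it assumes each line occurs at most once per file.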

      Thank you kennethk, it works perfectly.
      Hi kennethk, I am unsure how to "reply to all", but can the script be modified to take two columns into account? For example:

      FILE 1

      261293	'snow > equipment'
      261293	'snow > equipment > boots'
      261293	'snow > equipment > facemasks'
      261293	'snow > equipment > goggles'
      261293	'snow > equipment > helmets'
      261293	'surf > accessories > books'

      FILE 2

      261293	'snow > equipment'
      261293	'snow > equipment > boots'
      261293	'snow > equipment > facemasks'
      261293	'snow > equipment > goggles'
      261293	'surf > accessories > books'

      OUTPUT

      261293    'snow > equipment > helmets'

      The two columns are separated by a tab. Is this possible?

      Thank you
        > is this possible?

        Yes, but we won't post the whole code!

        Apply

        my ($number, $article) = split /\s+/, $line, 2;

        to each input line and decide which part should be unique.

        Learn to do it yourself with split.

        Cheers Rolf

        ( addicted to the Perl Programming Language)

        UPDATE

        Added the missing third parameter (the limit) to split.
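        Why the third parameter matters can be seen on one line from the two-column example. In this sketch (the line is taken from the data above, the variable names are illustrative), the limit of 2 stops split after the first separator, so the category, which itself contains spaces, stays in one piece. Without an explicit limit, a list assignment to two variables makes Perl use an implicit limit of 3, and $article would receive only 'snow.

```perl
use strict;
use warnings;

# One tab-separated line from the two-column example.
my $line = "261293\t'snow > equipment > helmets'";

# Limit 2: split at the first run of whitespace only.
my ($number, $article) = split /\s+/, $line, 2;

# No limit and no list assignment: every run of whitespace separates fields.
my @pieces = split /\s+/, $line;

print "number:  $number\n";
print "article: $article\n";
print "fields without limit: ", scalar(@pieces), "\n";
```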

        This is Perl; just about everything is "possible". However, I fail to see why the two-column example is functionally different from a full-line comparison. "261293\t'snow > equipment > goggles'" will equal "261293\t'snow > equipment > goggles'" just as much as the two substrings would. Are you dealing with a case where the numbers change and you need to be insensitive to that?

        Breaking the two columns apart can easily be achieved with code like

            my @terms = split /\t/, $line;

        See split.


        #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
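        For the case raised above, where the numbers might differ while the categories match, one option is to compare on the second column only. This is a minimal sketch under that assumption: the data is hypothetical (the 999999 number is made up), category_of is an illustrative helper, and the columns are assumed to be separated by a single tab.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the second tab-separated column of a line.
sub category_of {
    my ($line) = @_;
    my (undef, $category) = split /\t/, $line, 2;
    return $category;
}

# Hypothetical data: the categories match, but the numbers do not.
my @file1 = ("261293\t'snow > equipment'", "261293\t'snow > equipment > helmets'");
my @file2 = ("999999\t'snow > equipment'");

# Remember which categories occur in file 2, ignoring the number column.
my %in_file2;
$in_file2{ category_of($_) } = 1 for @file2;

# Keep the lines of file 1 whose category never appears in file 2.
my @only_in_file1 = grep { !$in_file2{ category_of($_) } } @file1;

print "$_\n" for @only_in_file1;
```

        The full line of file 1 is printed, so the number column is still part of the output.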

Re: ignore duplicates and show unique values between 2 text files (real data)
by LanX (Canon) on Apr 29, 2013 at 15:12 UTC
    I've tested your code with the data you provided and it works 100%!

    Maybe you should chomp and trim your data to avoid problems with "invisible" whitespace?

    However, the output you are showing doesn't match the input you posted, since

    '121' 'abc'

    never appear on adjacent lines in your input.

    Please show the real data next time, at least out of courtesy to the people spending time to help you!

    Cheers Rolf

    ( addicted to the Perl Programming Language)
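    A possible way to trim, sketched under the assumption that leading and trailing whitespace is never significant in this data (normalize is a hypothetical helper name). On Unix-like systems chomp only removes $/ (normally "\n"), so a trailing "\r" from a Windows-edited file survives it; the substitution below removes the line ending together with any other surrounding whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: strips leading and trailing whitespace, which also
# removes the line ending ("\n", or "\r\n" from a Windows-edited file).
sub normalize {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;
    return $line;
}

# Made-up examples of "invisible" differences around the same value.
my @raw   = ("'121'\n", "'121' \r\n", "\t'121'");
my @clean = map { normalize($_) } @raw;

print "$_\n" for @clean;    # each one is now just '121'
```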

      Sorry Rolf, I apologize for the mistake; it won't happen again.
Re: ignore duplicates and show unique values between 2 text files
by Khen1950fx (Canon) on Apr 29, 2013 at 17:06 UTC
    An easier way is to use List::Compare:
        #!/usr/bin/perl -l
        use strict;
        use warnings;
        use List::Compare;

        my (@Llist) = qw(abc def 121 xyz);
        my (@Rlist) = qw(def 121);

        my $lc = List::Compare->new( \@Llist, \@Rlist );
        my (@sdiff) = $lc->get_symmetric_difference;
        foreach my $sdiff (@sdiff) {
            print $sdiff;
        }
Re: ignore duplicates and show unique values between 2 text files
by hdb (Parson) on Apr 29, 2013 at 19:06 UTC

    If you have TWO files, then you would do "++" on the first, and "--" on the second...

use strict;
use warnings;

my $file1 = <<FILE1;
261293	'snow > equipment'
261293	'snow > equipment > boots'
261293	'snow > equipment > facemasks'
261293	'snow > equipment > goggles'
261293	'snow > equipment > helmets'
261293	'surf > accessories > books'
FILE1

my $file2 = <<FILE2;
261293	'snow > equipment'
261293	'snow > equipment > boots'
261293	'snow > equipment > facemasks'
261293	'snow > equipment > goggles'
261293	'surf > accessories > books'
FILE2

my %uniq;
$uniq{$_}++ for split /\n/, $file1;
$uniq{$_}-- for split /\n/, $file2;
print join "\n", grep { $uniq{$_} } keys %uniq;
print "\n";
      > If you have TWO files, then you would do "++" on the first, and "--" on the second...

      This only works if lines are already unique within each file.

      In other words, 2 - 1 is true, but it means the line appeared in both files...

      Cheers Rolf

      ( addicted to the Perl Programming Language)
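      The pitfall above can be reproduced with a tiny hypothetical data set in which one line is duplicated inside the first file. This sketch runs hdb's counting next to a swapped-in alternative that records only whether a line is present in each file (one hash per file), so repetitions within a file no longer matter.

```perl
use strict;
use warnings;

# Hypothetical data where "b" is duplicated inside the first file.
my @file1 = ("a", "b", "b");
my @file2 = ("b");

# Counting: "b" ends up at 2 - 1 = 1, which is true, so it is wrongly kept.
my %count;
$count{$_}++ for @file1;
$count{$_}-- for @file2;
my @counted = sort { $a cmp $b } grep { $count{$_} } keys %count;

# Presence per file: a line is unique if it is in one file's set but not
# the other's, regardless of how often it repeats.
my %in1 = map { $_ => 1 } @file1;
my %in2 = map { $_ => 1 } @file2;
my @unique = sort { $a cmp $b } (
    (grep { !$in2{$_} } keys %in1),
    (grep { !$in1{$_} } keys %in2),
);

print "counting: @counted\n";    # counting: a b
print "sets:     @unique\n";     # sets:     a
```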

Node Status:
Node Type: perlquestion [id://1031226]
Approved by Corion
Front-paged by toolic