http://www.perlmonks.org?node_id=1031229


in reply to ignore duplicates and show unique values between 2 text files

Your issue appears to be that "'121'\n" and "'121'" are different strings. If you'd like to be newline insensitive (which would also address the extra newlines in your output), use chomp:
use strict;
use warnings;

my $f2      = 'cat_mapping_in_A.txt';
my $f1      = 'cat_mapping_in_B.txt';
my $outfile = '1.txt';
my %results = ();

# Every line from the first file is recorded once (repeats stay at 1).
open my $fh1, '<', $f1 or die "Could not open $f1: $!\n";
while (my $line = <$fh1>) {
    chomp $line;              # drop "\n" so "'121'\n" matches "'121'"
    $results{$line} = 1;
}
close $fh1;

# Lines from the second file bump the count, so shared lines reach 2.
open my $fh2, '<', $f2 or die "Could not open $f2: $!\n";
while (my $line = <$fh2>) {
    chomp $line;
    $results{$line}++;
}
close $fh2;

# Write out only the lines that were counted exactly once.
open my $out, '>', $outfile or die "Cannot open $outfile for writing: $!\n";
foreach my $line (keys %results) {
    print {$out} "$line\n" if $results{$line} == 1;
}
close $out;
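One caveat with counting: a line that appears twice in cat_mapping_in_A.txt (and never in the B file) reaches a count of 2 and is silently dropped. A minimal sketch of an alternative, assuming both files fit in memory, is to record each file's lines in its own hash and report the symmetric difference. The sub name `unique_lines` and the in-memory sample data are hypothetical, for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return lines found in exactly one of the two
# filehandles, ignoring duplicates within the same file.
sub unique_lines {
    my ($fh_a, $fh_b) = @_;
    my (%in_a, %in_b);

    while (my $line = <$fh_a>) {
        chomp $line;          # "'121'\n" and "'121'" must compare equal
        $in_a{$line} = 1;
    }
    while (my $line = <$fh_b>) {
        chomp $line;
        $in_b{$line} = 1;
    }

    # Symmetric difference: in A but not B, plus in B but not A.
    my @only_a = grep { !exists $in_b{$_} } sort keys %in_a;
    my @only_b = grep { !exists $in_a{$_} } sort keys %in_b;
    return (@only_a, @only_b);
}

# Illustrative in-memory "files" (made-up sample data).
my $data_a = "'121'\n'122'\n'122'\n'123'\n";
my $data_b = "'121'\n'124'\n";
open my $fh_a, '<', \$data_a or die "Cannot open A: $!";
open my $fh_b, '<', \$data_b or die "Cannot open B: $!";

my @unique = unique_lines($fh_a, $fh_b);
print "$_\n" for @unique;
```

Here '122' is reported even though it occurs twice in the A data, whereas the counting approach would have dropped it. With real files, replace the in-memory handles with ordinary `open my $fh, '<', $filename` calls.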

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re^2: ignore duplicates and show unique values between 2 text files
by perlnoobster (Sexton) on Apr 29, 2013 at 15:42 UTC
    Hi kennethk, I am unsure how to "reply to all", but can the script be modified to take two columns into account, i.e.:

    FILE 1

    261293    'snow > equipment'
    261293    'snow > equipment > boots'
    261293    'snow > equipment > facemasks'
    261293    'snow > equipment > goggles'
    261293    'snow > equipment > helmets'
    261293    'surf > accessories > books'

    FILE 2

    261293    'snow > equipment'
    261293    'snow > equipment > boots'
    261293    'snow > equipment > facemasks'
    261293    'snow > equipment > goggles'
    261293    'surf > accessories > books'

    OUTPUT

    261293    'snow > equipment > helmets'

    The two columns are separated by a tab. Is this possible?

    Thank you
      This is Perl; just about everything is "possible". However, I fail to see why the two column example is functionally different than a full line comparison. "261293\t'snow > equipment > goggles'" will equal "261293\t'snow > equipment > goggles'" just as much as the two substrings would. Are you dealing with a case where the numbers change and you need to be insensitive to that?

      Breaking the two columns apart can easily be achieved with code like my @terms = split /\t/, $line;. See split.
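If the numbers could differ between the files and only the category text should decide uniqueness, one possible approach, assuming each line is already chomped and contains exactly one tab, is to split on the tab and key the hashes on the chosen column while keeping the first full line seen for output. The sub name `unique_by_column`, the 0-based `$col` parameter and the sample data are hypothetical; `//=` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two lists of tab-separated lines by one
# column ($col is a 0-based index) and return the full lines whose key
# appears in only one list.
sub unique_by_column {
    my ($col, $lines_a, $lines_b) = @_;
    my (%key_a, %key_b);

    for my $line (@$lines_a) {
        my @terms = split /\t/, $line;
        $key_a{ $terms[$col] } //= $line;   # keep the first line per key
    }
    for my $line (@$lines_b) {
        my @terms = split /\t/, $line;
        $key_b{ $terms[$col] } //= $line;
    }

    my @only_a = map { $key_a{$_} } grep { !exists $key_b{$_} } sort keys %key_a;
    my @only_b = map { $key_b{$_} } grep { !exists $key_a{$_} } sort keys %key_b;
    return (@only_a, @only_b);
}

# Sample lines modeled on the thread's data; the second list uses a
# made-up ID to show that only column 1 (the category) is compared.
my @file1 = (
    "261293\t'snow > equipment > goggles'",
    "261293\t'snow > equipment > helmets'",
);
my @file2 = (
    "999999\t'snow > equipment > goggles'",
);

my @unique = unique_by_column(1, \@file1, \@file2);
print "$_\n" for @unique;
```

Passing 0 instead of 1 would make the numeric column the key instead; when neither column should be ignored, comparing whole lines is simpler.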



        > However, I fail to see why the two column example is functionally different than a full line comparison. "261293\t'snow > equipment > goggles'" will equal "261293\t'snow > equipment > goggles'" just as much as the two substrings would. Are you dealing with a case where the numbers change and you need to be insensitive to that?

        Yes, I totally agree; the same code should just work as well.

      > is this possible?

      Yes, but we won't post the whole code!

      Apply

      my ($number,$article) = split /\s+/, $line, 2;

      to each input line and decide which part should be unique.

      Learn to do it yourself with split.

      Cheers Rolf

      ( addicted to the Perl Programming Language)

      UPDATE

      Added the missing third parameter (the limit) to split.

        I think your posted code will not follow the posted spec. The posted lines contain additional whitespace, so my ($number,$article) = split /\s+/, $line will yield
        $number = 261293
        $article = 'snow
        as opposed to split /\t/, $line, which would yield
        $number = 261293
        $article = 'snow > equipment > helmets'
        Update: Parent code updated
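The difference described above can be checked directly. This small sketch takes the helmets line from the thread, assuming a literal tab between the two columns, and compares three ways of splitting it; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "261293\t'snow > equipment > helmets'";

# 1. Split on any whitespace, no explicit limit: the category is cut at
#    its first space, so $article1 only gets "'snow".
my ($number1, $article1) = split /\s+/, $line;

# 2. Same pattern, but a limit of 2 keeps the rest of the line intact.
my ($number2, $article2) = split /\s+/, $line, 2;

# 3. Split on the tab itself, which is the actual column separator.
my ($number3, $article3) = split /\t/, $line;

print "no limit: [$number1] [$article1]\n";
print "limit 2:  [$number2] [$article2]\n";
print "tab:      [$number3] [$article3]\n";
```

Splitting on /\t/ is the more precise choice when the category text itself may contain spaces; the limit of 2 only helps if the first column never contains whitespace.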


Re^2: ignore duplicates and show unique values between 2 text files
by perlnoobster (Sexton) on Apr 29, 2013 at 15:18 UTC
    Thank you kennethk, it works perfectly.