PerlMonks  

Re^2: Pulling out data from one file thats not in another

by Angharad (Pilgrim)
on Apr 27, 2010 at 14:59 UTC (#837122)


in reply to Re: Pulling out data from one file thats not in another
in thread Pulling out data from one file thats not in another

I tried 'diff', but the trouble is that the items in the two files don't always appear in the same order, so it simply doesn't work. I'll take a peek at the links you suggested, though.


Re^3: Pulling out data from one file thats not in another
by kennethk (Monsignor) on Apr 27, 2010 at 15:14 UTC
    By using a hash as per the FAQ, the intersection/difference calculation becomes order-independent. You will then have to compare the resulting hash (called %count in the FAQ) against a given file's contents to determine which file lacked the line in question. Note that the FAQ's code gives wrong results if either array has repeated entries: a line that appears twice in one array reaches a count of 2 and is reported as common to both.
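    A minimal sketch of that counting idea, assuming both files have already been read into arrays of chomped lines. The de-duplication step is an addition that sidesteps the repeated-entry problem; the sub name compare_lists and the sample data are hypothetical, and this is not the FAQ's exact code.

```perl
use strict;
use warnings;

# Order-independent comparison of two lists, in the spirit of the FAQ's
# %count approach. Each list is de-duplicated first, so a line repeated
# within one file cannot reach a count of 2 and be misreported as shared.
sub compare_lists {
    my ($first, $second) = @_;
    my %uniq_first  = map { $_ => 1 } @$first;
    my %uniq_second = map { $_ => 1 } @$second;

    # Count 2 means "in both lists", count 1 means "in exactly one".
    my %count;
    $count{$_}++ for keys(%uniq_first), keys(%uniq_second);

    my (@only_first, @only_second, @both);
    for my $line (sort keys %count) {
        if    ($count{$line} == 2) { push @both,        $line }
        elsif ($uniq_first{$line}) { push @only_first,  $line }   # count 1: which file had it?
        else                       { push @only_second, $line }
    }
    return (\@only_first, \@only_second, \@both);
}

# Hypothetical sample data: "cherry" is repeated, but only in master.
my ($only_master, $only_completed, $both) =
    compare_lists([qw(apple banana cherry cherry)], [qw(banana date)]);
print "master only:    @$only_master\n";      # apple cherry
print "completed only: @$only_completed\n";   # date
print "both:           @$both\n";             # banana
```

    Sorting the keys makes the output stable; if you need each file's original line order instead, loop over the original arrays and test membership in the other file's hash.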

    Alternatively, you can use bit operations rather than simple incrementing to encode a little extra information, namely which file each line came from. The FAQ code's structure is more immediately obvious, but this may do more of what you want:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Usage: script.pl MASTER_FILE COMPLETED_FILE
    @ARGV == 2 or die "Usage: $0 MASTER COMPLETED\n";
    my $master    = shift;
    my $completed = shift;

    open my $mh, '<', $master or die "Open fail on $master: $!";
    my @master_lines = <$mh>;
    chomp @master_lines;

    open my $ch, '<', $completed or die "Open fail on $completed: $!";
    my @completed_lines = <$ch>;
    chomp @completed_lines;

    # Bit 1 marks "seen in master", bit 2 marks "seen in completed".
    my %count;
    for my $element (@master_lines)    { $count{$element} |= 1; }
    for my $element (@completed_lines) { $count{$element} |= 2; }

    print "$master only:\n";
    for my $element (@master_lines) {
        next if $count{$element} & 2;    # skip lines also in completed
        print "$element\n";
    }

    print "$completed only:\n";
    for my $element (@completed_lines) {
        next if $count{$element} & 1;    # skip lines also in master
        print "$element\n";
    }
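    Because each value in %count ends up as a 2-bit mask (1 = seen only in the first file, 2 = seen only in the second, 3 = seen in both), the same hash can also give you the intersection. A minimal sketch that groups lines by mask value; the sub name classify_lines and the data are hypothetical. Unlike the loops in the script, iterating over the hash keys collapses duplicate lines and loses the files' original order.

```perl
use strict;
use warnings;

# Same bitmask idea as the script: bit 1 = first list, bit 2 = second list.
sub classify_lines {
    my ($first, $second) = @_;
    my %mask;
    $mask{$_} |= 1 for @$first;
    $mask{$_} |= 2 for @$second;

    # Group each distinct line by its mask value: 1, 2 or 3.
    my %by_mask = (1 => [], 2 => [], 3 => []);
    push @{ $by_mask{ $mask{$_} } }, $_ for sort keys %mask;
    return \%by_mask;
}

my $groups = classify_lines([qw(apple banana cherry)], [qw(banana date)]);
print "first only:  @{ $groups->{1} }\n";   # apple cherry
print "second only: @{ $groups->{2} }\n";   # date
print "both:        @{ $groups->{3} }\n";   # banana
```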
Re^3: Pulling out data from one file thats not in another
by rubasov (Friar) on Apr 27, 2010 at 15:32 UTC

    There are already several tools that do what you want, so writing your own is probably unnecessary.

    A standard Unix-like solution (works under bash):
    $ diff <( sort master ) <( sort completed ) | grep '^<' | cut -d ' ' -f2-

    Depending on your needs, you may want to use sort -u instead of a plain sort, so that duplicate lines are collapsed before comparing.

    Or, if you're on a Debian-derived distro, just install the moreutils package and use combine:

    $ combine master not completed
    1ao8A
    1jkxA
    1juvA
    1mejA
    1meoA
    1n0uA
    1pjqA
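    If moreutils isn't available, the four operators combine accepts (and, not, or, xor) can be approximated in a few lines of Perl. This is a rough sketch on in-memory lists with a hypothetical sub name, combine_lines; it emits each line once, in first-seen order, which may differ from combine's own handling of ordering and duplicates.

```perl
use strict;
use warnings;

# Rough analogue of `combine FILE1 OP FILE2` for in-memory lists.
# Lines come out in first-seen order (first list, then second), each at most once.
sub combine_lines {
    my ($first, $op, $second) = @_;
    my %in_first  = map { $_ => 1 } @$first;
    my %in_second = map { $_ => 1 } @$second;

    # One predicate per operator, deciding whether a line belongs in the result.
    my %keep = (
        'and' => sub { $in_first{ $_[0] } &&  $in_second{ $_[0] } },
        'not' => sub { $in_first{ $_[0] } && !$in_second{ $_[0] } },
        'or'  => sub { 1 },
        'xor' => sub { !($in_first{ $_[0] } && $in_second{ $_[0] }) },
    );
    my $wanted = $keep{$op} or die "unknown operator '$op'\n";

    my %seen;
    return grep { !$seen{$_}++ && $wanted->($_) } @$first, @$second;
}

my @master    = qw(apple banana cherry);   # hypothetical sample data
my @completed = qw(banana date);
print join(' ', combine_lines(\@master, 'not', \@completed)), "\n";   # apple cherry
print join(' ', combine_lines(\@master, 'xor', \@completed)), "\n";   # apple cherry date
```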

    Hope that helps.

      I often use comm instead of diff, too.
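      comm expects both inputs to be sorted and reports, in three columns, lines unique to the first file, lines unique to the second, and lines common to both; comm -23 master completed keeps only the first column. A minimal Perl sketch of that sorted-merge walk, assuming both lists are sorted in Perl's default string order and free of duplicates (comm itself follows the locale's collation); the sub name comm_split is hypothetical.

```perl
use strict;
use warnings;

# Walk two sorted, duplicate-free lists in step, the way comm does, and
# return three array refs: only in the first, only in the second, in both.
sub comm_split {
    my ($first, $second) = @_;
    my (@only_first, @only_second, @both);
    my ($i, $j) = (0, 0);
    while ($i < @$first && $j < @$second) {
        my $cmp = $first->[$i] cmp $second->[$j];
        if    ($cmp < 0) { push @only_first,  $first->[$i++] }
        elsif ($cmp > 0) { push @only_second, $second->[$j++] }
        else             { push @both, $first->[$i++]; $j++ }
    }
    # Whatever remains in either list has no partner in the other.
    push @only_first,  @$first[$i .. $#$first];
    push @only_second, @$second[$j .. $#$second];
    return (\@only_first, \@only_second, \@both);
}

my @master    = sort qw(cherry apple banana);   # hypothetical sample data
my @completed = sort qw(date banana);
my ($only_master) = comm_split(\@master, \@completed);
print "$_\n" for @$only_master;    # apple, cherry (like comm -23)
```

      The walk touches each line once after sorting, which is why comm and diff want sorted input, while the hash-based approaches above accept the files in any order.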
