One-liner to compare two lists and print the IDs in common

by ZWcarp (Beadle)
on Jun 15, 2011 at 22:47 UTC (#909872=perlquestion)
ZWcarp has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks

I was wondering if anyone knew a good one-liner for comparing the values in two separate lists and printing a "Venn diagram", so to speak: what appears in both lists, and what appears exclusively in one or the other.

For example, let's say file1.txt has 300 patient IDs and file2.txt has 400 patient IDs, each arranged in a single column. Some of the patient IDs are shared between the two lists, while others are exclusive to one or the other. Is there a good perl -ne one-liner to quickly check which IDs occur in both files and which do not? I know how to do this in a script, but the column positions and the format of the data I'm interested in change, so it would be easier to just do it from bash somehow.

Thanks so much for your help, it is very appreciated.
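Because the column positions change between files, a small reusable script may be easier than retyping a one-liner. Below is a minimal sketch, not taken from the replies: the script name `venn.pl`, the 1-based COL argument and the assumption that columns are whitespace-separated are all hypothetical (tab- or comma-separated data would need a different `split` pattern).

```perl
#!/usr/bin/perl
# Hypothetical usage: perl venn.pl COL file1.txt file2.txt
# COL is a 1-based column number; columns are assumed to be whitespace-separated.
use strict;
use warnings;

# Return the $col-th (1-based) whitespace-separated field of $line, or undef.
sub field_of {
    my ($line, $col) = @_;
    my @f = split ' ', $line;    # ' ' = awk-style split, ignores leading blanks
    return $f[ $col - 1 ];
}

# Split two ID lists into: in both, only in A, only in B (each sorted).
sub compare_ids {
    my ($ids_a, $ids_b) = @_;
    my %in_a = map { $_ => 1 } @$ids_a;
    my %in_b = map { $_ => 1 } @$ids_b;
    my %all  = (%in_a, %in_b);
    my %result = (both => [], only_a => [], only_b => []);
    for my $id (sort keys %all) {
        my $key = $in_a{$id} && $in_b{$id} ? 'both'
                : $in_a{$id}               ? 'only_a'
                :                            'only_b';
        push @{ $result{$key} }, $id;
    }
    return \%result;
}

# Collect column $col from every line of $file, skipping lines that lack it.
sub read_ids {
    my ($file, $col) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @ids;
    while (my $line = <$fh>) {
        my $id = field_of($line, $col);
        push @ids, $id if defined $id;
    }
    close $fh;
    return \@ids;
}

# Only run the report when called with COL FILE1 FILE2.
if (@ARGV == 3) {
    my ($col, $file_a, $file_b) = @ARGV;
    my $r = compare_ids(read_ids($file_a, $col), read_ids($file_b, $col));
    print "both\t$_\n"         for @{ $r->{both} };
    print "only $file_a\t$_\n" for @{ $r->{only_a} };
    print "only $file_b\t$_\n" for @{ $r->{only_b} };
}
```

For example, `perl venn.pl 2 file1.txt file2.txt` would compare the second column of each file; piping the result through `grep '^both'` keeps just the shared IDs.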


Re: One-liner to compare two lists and print the IDs in common
by BrowserUk (Pope) on Jun 15, 2011 at 23:07 UTC

    For a quick check this might do?

    perl -MData::Dump=pp -nle"push @{ $h{ $_ } }, $ARGV }{ pp\%h" 30.txt 40.txt
    {
      X0001 => ["30.txt", "40.txt"],
      X0002 => ["30.txt", "40.txt"],
      X0003 => ["30.txt", "40.txt"],
      X0004 => ["30.txt", "40.txt"],
      X0006 => ["30.txt", "40.txt"],
      X0007 => ["40.txt"],
      X0008 => ["30.txt", "40.txt"],
      X0009 => ["30.txt"],
      X0010 => ["30.txt", "40.txt"],
      X0011 => ["30.txt", "40.txt"],
      X0012 => ["30.txt", "40.txt"],
      X0013 => ["30.txt", "40.txt"],
      X0014 => ["30.txt"],
      X0015 => ["30.txt", "40.txt"],
      X0016 => ["30.txt", "40.txt"],
      X0017 => ["40.txt"],
      X0018 => ["30.txt", "40.txt"],
      X0019 => ["30.txt", "40.txt"],
      X0020 => ["30.txt"],
      X0021 => ["40.txt"],
      X0022 => ["40.txt"],
      X0023 => ["30.txt", "40.txt"],
      X0024 => ["40.txt"],
      X0025 => ["30.txt", "40.txt"],
      X0026 => ["30.txt", "40.txt"],
      X0027 => ["30.txt", "40.txt"],
      X0028 => ["40.txt"],
      X0029 => ["30.txt", "40.txt"],
      X0030 => ["30.txt"],
      X0031 => ["40.txt"],
      X0032 => ["30.txt", "40.txt"],
      X0033 => ["40.txt"],
      X0034 => ["30.txt", "40.txt"],
      X0035 => ["30.txt"],
      X0038 => ["40.txt"],
      X0039 => ["30.txt"],
      X0040 => ["40.txt"],
      X0041 => ["40.txt"],
      X0043 => ["40.txt"],
      X0044 => ["40.txt"],
      X0045 => ["30.txt", "40.txt"],
      X0046 => ["40.txt"],
      X0047 => ["30.txt", "40.txt"],
      X0048 => ["40.txt"],
      X0049 => ["40.txt"],
      X0050 => ["30.txt", "40.txt"],
    }
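How the `}{` trick works, as far as perlrun describes it: `-n` wraps the code in a `while (<>) { ... }` loop and `-l` chomps each line, so the `}{` closes that loop early and opens a bare block that runs once after all input has been read. (The double quotes suit Windows cmd.exe; in bash, single quotes are needed so the shell does not expand `$h`, `$_` and `$ARGV` itself.) A rough long-hand equivalent, written as a sub with the hypothetical name `files_per_id` so that it needs only core Perl instead of the CPAN module Data::Dump:

```perl
use strict;
use warnings;

# Long-hand sketch of: perl -nle 'push @{ $h{$_} }, $ARGV }{ ...' FILES
# Returns a hashref mapping each line (ID) to the list of files it appears in.
sub files_per_id {
    my @files = @_;
    my %h;
    local @ARGV = @files;              # <> reads these files in order
    while (my $line = <>) {            # the loop that -n wraps around the code
        chomp $line;                   # what -l adds
        push @{ $h{$line} }, $ARGV;    # $ARGV holds the current file's name
    }
    return \%h;                        # code after }{ would run here, once
}

# Only run when files are given on the command line.
if (@ARGV) {
    my $h = files_per_id(@ARGV);
    print "$_ => [@{ $h->{$_} }]\n" for sort keys %$h;
}
```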

    Perhaps an improvement:

    perl -nlE"push@{$h{$_}},$ARGV}{say qq[@{$h{$_}}:$_] for keys%h" 30.txt 40.txt | sort
    30.txt 40.txt:X0001
    30.txt 40.txt:X0002
    30.txt 40.txt:X0003
    30.txt 40.txt:X0004
    30.txt 40.txt:X0006
    30.txt 40.txt:X0008
    30.txt 40.txt:X0010
    30.txt 40.txt:X0011
    30.txt 40.txt:X0012
    30.txt 40.txt:X0013
    30.txt 40.txt:X0015
    30.txt 40.txt:X0016
    30.txt 40.txt:X0018
    30.txt 40.txt:X0019
    30.txt 40.txt:X0023
    30.txt 40.txt:X0025
    30.txt 40.txt:X0026
    30.txt 40.txt:X0027
    30.txt 40.txt:X0029
    30.txt 40.txt:X0032
    30.txt 40.txt:X0034
    30.txt 40.txt:X0045
    30.txt 40.txt:X0047
    30.txt 40.txt:X0050
    30.txt:X0009
    30.txt:X0014
    30.txt:X0020
    30.txt:X0030
    30.txt:X0035
    30.txt:X0039
    40.txt:X0007
    40.txt:X0017
    40.txt:X0021
    40.txt:X0022
    40.txt:X0024
    40.txt:X0028
    40.txt:X0031
    40.txt:X0033
    40.txt:X0038
    40.txt:X0040
    40.txt:X0041
    40.txt:X0043
    40.txt:X0044
    40.txt:X0046
    40.txt:X0048
    40.txt:X0049
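The `| sort` at the end relies on an external sort utility. A minimal sketch of doing the sort inside Perl instead, with the hypothetical sub name `report_lines`; note that Perl's default `sort` compares strings by code point, so its order may differ from a locale-aware external `sort`:

```perl
use strict;
use warnings;

# Build one "file1 file2:ID" line per ID and sort the lines in Perl.
# Interpolating "@{...}" joins the file names with $" (a single space by default).
sub report_lines {
    my ($h) = @_;    # hashref: ID => [files containing it]
    my @lines = map { "@{ $h->{$_} }:$_" } keys %$h;
    return sort @lines;    # plain string (code point) order
}

# Only run when files are given on the command line.
if (@ARGV) {
    my %h;
    while (my $line = <>) {
        chomp $line;
        push @{ $h{$line} }, $ARGV;    # $ARGV = file currently being read
    }
    print "$_\n" for report_lines(\%h);
}
```

Because a space sorts before `:`, lines for IDs found in both files come before the single-file lines, which matches the grouping shown above.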

Re: One-liner to compare two lists and print the IDs in common
by Perlbotics (Canon) on Jun 15, 2011 at 23:27 UTC


    > cat a
    aa
    ab
    aaa
    aaaa
    > cat b
    ab
    bb
    bbb
    bbbb
    > perl -ne 'BEGIN{ $v=1 ; @A=@ARGV }; chomp; $c{$_}+=$v; $v=-1 if eof; END { foreach (sort keys %c) { print $c{$_}>=0 ? "$A[0] " : "- "; print $c{$_}<=0 ? "$A[1] " : "- "; print " : $_\n" }}' a b
    a -  : aa
    a -  : aaa
    a -  : aaaa
    a b  : ab
    - b  : bb
    - b  : bbb
    - b  : bbbb

    I am sure this can be golfed...
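One caveat with the +1/-1 counter: an ID that repeats inside one file can outweigh the other file's contribution (twice in `a` and once in `b` sums to 1, so it is reported as `a -`). A minimal sketch of a variant that ORs one bit per file instead, so repeats do not matter; the sub names `membership` and `venn_lines` are hypothetical:

```perl
use strict;
use warnings;

# Bit 1 = seen in the first list, bit 2 = seen in the second list.
# |= on an undefined hash value does not warn, so no initialisation is needed.
sub membership {
    my ($list_a, $list_b) = @_;
    my %mask;
    $mask{$_} |= 1 for @$list_a;
    $mask{$_} |= 2 for @$list_b;
    return \%mask;
}

# Format "a b : ID" lines in the same spirit as the one-liner above.
sub venn_lines {
    my ($name_a, $name_b, $mask) = @_;
    my @lines;
    for my $id (sort keys %$mask) {
        my $m = $mask->{$id};
        push @lines, sprintf '%s %s : %s',
            ($m & 1 ? $name_a : '-'),
            ($m & 2 ? $name_b : '-'),
            $id;
    }
    return @lines;
}

# Only run when exactly two files are given.
if (@ARGV == 2) {
    my @lists;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        chomp(my @ids = <$fh>);
        push @lists, \@ids;
    }
    print "$_\n" for venn_lines(@ARGV, membership(@lists));
}
```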

Re: One-liner to compare two lists and print the IDs in common
by planetscape (Chancellor) on Jun 16, 2011 at 07:55 UTC
Re: One-liner to compare two lists and print the IDs in common
by toolic (Bishop) on Jun 15, 2011 at 23:15 UTC
    comm (if your files are already sorted).
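comm walks two sorted inputs in step and prints three columns: lines only in the first file, lines only in the second, and lines in both (`comm -12 file1 file2` keeps just the common lines). A minimal sketch of the same merge walk in Perl, assuming both lists are sorted with the string ordering that `cmp` uses; the sub name `comm_like` is hypothetical:

```perl
use strict;
use warnings;

# comm-style merge of two lists that are already sorted in cmp order.
# Returns three array refs: only in A, only in B, in both.
sub comm_like {
    my ($a_ref, $b_ref) = @_;
    my ($i, $j) = (0, 0);
    my (@only_a, @only_b, @both);
    while ($i < @$a_ref && $j < @$b_ref) {
        my $cmp = $a_ref->[$i] cmp $b_ref->[$j];
        if    ($cmp < 0) { push @only_a, $a_ref->[$i++] }
        elsif ($cmp > 0) { push @only_b, $b_ref->[$j++] }
        else             { push @both, $a_ref->[$i++]; $j++ }
    }
    # Whatever is left in either list has no partner in the other.
    push @only_a, @{$a_ref}[ $i .. $#$a_ref ];
    push @only_b, @{$b_ref}[ $j .. $#$b_ref ];
    return (\@only_a, \@only_b, \@both);
}

# Only run when exactly two files are given.
if (@ARGV == 2) {
    my @lists;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        chomp(my @lines = <$fh>);
        push @lists, [ sort @lines ];    # sort first so the merge precondition holds
    }
    my ($only_a, $only_b, $both) = comm_like(@lists);
    print "only $ARGV[0]\t$_\n" for @$only_a;
    print "only $ARGV[1]\t$_\n" for @$only_b;
    print "both\t$_\n"          for @$both;
}
```

The walk touches each element once, which is why comm insists on sorted input: unsorted lines would be misreported as unmatched.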

Node Type: perlquestion [id://909872]
Approved by toolic