http://www.perlmonks.org?node_id=501394

narashima has asked for the wisdom of the Perl Monks concerning the following question:

Revered monks,
I am trying to compare 2 simple arrays @file and @file2.

After the comparision I need:

  1. matched elments
  2. elements in @file1 not in @file2 and
  3. elements in @file2 not in @file1

I am using the following code... Is this the best way of doin it??? is there a optimization or even a direct function for this?

%seen = (); @seen{ @file1 } = (); @infile1only = grep { ! exists $seen{$_} } @file2; %seen =(); @seen{ @file2 } = (); @infile2only = grep { ! exists $seen{$_} } @file1; @matched = grep { exists $seen{$_} } @file1;

thanks,
narashima

Replies are listed 'Best First'.
Re: optimal way of comparing 2 arrays
by chester (Hermit) on Oct 19, 2005 at 19:57 UTC
    use warnings; use strict; use List::Compare; use Data::Dumper; my @file1 = qw{1 2 3 4 }; my @file2 = qw{ 2 3 4 5}; my $lc = List::Compare->new( { lists => [\@file1, \@file2], } ); my @intersection = $lc->get_intersection(); my @file1_only = $lc->get_Lonly(); my @file2_only = $lc->get_Ronly(); print Dumper \@intersection, \@file1_only, \@file2_only;
Re: optimal way of comparing 2 arrays
by Roy Johnson (Monsignor) on Oct 19, 2005 at 19:48 UTC
Re: optimal way of comparing 2 arrays
by dragonchild (Archbishop) on Oct 19, 2005 at 19:49 UTC
    You're on the right track, but not quite. You're looking over @file2 once and @file1 twice. You can combine the loops for @file1 into a for-loop and you'll be slightly better off.

    Of course, "best" is a difficult word to work with. Personally, I'd use Set and go for the intersection for both and the differences for the ones missing.


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: optimal way of comparing 2 arrays
by NetWallah (Canon) on Oct 19, 2005 at 21:01 UTC
    Here is a way that will work, without using modules.
    This design is limited to comparing 31 files (assuming a 32 bit machine). Yes - I could extend it to 32 files at the expense of even more obfu.
    my @file1 = qw{1 2 3 4 }; my @file2 = qw{ 2 3 4 5}; my %seen; my @fa =([1<<1,\@file1,'File-1 only'],[1<<2,\@file2,'File-2 only'], [1<<1|1<<2,[],'Both have']); for (@fa){ my ($v,$f)=@$_; $seen{$_} |= $v for @$f }; ## print Dumper \%seen; for (@fa){ my ($v,$f,$n)=@$_; $v==$seen{$_} and print qq[$n :$_\n] for sort keys %seen }
    Prints:
    File-1 only :1 File-2 only :5 Both have :2 Both have :3 Both have :4
    Update: Updated code to also print "Both" lines.

         "Man cannot live by bread alone...
             He'd better have some goat cheese and wine to go with it!"

Re: optimal way of comparing 2 arrays
by jfroebe (Parson) on Oct 19, 2005 at 20:58 UTC

    Hi Narashima,

    I just saw your post... the best way is probably to benchmark the comparisons with your data set. I've written up a short script performing two of the comparisons that you list. You should be able to extend it very easily to test all the different variations of comparing two lists.

    My tests show that using two grep's is significantly faster than two foreach's. This may or may not be consistant with your real data. I would bet a pretty penny but not my life on it ;-) Maybe it will help you.

    Jason L. Froebe

    Team Sybase member

    No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1

      The grep are sure working out to be faster in my case atleast...
Re: optimal way of comparing 2 arrays
by blazar (Canon) on Oct 20, 2005 at 10:13 UTC
    Oh, I don't know about optimization but I would probably do something like
    #!/usr/bin/perl -l use strict; use warnings; my @file1=qw/foo bar baz/; my @file2=qw/fred bar barney/; my %saw; $saw{$_}=1 for @file1; $saw{$_}.=2 for @file2; my %comm; push @{ $comm{$saw{$_}} }, $_ for keys %saw; print <<"EOT" \@file1 only: (@{ $comm{1} }) \@file2 only: (@{ $comm{2} }) common: (@{ $comm{12} }) EOT __END__
    But you didn't tell us whether order does matter and if your arrays contain duplicates and if you also want them in the output and so on, so YMMV...
      Hi Blazar,
      No duplicates in these arrays. All values are unique.
      thanks
      narashima
        Then the solution above may be simplified to:
        #!/usr/bin/perl -l use strict; use warnings; my @file1=qw/foo bar baz/; my @file2=qw/fred bar barney/; my %saw; $saw{$_}++ for @file1, @file2; print <<"EOT" \@file1 only: (@{[ grep $saw{$_}==1, @file1 ]}) \@file2 only: (@{[ grep $saw{$_}==1, @file2 ]}) common: (@{[ grep $saw{$_}==2, @file2 ]}) EOT __END__
        or even, using modulo-2 math, but more obscurely:
        #!/usr/bin/perl -l use strict; use warnings; my @file1=qw/foo bar baz/; my @file2=qw/fred bar barney/; my %saw; $saw{$_} ^= 1 for @file1, @file2; print <<"EOT" \@file1 only: (@{[ grep $saw{$_}, @file1 ]}) \@file2 only: (@{[ grep $saw{$_}, @file2 ]}) common: (@{[ grep !$saw{$_}, @file2 ]}) EOT __END__
Re: optimal way of comparing 2 arrays
by graff (Chancellor) on Oct 20, 2005 at 01:34 UTC
    This sort of task comes up a lot -- the last time I posted a basic solution was here: Re^2: RFC: The Uniqueness of hashes.. A benchmark might not rank it as fastest (I don't know), but it's easy to code and understand.