Re: optimal way of comparing 2 arrays
by chester (Hermit) on Oct 19, 2005 at 19:57 UTC
|
use warnings;
use strict;
use List::Compare;
use Data::Dumper;
my @file1 = qw{1 2 3 4 };
my @file2 = qw{ 2 3 4 5};
my $lc = List::Compare->new( {
lists => [\@file1, \@file2],
} );
my @intersection = $lc->get_intersection();
my @file1_only = $lc->get_Lonly();
my @file2_only = $lc->get_Ronly();
print Dumper \@intersection, \@file1_only, \@file2_only;
| [reply] [Watch: Dir/Any] [d/l] |
Re: optimal way of comparing 2 arrays
by Roy Johnson (Monsignor) on Oct 19, 2005 at 19:48 UTC
|
| [reply] [Watch: Dir/Any] |
Re: optimal way of comparing 2 arrays
by dragonchild (Archbishop) on Oct 19, 2005 at 19:49 UTC
|
You're on the right track, but not quite. You're looking over @file2 once and @file1 twice. You can combine the loops for @file1 into a for-loop and you'll be slightly better off.
Of course, "best" is a difficult word to work with. Personally, I'd use Set and go for the intersection for both and the differences for the ones missing.
My criteria for good software:
- Does it work?
- Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
| [reply] [Watch: Dir/Any] |
Re: optimal way of comparing 2 arrays
by NetWallah (Canon) on Oct 19, 2005 at 21:01 UTC
|
Here is a way that will work, without using modules.
This design is limited to comparing 31 files (assuming a 32 bit machine). Yes - I could extend it to 32 files at the expense of even more obfu.
my @file1 = qw{1 2 3 4 };
my @file2 = qw{ 2 3 4 5};
my %seen;
my @fa =([1<<1,\@file1,'File-1 only'],[1<<2,\@file2,'File-2 only'],
[1<<1|1<<2,[],'Both have']);
for (@fa){
my ($v,$f)=@$_;
$seen{$_} |= $v for @$f
};
## print Dumper \%seen;
for (@fa){
my ($v,$f,$n)=@$_;
$v==$seen{$_} and print qq[$n :$_\n]
for sort keys %seen
}
Prints:
File-1 only :1
File-2 only :5
Both have :2
Both have :3
Both have :4
Update: Updated code to also print "Both" lines.
"Man cannot live by bread alone...
He'd better have some goat cheese and wine to go with it!"
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: optimal way of comparing 2 arrays
by jfroebe (Parson) on Oct 19, 2005 at 20:58 UTC
|
Hi Narashima,
I just saw your post... the best way is probably to benchmark the comparisons with your data set. I've written up a short script performing two of the comparisons that you list. You should be able to extend it very easily to test all the different variations of comparing two lists.
My tests show that using two grep's is significantly faster than two foreach's. This may or may not be consistant with your real data. I would bet a pretty penny but not my life on it ;-) Maybe it will help you.
Jason L. Froebe
Team Sybase member No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1
| [reply] [Watch: Dir/Any] |
|
The grep are sure working out to be faster in my case atleast...
| [reply] [Watch: Dir/Any] |
Re: optimal way of comparing 2 arrays
by blazar (Canon) on Oct 20, 2005 at 10:13 UTC
|
Oh, I don't know about optimization but I would probably do something like
#!/usr/bin/perl -l
use strict;
use warnings;
my @file1=qw/foo bar baz/;
my @file2=qw/fred bar barney/;
my %saw;
$saw{$_}=1 for @file1;
$saw{$_}.=2 for @file2;
my %comm; push @{ $comm{$saw{$_}} }, $_ for keys %saw;
print <<"EOT"
\@file1 only: (@{ $comm{1} })
\@file2 only: (@{ $comm{2} })
common: (@{ $comm{12} })
EOT
__END__
But you didn't tell us whether order does matter and if your arrays contain duplicates and if you also want them in the output and so on, so YMMV... | [reply] [Watch: Dir/Any] [d/l] |
|
Hi Blazar,
No duplicates in these arrays. All values are unique.
thanks
narashima
| [reply] [Watch: Dir/Any] |
|
Then the solution above may be simplified to:
#!/usr/bin/perl -l
use strict;
use warnings;
my @file1=qw/foo bar baz/;
my @file2=qw/fred bar barney/;
my %saw;
$saw{$_}++ for @file1, @file2;
print <<"EOT"
\@file1 only: (@{[ grep $saw{$_}==1, @file1 ]})
\@file2 only: (@{[ grep $saw{$_}==1, @file2 ]})
common: (@{[ grep $saw{$_}==2, @file2 ]})
EOT
__END__
or even, using modulo-2 math, but more obscurely:
#!/usr/bin/perl -l
use strict;
use warnings;
my @file1=qw/foo bar baz/;
my @file2=qw/fred bar barney/;
my %saw;
$saw{$_} ^= 1 for @file1, @file2;
print <<"EOT"
\@file1 only: (@{[ grep $saw{$_}, @file1 ]})
\@file2 only: (@{[ grep $saw{$_}, @file2 ]})
common: (@{[ grep !$saw{$_}, @file2 ]})
EOT
__END__
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: optimal way of comparing 2 arrays
by graff (Chancellor) on Oct 20, 2005 at 01:34 UTC
|
This sort of task comes up a lot -- the last time I posted a basic solution was here: Re^2: RFC: The Uniqueness of hashes.. A benchmark might not rank it as fastest (I don't know), but it's easy to code and understand. | [reply] [Watch: Dir/Any] |