Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Matching arrays

by arrow (Friar)
on Mar 22, 2003 at 12:28 UTC ( [id://245124]=perlquestion: print w/replies, xml ) Need Help??

arrow has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way I can match all of the items in an array with the items in another array, (i.e. success if 1,2,3 found in 1,2,3,4,5, but fail when 4,5,6 not found in 1,2,3,4,5). I've tried foreach but it takes too long. Please Help! Just Another Perl Wannabe

Replies are listed 'Best First'.
Re: Matching arrays
by robartes (Priest) on Mar 22, 2003 at 12:55 UTC
    I'd map your second array to a hash and then loop over the elements of the first hash array, bombing out when the key in question does not exist:
    use strict; my @subarray=qw(1 2 3); my @array1=qw(1 2 3 4 5 6); my @array2=qw(2 3 4 5 6); print contains(\@subarray, \@array1)?"Yes":"No"; print "\n"; print contains(\@subarray, \@array2)?"Yes":"No"; sub contains { my $array=shift; my $container=shift; my %container=map { $_ => undef } @$container; foreach (@$array) { return 0 unless (exists $container{$_}); } return 1; } __END__ Yes No
    Note that there are some interesting points on array manipulations like this in perlfaq4.

    CU
    Robartes-

Re: Matching arrays
by gmax (Abbot) on Mar 22, 2003 at 13:40 UTC

    Using a join is much faster.

    update (1). Provided that the arrays are both sorted and contiguous. Thanks robartes for pointing it out.

    update (2). A modified routine, also using join solves both the above problems. (see below)

    #!/usr/bin/perl -w use strict; use Benchmark qw(cmpthese timethese); my $num_elem = 10_000; my $count = 500; my @elements = map {$_} (1..$num_elem), (qw(10001 10002 10003 10004 10 +005)); my @search = qw( 10001 10002 10003 ); # ------------------------------------------ sub match_arrays{ # gmax my ($elem, $find) = @_; my $str_find = join "/", @$find; return join('/',@$elem) =~ /$str_find/; } #------------------------------------------- #print match_arrays(\@elements, \@search),"\n"; sub contains { # robartes my $array=shift; my $container=shift; my %container=map { $_ => undef } @$container; foreach (@$array) { return 0 unless (exists $container{$_}); } return 1; } sub in_ordered_set { # demerphq my ($s,$a)=@_; return 1 unless @$s; return 0 unless @$a; my ($i,$j)=(0,0); while ($i<@$s) { while ($j<@$a) { last if $a->[$j] == $s->[$i]; return 0 if $a->[$j]>$s->[$i]; $j++; } $j++; $i++; } return 1 } my $result = timethese($count, { 'join' => sub {match_arrays(\@elements, \@search)}, 'hash' => sub {contains (\@search, \@elements)}, 'set' => sub {in_ordered_set (\@search, \@elements)} }); cmpthese ($result); __END__ Array elements: 1000 Benchmark: timing 2000 iterations of hash, join, set... hash: 4 wallclock secs ( 4.34 usr + 0.02 sys = 4.36 CPU) join: 0 wallclock secs ( 0.47 usr + 0.00 sys = 0.47 CPU) set: 4 wallclock secs ( 3.64 usr + 0.00 sys = 3.64 CPU) Rate hash set join hash 459/s -- -17% -89% set 549/s 20% -- -87% join 4255/s 828% 674% -- Array elements: 10000 Benchmark: timing 500 iterations of hash, join, set... hash: 15 wallclock secs (14.72 usr + 0.06 sys = 14.78 CPU) join: 1 wallclock secs ( 1.26 usr + 0.01 sys = 1.27 CPU) set: 10 wallclock secs ( 9.04 usr + 0.02 sys = 9.06 CPU) Rate hash set join hash 33.8/s -- -39% -91% set 55.2/s 63% -- -86% join 394/s 1064% 613% --

    update (2). This modified sub can also find a match when the arrays are not sorted, with almost the same efficiency.

    sub match_arrays2{ # gmax my ($elem, $find) = @_; my @regex_find = map {qr /#$_#/} @$find; my @match = (); my $elem_join = '#' .join('#',@$elem) . "#"; for (@regex_find) { push @match, undef if $elem_join =~ /$_/ ; } return @match == @$find; } __END__ Benchmark: timing 500 iterations of hash, join, join2, set... hash: 15 wallclock secs (14.66 usr + 0.06 sys = 14.72 CPU) join: 1 wallclock secs ( 1.27 usr + 0.00 sys = 1.27 CPU) join2: 2 wallclock secs ( 1.27 usr + 0.00 sys = 1.27 CPU) set: 10 wallclock secs ( 9.89 usr + 0.04 sys = 9.93 CPU) Rate hash set join join2 hash 34.0/s -- -33% -91% -91% set 50.4/s 48% -- -87% -87% join 394/s 1059% 682% -- 0% join2 394/s 1059% 682% 0% --
    _ _ _ _ (_|| | |(_|>< _|
Re: Matching arrays
by demerphq (Chancellor) on Mar 22, 2003 at 13:13 UTC

    Assuming they are both sorted lists this will find if @$s is in @$a

    sub in_ordered_set { my ($s,$a)=@_; return 1 unless @$s; return 0 unless @$a; my ($i,$j)=(0,0); while ($i<@$s) { while ($j<@$a) { last if $a->[$j]==$s->[$i]; return 0 if $a->[$j]>$s->[$i]; $j++; } return 0 if $j>=@$a; # Updated $j++; $i++; } return 1 }

    But generally you will want to do this lookup many times on @$a so in that case you should build a hash.


    ---
    demerphq

    <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...

    • Update:  
    Fixed bug in code: would return true trying to find 3,4 in 1,2,3.


Re: Matching arrays
by Limbic~Region (Chancellor) on Mar 22, 2003 at 15:56 UTC
    arrow,
    I think you left out some important information. As BrowswerUK pointed out - are you looking for a match if the sequence doesn't match, but all the elements are present? Is a match acceptable if the same element is repeated in one array, but is only present once in the other? If you have a requirement to match duplicates than converting to a hash without a count is not going to work.

    #!/usr/bin/perl -w use strict; my @array1 = (1,2,2,4); my @array2 = (2,2,4,1,6); my %hash1; my %hash2; my $not_ok; foreach(@array1){ $hash1{$_} +=1; } foreach(@array2){ $hash2{$_} +=1; } while (my ($key,$hash1_value) = each %hash1) { if ($hash2{$key} && ($hash1_value == $hash2{$key})) { next; } else { $not_ok = 1; last; } } if ($not_ok) { print "Array1 is not a subset of Array2\n"; } else { print "We have a winner\n"; }
    There is a penalty in efficiency for wanting this kind of accuracy - thanks to gmax for pointing this out and my solution using Data::Dumper was just plain wrong!
    Cheers - L~R

    Update: If sequence is important, then I would use the following logic:

  • Determine the smaller of the two arrays
  • Store the first element of the smaller array in a variable
  • Do a for loop on the larger array in the (0 .. $#array) form
  • Check each element for the first element of smaller array or next
  • Upon match, verify each subsequent element is a match
  • Match if you reach the last element of smaller array
  • No match if you don't

    There is also a module that can preserve the order of a tied hash through some unknown magic you could look at, but I don't know much about it.

Re: Matching arrays
by zby (Vicar) on Mar 22, 2003 at 13:20 UTC
    You can try Set::Array. The == operator is overloaded to do what you need I believe.
    Update: Ofcourse you need intersection and equality.
Re: Matching arrays
by BrowserUk (Patriarch) on Mar 22, 2003 at 13:14 UTC

    Would you consider it a match if array 1 contained (4,5,6) and array 2 contained (1, 2, 3, 6, 4, 5) ?


    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
Re: Matching arrays
by arrow (Friar) on Mar 24, 2003 at 15:37 UTC
    Yeah, sorry, I guess I did forget some things. Anyway, I want it so that the items could be in any order, so if I'm checking @array = (1,2,3,4) with @array2 = (2,5,4,6,3,1) it would still match because all of the items in @array are in @array2, no matter what order they are in. Hope this clarifies things a little, and thanks for all your earlier answers...

    Just Another Perl Wannabe
Re: Matching arrays
by Anonymous Monk on Mar 26, 2003 at 07:43 UTC
    You can match arrays by copying them in a string and then matching the strings as follows.
    my $Line1 = join(" ",@array1); my $Line2 = join(" ",@array2); if ($Line1 =~ /$Line2/) { $Match = 1; } else { $Match = 0; }

    Edit by tye, add CODE tags

      above code won't work if the elements in array are not in order. solution is here:
      my $Line1 = join(" ",@a1); my $Match = 0; for ($i=0;$i<=$#a2;$i++) { if ($Line1 =~ /$a2[$i]/) { $Match = 1; } else { $Match = 0; print " No match"; last; } } if ($Match == 1) { print " Match"; }

      Edit by tye, add CODE tags

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://245124]
Approved by robartes
Front-paged by gmax
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (2)
As of 2024-04-25 20:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found