Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re^3: correspondence between two arrays

by aaron_baugher (Curate)
on Nov 24, 2011 at 20:23 UTC ( [id://939949]=note: print w/replies, xml ) Need Help??


in reply to Re^2: correspondence between two arrays
in thread correspondence between two arrays

I'd guess the downvotes (though I agree that may be a harsh response to a solution that could be much better but does work) are probably because he's looping through the entire second array for every item in the first array, instead of using a hash.

Writing the above got me curious: Just how bad is looping through an array and comparing as opposed to checking against a hash's keys? Very, very bad, it turns out. I wrote the script below, based on this task, to benchmark the two methods with a variable number of items in the lists. I only used one iteration for Benchmark (and removed the warnings to save space), because the test takes long enough to do once with larger lists, and I didn't want it to take all day. But the differences are large enough to ignore whatever margin of error that more iterations would smooth out. The hash method also has the overhead of splitting the elements of the second array and creating the hash.

With only 1000 items in each list, the array/array method takes about 2.75 seconds -- fast enough to live with, if you're not running it over and over all day. But the hash method is so fast that Benchmark seems unable to display the time elapsed sensibly. Bumping the lists up to 5000 items each, the array/array method goes up to almost 10 seconds, but the hash method is still down at .01 seconds, or almost 1000 times faster. At 10,000 items, array/array is up to 45 secs, and the hash is still at .02, or 1500 times faster. So not only is the array/array method much slower, but it scales much worse, since the number of loops and comparisons increases by the square of the list size. At 100,000 items in each array, the array/array method took well over an hour, and the hash method is still under a second! I knew the hash would be faster, but wow.

$ perl 939870.pl 1000 Building arrays of 1000 with unique 4-char keys... done. Arrays have 1000 items each. Benchmarking... Rate arrays plus regex using a hash arrays plus regex 2.70/s -- -100% using a hash 1000000000000000/s 37000000000000000% -- + $ perl 939870.pl 5000 Building arrays of 5000 with unique 4-char keys... done. Arrays have 5000 items each. Benchmarking... s/iter arrays plus regex using a hash arrays plus regex 9.40 -- -100% using a hash 1.00e-02 93900% -- $ perl 939870.pl 10000 Building arrays of 10000 with unique 4-char keys... done. Arrays have 10000 items each. Benchmarking... s/iter arrays plus regex using a hash arrays plus regex 46.3 -- -100% using a hash 3.00e-02 154267% -- $ perl 939870.pl 100000 Building arrays of 100000 with unique 4-char keys... done. Arrays have 100000 items each. Benchmarking... s/iter arrays plus regex using a hash arrays plus regex 4110 -- -100% using a hash 0.330 1245345% --
$ cat 939870.pl #!/usr/bin/perl use Modern::Perl; use Benchmark qw(:all); # create two lists, with the formats: # list1: (four characters A-Z) # RDAE # JFCS # # list2: (7 digit number, space, four chars A-Z) # 1234567 JIQZ # 2345678 EFOP my( @list1, @list2); my( %keys1, %keys2); my $array_size = $ARGV[0] || 100_000; print "Building arrays of $array_size with unique 4-char keys... "; for (1..$array_size){ my($s1, $s2) = ('',''); $s1 .= ('A'..'Z')[rand 26] for (1..4); $s2 .= ('A'..'Z')[rand 26] for (1..4); next if $keys1{$s1} or $keys2{$s2}; # make keys unique push @list1, $s1; # save keys in lists push @list2, int(rand(8999999)+1_000_000). " $s2"; } say "done.\nArrays have ". scalar(@list1). " items each. Benchmarking..."; cmpthese(1, { 'arrays plus regex' => \&array_plus_regex, 'using a hash' => \&using_hash, }); sub array_plus_regex { my @results; FOUND: for my $l1 (@list1){ for my $l2 (@list2){ if($l2 =~ / $l1/){ push @results, "$l1 ---> $`"; next FOUND; } } push @results, "$l1 ---> null"; } } sub using_hash { my @results; my %match; my $count = 0; for my $l2 (@list2){ my($k, $v) = split ' ', $l2; $match{$k} = $v; } for my $l1 (@list1){ if($match{$l1}){ push @results, "$l1 ---> $match{$l1}"; } else { push @results, "$l1 ---> null"; } } }

Aaron B.
My Woefully Neglected Blog, where I occasionally mention Perl.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://939949]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (2)
As of 2024-04-20 03:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found