
Re^5: More efficient way to get uniq list elements from list of lists

by Aighearach (Initiate)
on Nov 11, 2004 at 23:01 UTC ( [id://407213]=note )


in reply to Re^4: More efficient way to get uniq list elements from list of lists
in thread More efficient way to get uniq list elements from list of lists

Maybe it is just semantics, but I don't think map is the same algorithm as for(;;){} at the language level. At a higher level, sure, but here we're only comparing Perl implementations, so I'm assuming "algorithm" at that level.

Really, I should have used a much bigger list for longrand, something like a million elements. My intention was to show the scalability difference that comes from correctly iterating over the arguments, but my results only hinted that there are scalability differences. Here they are again with yours added and a large data set, 1_000_000 this time. Note that we were talking about simonm(), which is what you had posted, not this new simonm2(), which is a real improvement and definitely the best of the bunch.

Here are my results with a 1_000_000x2 data set.

                     Rate ewijaya_l simonm_l pg_l scooterm_l aighearach_l aighearach2_l simonm2_l
    ewijaya_l     13736/s        --     -48% -69%       -74%         -76%          -78%      -80%
    simonm_l      26316/s       92%       -- -41%       -51%         -54%          -57%      -62%
    pg_l          44248/s      222%      68%   --       -17%         -23%          -28%      -36%
    scooterm_l    53191/s      287%     102%  20%         --          -7%          -14%      -23%
    aighearach_l  57471/s      318%     118%  30%         8%           --           -7%      -17%
    aighearach2_l 61728/s      349%     135%  40%        16%           7%            --      -11%
    simonm2_l     69444/s      406%     164%  57%        31%          21%           13%        --
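For anyone who wants to reproduce a comparison like this, here is a minimal sketch using the core Benchmark module. The subs uniq_loop and uniq_slice, the labels, and the data sizes are my own stand-ins, not the functions benchmarked in this thread, and the data set is shrunk so the sketch finishes quickly.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: two lists of 1_000 random integers each
# (far smaller than the data set used in the post).
my @lists = map { [ map { int rand 500 } 1 .. 1_000 ] } 1 .. 2;

# Nested-loop version: one hash store per element.
sub uniq_loop {
    my %seen;
    for my $list (@_) {
        $seen{$_} = 1 for @$list;
    }
    return keys %seen;
}

# Hash-slice version: one bulk assignment covering every element.
sub uniq_slice {
    my %seen;
    @seen{ map { @$_ } @_ } = ();
    return keys %seen;
}

# Run each candidate for at least one CPU second, then print a
# rate/percentage comparison table.
cmpthese( -1, {
    loop_l  => sub { my @u = uniq_loop(@lists) },
    slice_l => sub { my @u = uniq_slice(@lists) },
} );
```

The negative count tells cmpthese to run each sub for at least that many CPU seconds rather than a fixed number of iterations, which tends to give steadier rates when the candidates differ a lot in speed.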

The aighearach2 entry is my attempt, but with your map to populate the slice, as follows:

    sub aighearach2 {
        my ( %unique );
        # Note: the body never reads $i, so the same slice assignment
        # is repeated once per argument.
        for ( my $i = 0; $i < @_; $i++ ) {
            @unique{ map @$_, @_ } = ();
        }
        return keys %unique;
    }

I find it interesting how the implementations really start to separate from each other on larger data sets. I imagine that if they were run through Devel::DProf or something similar, it would be found that memory consumption is the big difference. If anybody is still following this thread, that would be interesting to see...

Your results are a bit different than mine, perhaps because of the different platform.

I guess it is the increment in aighearach2() that makes simonm2() 13%(!) faster.
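One way to check that guess is to drop the counter loop and do the slice assignment exactly once. The sketch below compares the two for correctness only. aighearach2_once is a hypothetical name of mine; it is not simonm2(), which isn't reproduced in this post.

```perl
use strict;
use warnings;

# As posted: the slice assignment sits inside a counter loop.
sub aighearach2 {
    my %unique;
    for ( my $i = 0; $i < @_; $i++ ) {
        @unique{ map @$_, @_ } = ();
    }
    return keys %unique;
}

# Hypothetical variant: the same slice assignment, done once, no counter.
sub aighearach2_once {
    my %unique;
    @unique{ map @$_, @_ } = ();
    return keys %unique;
}

my @data   = ( [ 1, 2, 3 ], [ 3, 4, 5 ] );
my @looped = sort { $a <=> $b } aighearach2(@data);
my @once   = sort { $a <=> $b } aighearach2_once(@data);

print "looped: @looped\n";
print "once:   @once\n";
```

Both return the same set of keys, so any speed difference between them would come from the loop itself rather than from the result.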

An important lesson can be learned, I think, by studying closely the changes you made between simonm() and simonm2(): a good contrast between an anon hash that is used simplistically and a named hash with its full power unleashed.
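To make that contrast concrete, here is a rough sketch of the two styles. These are my own illustrations of the general idioms, under the assumption that they resemble the approaches being discussed. They are not the actual simonm() or simonm2() code, and uniq_anon and uniq_named are made-up names.

```perl
use strict;
use warnings;

# Anonymous hash: build a key => 1 pair for every element, wrap the
# pairs in a hash reference, then dereference it just to read the keys.
sub uniq_anon {
    return keys %{ +{ map { ( $_ => 1 ) } map { @$_ } @_ } };
}

# Named hash: a single hash-slice assignment creates every key at once,
# with no per-element value pairs to build.
sub uniq_named {
    my %seen;
    @seen{ map { @$_ } @_ } = ();
    return keys %seen;
}

my @data  = ( [qw(a b c)], [qw(b c d)] );
my @anon  = sort( uniq_anon(@data) );
my @named = sort( uniq_named(@data) );

print "anon:  @anon\n";
print "named: @named\n";
```

The slice form only works on something you can name or dereference as a slice, which is part of why a named lexical hash makes the bulk assignment so convenient.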


--
Snazzy tagline here
