PerlMonks  

Re^6: please help me to resolve the Line comments and appending issue

by aaron_baugher (Curate)
on Aug 10, 2012 at 15:46 UTC ( [id://986766]=note )


in reply to Re^5: please help me to resolve the Line comments and appending issue
in thread please help me to resolve the Line comments and appending issue

Almost always true in practice. But there is more overhead in a hash than in an array, and that's magnified if you already have an array and have to duplicate it as a hash. If you're going to check more than one value against the list, as in the FAQ "How do I search file2 for all the values in file1?" then a hash is almost certainly the best solution. But if you only have to check one or two values, you may be better off sticking with grep:

    % cat 986735.pl
    #!/usr/bin/env perl
    use Modern::Perl;
    use Benchmark qw(:all);

    my @words = split /\s+/, `cat bigfile`;    # 8.5MB file
    say scalar @words, " words in bigfile";    # 1.3M words
    my $match = 'professional';  # appears 16 times scattered through bigfile

    cmpthese( 10, {
        'hash it'  => \&hashit,
        'grep it'  => \&grepit,
        'first it' => \&firstit,
    });

    sub hashit {
        my %h;
        @h{@words} = ();
        my $exists = exists $h{$match};
    }

    sub grepit {
        my $exists = grep { $_ eq $match } @words;
    }

    sub firstit {
        use List::Util 'first';
        my $exists = first { $_ eq $match } @words;
    }

    % perl 986735.pl
    1293687 words in bigfile
               Rate  hash it  grep it first it
    hash it  3.33/s       --     -43%     -89%
    grep it  5.80/s      74%       --     -81%
    first it 29.8/s     795%     413%       --

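So if I only need to search the list once, grep wins over a hash. For multiple searches, a hash comes out ahead, since the build cost is paid once and each further lookup is nearly free. List::Util's first() routine splits the difference a little, because it stops at the first match instead of scanning the whole list. With my dataset it beats the hash for up to about 8 searches (one hash build at 3.33/s costs about as much as nine first() calls at 29.8/s), but I assume that would vary greatly depending on how early in the array the match is made. It'd take more testing with matches found earlier, later, or never in the array to form a good comparison there.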
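To illustrate why the position of the match matters, here is a minimal sketch that counts how many elements grep and first() each inspect. The toy word list and the two helper subs (grep_comparisons, first_comparisons) are made up for this illustration and are not part of the benchmark above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util 'first';

# Hypothetical toy list: the target is the second of ten elements.
my @words  = ('alpha', 'professional', ('filler') x 8);
my $target = 'professional';

# grep always walks the whole list, even after a match.
sub grep_comparisons {
    my ($target, @list) = @_;
    my $count = 0;
    my $found = grep { $count++; $_ eq $target } @list;
    return $count;
}

# first() returns as soon as the block is true.
sub first_comparisons {
    my ($target, @list) = @_;
    my $count = 0;
    my $found = first { $count++; $_ eq $target } @list;
    return $count;
}

printf "grep inspected %d elements\n",  grep_comparisons($target, @words);
printf "first inspected %d elements\n", first_comparisons($target, @words);
```

When the target is missing, both inspect every element, so first()'s advantage should shrink the later the match sits in the list.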

My simple conclusion would be: always build a hash unless you know your program will only search it once; then use grep or List::Util::first. Also, if I'm pulling data in from somewhere (like a file) and I know I'm going to be searching it this way, I put it straight into a hash from the start.
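As a rough sketch of the "straight into a hash" approach, something like the following would skip the intermediate array. The load_words sub and the sample text are assumptions for illustration only; an in-memory filehandle stands in for a real file so the snippet runs on its own.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build the lookup hash while reading, with no intermediate array.
sub load_words {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        # split ' ' skips leading whitespace and drops the newline
        $seen{$_} = 1 for split ' ', $line;
    }
    return \%seen;
}

# In-memory filehandle standing in for a real data file (sample text is made up).
my $text = "the quick brown fox\njumps over the lazy dog\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my $seen = load_words($fh);
close $fh;

# Every lookup after the one-time build is a single hash probe.
for my $candidate (qw(fox professional)) {
    printf "%-12s %s\n", $candidate,
        exists $seen->{$candidate} ? 'found' : 'missing';
}
```

With a real file you would open it by name instead of passing a reference to a string.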

Aaron B.
Available for small or large Perl jobs; see my home node.
