Re^2: Loading 283600 records (Updated)

by remiah (Hermit)
on Sep 23, 2012 at 02:28 UTC ( #995154=note )


in reply to Re: Loading 283600 records (Updated)
in thread Loading 283600 records (WordNet)

Thanks for the reply, BrowserUK.

I tried it, and below are the results.

          s/iter 02_split1 04_unpack 03_split2 01_substr
02_split1   6.34        --      -34%      -41%      -57%
04_unpack   4.17       52%        --      -11%      -35%
03_split2   3.71       71%       12%        --      -27%
01_substr   2.70      134%       54%       37%        --
And here is the test code. I hope there are no silly mistakes.
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes;
use Benchmark qw/cmpthese/;

my $href;

sub test1 {    # substr
    $href = {};
    open( my $fh, "<", "04.txt" ) or die $!;
    while (<$fh>) {
        chomp;
        push @{ $href->{ substr( $_, 0, 10 ) } },
            [ substr( $_, 10, 10 ), substr( $_, 20 ) ];
    }
    close $fh;
}

sub test2 {    # split on lookbehind
    my @rec;
    $href = {};
    open( my $fh, "<", "04.txt" ) or die $!;
    push @{ $href->{ $rec[0] } }, [ @rec[ 1, 2 ] ]
        while @rec = split '(?<=-[a-z])', <$fh>;
    close $fh;
}

sub test3 {    # 04-1.txt, with delimiter '|'
    my @rec;
    $href = {};
    open( my $fh, "<", "04-1.txt" ) or die $!;
    push @{ $href->{ $rec[0] } }, [ @rec[ 1, 2 ] ]
        while @rec = split /\|/, <$fh>;
    close $fh;
}

sub test4 {    # with unpack
    my @rec;
    $href = {};
    open( my $fh, "<", "04.txt" ) or die $!;
    @rec = unpack( 'a10a10a4', $_ ),
        push @{ $href->{ $rec[0] } }, [ @rec[ 1, 2 ] ]
        while <$fh>;
    close $fh;
}

my %tests = (
    '01_substr' => \&test1,
    '02_split1' => \&test2,
    '03_split2' => \&test3,
    '04_unpack' => \&test4,
);

cmpthese( -20, \%tests );    # run each test for at least 20 CPU seconds
Seeing your unpack example, I wondered whether there is a way like this. It is impossible as written, because unpack returns a flat list, though...
open( my $fh, "<", "24length_packed.data" ) or die $!;
local $/ = undef;
map { push @{ $hash{ $_->[0] } }, [ $_->[1], $_->[2] ] }
    unpack( '(a10a10a4)*', <$fh> ),
close $fh;
In a large loop, assigning values to a variable carries some cost (this is what BrowserUK taught me earlier in this thread). So I think that if I can avoid using @rec, the unpack and split versions will become faster. Is there a good way?
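One way to consume the flat list from a whole-file unpack without naming a temporary @rec is to peel it off three fields (one record) at a time with splice. This is only a rough sketch, not something benchmarked here: the "24length_packed.data" name is taken from the snippet above, and it assumes that file really is back-to-back 24-byte records with no newlines.

my %hash;
open( my $fh, "<", "24length_packed.data" ) or die $!;
# slurp the whole file and unpack every record in one call
my @flat = do { local $/; unpack '(a10a10a4)*', <$fh> };
close $fh;
# walk the flat list three fields at a time: key, field1, field2
while ( my ( $k, @v ) = splice @flat, 0, 3 ) {
    push @{ $hash{ $k } }, \@v;
}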


Re^3: Loading 283600 records (Updated)
by BrowserUk (Pope) on Sep 23, 2012 at 16:05 UTC

    There are no rules -- beyond minimising the number of opcodes called -- that apply in all situations. Try plugging this into your benchmark:

    my %hash;
    while( <> ) {
        my( $k, @v ) = unpack( 'a10a10a4', $_ );
        push @{ $hash{ $k } }, \@v;
    }
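    (Plugged into the benchmark harness from the parent node, this would presumably be wrapped as something like the sub below. The sub name, the file name, and the reuse of the shared $href are guesses; the "07_unpack_2" entry in the follow-up appears to be this version.)

    sub test7 {    # BrowserUk's variant, adapted to the harness above
        $href = {};
        open( my $fh, "<", "04.txt" ) or die $!;
        while (<$fh>) {
            my ( $k, @v ) = unpack( 'a10a10a4', $_ );
            push @{ $href->{ $k } }, \@v;
        }
        close $fh;
    }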


      Added 3 tests:

      05 .. unpack, using List::MoreUtils's natatime (a rough sketch of this variant follows the list)
      06 .. unpack again
      07 .. yours
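      The code for tests 05 and 06 is not quoted in this node; as a rough guess, the natatime variant (test 05) might look something like this, with the file name and record layout taken from earlier in the thread:

      use List::MoreUtils qw( natatime );

      my %hash;
      open( my $fh, "<", "24length_packed.data" ) or die $!;
      # unpack the whole file at once and hand the flat list to natatime,
      # which returns an iterator yielding three fields per call
      my $it = do { local $/; natatime 3, unpack '(a10a10a4)*', <$fh> };
      close $fh;
      while ( my ( $k, @v ) = $it->() ) {
          push @{ $hash{ $k } }, \@v;
      }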
      
      Your unpack was faster than mine. These are the benchmark results.
                         s/iter 02_split1 05_unpack_natatime 06_unpack_map 04_unpack 03_split2 07_unpack_2
      02_split1            6.38        --               -13%          -16%      -35%      -42%        -50%
      05_unpack_natatime   5.55       15%                 --           -4%      -25%      -33%        -43%
      06_unpack_map        5.34       19%                 4%            --      -22%      -31%        -40%
      04_unpack            4.18       53%                33%           28%        --      -11%        -24%
      03_split2            3.70       72%                50%           44%       13%        --        -14%
      07_unpack_2          3.18      100%                74%           68%       31%       16%          --
      01_substr            2.70      136%               105%           98%       55%       37%         18%
      
      And the test code has been added. dsheroh told me about in-memory SQLite; its loading time is apparently faster than any of the above tests. I will report on it later.
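      For comparison, here is a rough sketch of what an in-memory SQLite load via DBI/DBD::SQLite might look like. The table and column names, and doing the whole load in a single transaction, are assumptions of mine, not the code actually benchmarked later in the thread.

      use DBI;

      my $dbh = DBI->connect( "dbi:SQLite:dbname=:memory:", "", "",
          { RaiseError => 1, AutoCommit => 0 } );
      $dbh->do("CREATE TABLE recs ( k TEXT, f1 TEXT, f2 TEXT )");

      my $sth = $dbh->prepare("INSERT INTO recs ( k, f1, f2 ) VALUES ( ?, ?, ? )");

      open( my $fh, "<", "04.txt" ) or die $!;
      while (<$fh>) {
          chomp;
          $sth->execute( unpack 'a10a10a4', $_ );    # same fixed-width layout as above
      }
      close $fh;
      $dbh->commit;    # one commit for the whole load keeps insert overhead down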

      Hello, BrowserUK.

      I posted some benchmarks of in-memory SQLite at the bottom of this thread. It was a surprise for me. Please have a look.

      And thanks for responding to me.
      Regards.
