http://www.perlmonks.org?node_id=995102


in reply to Loading 283600 records (WordNet)

Try:

my %hash;
my @rec;
push @{ $hash{ $rec[0] } }, [ $rec[ 1 ], $rec[ 2 ] ]
    while @rec = split '(?<=-[a-z])', <>;

Or 25% better still:

my %hash;
my @rec;
@rec = unpack( 'a10a10a4', $_ ), push @{ $hash{ $rec[0] } }, [ @rec[ 1, 2 ] ]
    while <>;
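To see what the two approaches return for one line, here is a minimal sketch. The sample record is hypothetical: it only assumes the layout implied by the 'a10a10a4' template, that is, two 10-character keys followed by a 4-character field and a newline.

```perl
use strict;
use warnings;

# Hypothetical sample record: two 10-character keys and a 4-character field.
my $line = "00001740-n00002137-n0001\n";

# 'a10a10a4' extracts three fixed-width fields and ignores the rest of the
# string, so the trailing newline is not part of any field.
my @by_unpack = unpack 'a10a10a4', $line;

# The lookbehind split breaks after each "-<letter>", so the last field
# keeps the trailing newline unless the line is chomped first.
my @by_split = split /(?<=-[a-z])/, $line;

print join( '|', @by_unpack ), "\n";    # 00001740-n|00002137-n|0001
print length( $by_split[2] ), "\n";     # 5 (four characters plus "\n")
```

So besides the speed difference, the two versions differ in whether the third field carries the newline.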

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong

Re^2: Loading 283600 records (Updated)
by remiah (Hermit) on Sep 23, 2012 at 02:28 UTC

    Thanks for the reply, BrowserUK.

    I tried both; the results are below.

              s/iter 02_split1 04_unpack 03_split2 01_substr
    02_split1   6.34        --      -34%      -41%      -57%
    04_unpack   4.17       52%        --      -11%      -35%
    03_split2   3.71       71%       12%        --      -27%
    01_substr   2.70      134%       54%       37%        --
    
    And the test code. I hope there are no silly mistakes. Seeing your unpack example, I wondered whether there is a way to do something like the following. It does not work as written, though, because unpack returns a flat list rather than a list of records...
    open( my $fh, "<", "24length_packed.data" ) or die $!;
    local $/ = undef;
    map {
        push @{ $hash{ $_->[0] } }, [ $_->[1], $_->[2] ]
    } unpack( '(a10a10a4)*', <$fh> ), close $fh;
    In a large loop, assigning values to a variable has a measurable cost (BrowserUK taught me this in this thread). So I think unpack and split would become faster if I could avoid using @rec. Is there a good way?
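One way to consume the flat list that unpack( '(a10a10a4)*', ... ) returns is to step through it three elements at a time. The following is only a sketch: the in-memory string is a hypothetical stand-in for the contents of 24length_packed.data, assumed to hold 24-character records with no separators. Whether it is faster than the per-line versions would have to be measured.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the file contents: 24-character records,
# concatenated with no separators between them.
my $data = join '',
    '00001740-n00002137-n0001',
    '00001740-n00004258-n0002',
    '00002137-n00001740-n0003';

my %hash;

# '(a10a10a4)*' repeats the group until the data runs out and returns one
# flat list: key, field, field, key, field, field, ...
my @fields = unpack '(a10a10a4)*', $data;

# Walk the flat list three elements at a time by index.
for ( my $i = 0; $i < @fields; $i += 3 ) {
    push @{ $hash{ $fields[$i] } }, [ @fields[ $i + 1, $i + 2 ] ];
}

# Print each key with the number of records stored under it.
printf "%s => %d\n", $_, scalar @{ $hash{$_} } for sort keys %hash;
```

With the sample data this stores two records under 00001740-n and one under 00002137-n.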

      There are no rules -- beyond minimising the number of opcodes called -- that apply in all situations. Try plugging this into your benchmark:

      my %hash;
      while( <> ) {
          my( $k, @v ) = unpack( 'a10a10a4', $_ );
          push @{ $hash{ $k } }, \@v
      }
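For anyone who wants to try the comparison without the original data file, here is a rough sketch using the core Benchmark module. The sub names and the in-memory sample lines are hypothetical, not the benchmark used in this thread, and timings on such a small data set may not match the results for the full 283600 records.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical data: 1000 copies of one 24-character record plus newline.
my @lines = ( "00001740-n00002137-n0001\n" ) x 1000;

# Variant with a shared @rec array that is reassigned for every line.
sub load_with_rec {
    my %hash;
    my @rec;
    for (@lines) {
        @rec = unpack 'a10a10a4', $_;
        push @{ $hash{ $rec[0] } }, [ @rec[ 1, 2 ] ];
    }
    return \%hash;
}

# Variant that unpacks into a key plus a fresh array and stores a
# reference to that array directly.
sub load_with_kv {
    my %hash;
    for (@lines) {
        my ( $k, @v ) = unpack 'a10a10a4', $_;
        push @{ $hash{$k} }, \@v;
    }
    return \%hash;
}

# A negative count means: run each candidate for at least 1 CPU second.
cmpthese( -1, {
    unpack_rec => \&load_with_rec,
    unpack_kv  => \&load_with_kv,
} );
```

Both subs build the same structure, so any difference reported by cmpthese comes from how the fields are assigned and copied.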


        I added three tests:

        05 .. unpack, using List::MoreUtils's natatime
        06 .. unpack again, with map
        07 .. yours
        
        Your unpack was faster than mine. Here are the benchmark results.
                           s/iter 02_split1 05_unpack_natatime 06_unpack_map 04_unpack 03_split2 07_unpack_2
        02_split1            6.38        --               -13%          -16%      -35%      -42%        -50%
        05_unpack_natatime   5.55       15%                 --           -4%      -25%      -33%        -43%
        06_unpack_map        5.34       19%                 4%            --      -22%      -31%        -40%
        04_unpack            4.18       53%                33%           28%        --      -11%        -24%
        03_split2            3.70       72%                50%           44%       13%        --        -14%
        07_unpack_2          3.18      100%                74%           68%       31%       16%          --
        01_substr            2.70      136%               105%           98%       55%       37%         18%
        
        I have also added the test code. dsheroh told me about in-memory SQLite. Its loading time is apparently faster than any of the above tests. I will report on it later.

        Hello, BrowserUK.

        I posted some benchmarks of in-memory SQLite at the bottom of this thread. The results surprised me. Please have a look.

        And thanks for responding to me.
        Regards.