Re: Loading 283600 records (Updated)

by BrowserUk (Pope)
on Sep 22, 2012 at 13:30 UTC ( #995102=note )


in reply to Loading 283600 records (WordNet)

Try:

my %hash;
my @rec;
push @{ $hash{ $rec[0] } }, [ $rec[ 1 ], $rec[ 2 ] ]
    while @rec = split '(?<=-[a-z])', <>;

Or 25% better still:

my %hash;
my @rec;
@rec = unpack( 'a10a10a4', $_ ),
    push @{ $hash{ $rec[0] } }, [ @rec[ 1, 2 ] ]
        while <>;
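To make the difference between the two snippets concrete, here is a minimal sketch on a single made-up record. The 24-character layout (two 10-character ids followed by a 4-character tag) is an assumption inferred from the 'a10a10a4' template, and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical record: two 10-character ids and a 4-character tag, then a newline.
my $line = "00001740-n00002137-nhype\n";

# Split after every "-<lowercase letter>" using a zero-width lookbehind.
my @by_split = split /(?<=-[a-z])/, $line;

# Cut the same record at fixed widths: 10, 10 and 4 characters.
my @by_unpack = unpack 'a10a10a4', $line;

printf "split:  %d fields, last field is %d characters long\n",
    scalar @by_split, length $by_split[2];
printf "unpack: %d fields, last field is %d characters long\n",
    scalar @by_unpack, length $by_unpack[2];
```

In this sketch split leaves the trailing newline on the last field, while the fixed-width a4 stops before it, so the two approaches store slightly different third fields unless the line is chomped first.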

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong


Replies are listed 'Best First'.
Re^2: Loading 283600 records (Updated)
by remiah (Hermit) on Sep 23, 2012 at 02:28 UTC

    Thanks for the reply, BrowserUk.

    I tried both; the results are below.

              s/iter 02_split1 04_unpack 03_split2 01_substr
    02_split1   6.34        --      -34%      -41%      -57%
    04_unpack   4.17       52%        --      -11%      -35%
    03_split2   3.71       71%       12%        --      -27%
    01_substr   2.70      134%       54%       37%        --
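A table of this shape is what the core Benchmark module's cmpthese prints. The following is only a sketch of how two of the loaders could be compared, not the poster's actual test code: the generated in-memory records, the label names and the one-second run time are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical stand-in for the real file: 10,000 records of 24 characters
# (two 10-character ids and a 4-character tag), each ending in a newline.
my @lines = map { sprintf "%08d-n%08d-nhype\n", $_ % 1000, $_ } 1 .. 10_000;

# Run each loader for at least one CPU second and print the comparison table.
my $rows = cmpthese( -1, {
    split_lookbehind => sub {
        my %hash;
        for my $line (@lines) {
            my @rec = split /(?<=-[a-z])/, $line;
            push @{ $hash{ $rec[0] } }, [ @rec[ 1, 2 ] ];
        }
    },
    unpack_fixed => sub {
        my %hash;
        for my $line (@lines) {
            my ( $k, @v ) = unpack 'a10a10a4', $line;
            push @{ $hash{$k} }, \@v;
        }
    },
} );
```

Timings from a sketch like this depend on the machine and the data, so they should not be expected to match the figures quoted in the thread.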
    
    And the test code. I hope there are no silly mistakes. Seeing your unpack example, I wondered whether something like this is possible. It doesn't work as written, though, because unpack returns a flat list:

    open(my $fh, "<", "24length_packed.data" ) or die $!;
    local $/ = undef;
    map { push @{ $hash{ $_->[0] } }, [ $_->[1], $_->[2] ] } unpack( '(a10a10a4)*', <$fh>),
    close $fh;

    In a large loop, assigning to a variable has a measurable cost (BrowserUk taught me this in this thread). So I think unpack and split would become faster if I could avoid using @rec. Is there a good way?
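The flat list returned by a repeated unpack template can still be regrouped into records of three. The sketch below is one possible way to do that with splice, not something proposed in the thread; the packed sample string is hypothetical and stands in for the contents of the data file.

```perl
use strict;
use warnings;

# Hypothetical data: two 24-character records packed back to back, no separators.
my $data = "00001740-n00002137-nhype" . "00001740-n00004258-nhypo";

# One unpack call yields a flat list: key, value, value, key, value, value, ...
my @fields = unpack '(a10a10a4)*', $data;

my %hash;
while (@fields) {
    # Take three fields at a time; @v is a fresh array on every iteration.
    my ( $k, @v ) = splice @fields, 0, 3;
    push @{ $hash{$k} }, \@v;
}

printf "%d records stored under %s\n",
    scalar @{ $hash{'00001740-n'} }, '00001740-n';
```

Whether this is faster than unpacking line by line is an open question that only a benchmark on the real data can answer.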

      There are no rules -- beyond minimising the number of opcodes called -- that apply in all situations. Try plugging this into your benchmark:

      my %hash;
      while( <> ) {
          my( $k, @v ) = unpack( 'a10a10a4', $_ );
          push @{ $hash{ $k } }, \@v;
      }
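Storing \@v directly is safe in a loop of this form because a lexical declared inside the loop body is a new array on each pass, so no copy into an anonymous array is needed. A small sketch to check that, using two hypothetical records in place of the input file:

```perl
use strict;
use warnings;

my %hash;

# Hypothetical records standing in for lines read from the input file.
for my $line ( "00001740-n00002137-nhype\n", "00001740-n00004258-nhypo\n" ) {
    my ( $k, @v ) = unpack 'a10a10a4', $line;
    push @{ $hash{$k} }, \@v;    # reference to this iteration's own @v
}

my ( $first, $second ) = @{ $hash{'00001740-n'} };

# References compare by address: different addresses mean different arrays.
print "distinct arrays\n" if $first != $second;
print "@$first | @$second\n";
```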

        I added 3 tests:

        05 .. unpack, using List::MoreUtils's natatime
        06 .. unpack again
        07 .. yours

        Your unpack was faster than mine. Here are the benchmark results.
                           s/iter 02_split1 05_unpack_natatime 06_unpack_map 04_unpack 03_split2 07_unpack_2
        02_split1            6.38        --               -13%          -16%      -35%      -42%        -50%
        05_unpack_natatime   5.55       15%                 --           -4%      -25%      -33%        -43%
        06_unpack_map        5.34       19%                 4%            --      -22%      -31%        -40%
        04_unpack            4.18       53%                33%           28%        --      -11%        -24%
        03_split2            3.70       72%                50%           44%       13%        --        -14%
        07_unpack_2          3.18      100%                74%           68%       31%       16%          --
        01_substr            2.70      136%               105%           98%       55%       37%         18%
        
        The test code has been added as well. dsheroh told me about in-memory SQLite. Its loading time is apparently faster than any of the tests above. I will report on it later.

        Hello, BrowserUk.

        I posted some benchmarks of in-memory SQLite at the bottom of this thread. The results surprised me. Please have a look.

        Thanks for responding to me.
        Regards.
