Re: Loading 283600 records (Updated)

in reply to Loading 283600 records (WordNet)

Try:

my %hash;
my @rec;
push @{ $hash{ $rec[0] } }, [ $rec[ 1 ], $rec[ 2 ] ]
    while @rec = split '(?<=-[a-z])', <>;

Or 25% better still:

my %hash;
my @rec;

@rec = unpack( 'a10a10a4', $_ ),
    push @{ $hash{ $rec[0] } }, [ @rec[ 1, 2 ] ]
    while <>;
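To make the difference between the two snippets above concrete, here is a small sketch that applies both field-splitting approaches to one record. The sample record and the sub names `fields_by_split` and `fields_by_unpack` are made up for illustration; the real WordNet data may look different.

```perl
use strict;
use warnings;

# Hypothetical 24-character record plus newline; not taken from the real data file.
my $line = "00001740-n00002137-nhype\n";

# Split after every "-<lowercase letter>", as in the first snippet.
sub fields_by_split {
    my ($rec) = @_;
    return split /(?<=-[a-z])/, $rec;
}

# Fixed-width extraction, as in the second snippet: 10 + 10 + 4 characters.
sub fields_by_unpack {
    my ($rec) = @_;
    return unpack 'a10a10a4', $rec;
}

my @by_split  = fields_by_split($line);
my @by_unpack = fields_by_unpack($line);

# The first two fields agree; only the last field differs:
# split keeps the trailing newline, while 'a4' stops after four characters.
print "same key\n" if $by_split[0] eq $by_unpack[0];
```

If the trailing newline matters, chomp the line before splitting. The unpack form does not need that step when every record is exactly 24 characters wide.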

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong

Re^2: Loading 283600 records (Updated) by remiah (Hermit) on Sep 23, 2012 at 02:28 UTC
Thanks for the reply, BrowserUk. I tried it, and the result is below.

            s/iter 02_split1 04_unpack 03_split2 01_substr
02_split1     6.34        --      -34%      -41%      -57%
04_unpack     4.17       52%        --      -11%      -35%
03_split2     3.71       71%       12%        --      -27%
01_substr     2.70      134%       54%       37%        --

And the test code (2 kB, collapsed). I hope there are no silly mistakes in it.

Seeing your unpack example, I wondered whether there is a way to write it like this. It is impossible as written, though, because unpack returns a flat list...

open(my $fh, "<", "24length_packed.data" ) or die $!;
local $/ = undef;
map { push @{ $hash{ $_->[0] } }, [ $_->[1], $_->[2] ] }
    unpack( '(a10a10a4)*', <$fh> );
close $fh;

In a large loop, assigning a value to a variable has some cost (BrowserUk taught me this in this thread). So I think that if I can avoid using @rec, unpack and split will become faster. Is there a good way?
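One way to work with the flat list that unpack( '(a10a10a4)*', ... ) returns is to walk it three elements at a time. The following is only a sketch: the sample data is invented, the sub name `load_packed` is hypothetical, and it assumes the file holds back-to-back 24-byte records with no newlines between them. It makes no claim about speed.

```perl
use strict;
use warnings;

# Hypothetical sample: three fixed-width 24-byte records, no separators.
my $data = join '',
    '00001740-n00002137-nhype',
    '00001740-n00001930-nhypo',
    '00002137-n00001740-nhypo';

sub load_packed {
    my ($buf) = @_;
    my %hash;

    # One unpack call yields key, field, field, key, field, field, ...
    my @flat = unpack '(a10a10a4)*', $buf;

    # Step through the flat list in groups of three.
    for ( my $i = 0; $i + 2 <= $#flat; $i += 3 ) {
        push @{ $hash{ $flat[$i] } }, [ @flat[ $i + 1, $i + 2 ] ];
    }
    return \%hash;
}

my $hash = load_packed($data);
printf "%d keys loaded\n", scalar keys %$hash;
```

This still builds one large temporary list, so whether it beats the per-line loop is something only a benchmark on the real data can show.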
Re^3: Loading 283600 records (Updated) by BrowserUk (Patriarch) on Sep 23, 2012 at 16:05 UTC
There are no rules -- beyond minimising the number of opcodes called -- that apply in all situations.

Try plugging this into your benchmark:

my %hash;
while( <> ) {
    my( $k, @v ) = unpack( 'a10a10a4', $_ );
    push @{ $hash{ $k } }, \@v
}
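The loop above stores \@v directly instead of copying the fields into a new anonymous array. That is safe because `my` inside the loop body gives each iteration its own @v. A small sketch of that behaviour follows, reading from an in-memory filehandle; the two sample records and the sub name `load_lines` are assumptions made for illustration.

```perl
use strict;
use warnings;

sub load_lines {
    my ($fh) = @_;
    my %hash;
    while ( my $line = <$fh> ) {
        # @v is a fresh lexical on every pass, so each \@v is a distinct array.
        my ( $k, @v ) = unpack 'a10a10a4', $line;
        push @{ $hash{$k} }, \@v;
    }
    return \%hash;
}

# Hypothetical records that share one key.
my $text = "00001740-n00002137-nhype\n00001740-n00001930-nhypo\n";
open my $in, '<', \$text or die "Cannot open in-memory handle: $!";
my $loaded = load_lines($in);
close $in;

my $rows = $loaded->{'00001740-n'};
print "distinct arrays\n" if $rows->[0] != $rows->[1];
```

Had @v been declared once outside the loop, every stored reference would point at the same array and all rows would end up showing the last record.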
Re^4: Loading 283600 records (Updated) by remiah (Hermit) on Sep 23, 2012 at 23:10 UTC
I added 3 tests:

05 .. unpack, using List::MoreUtils's natatime
06 .. unpack again
07 .. yours

Your unpack was faster than mine. These are the benchmark results.

                   s/iter 02_split1 05_unpack_natatime 06_unpack_map 04_unpack 03_split2 07_unpack_2
02_split1            6.38        --               -13%          -16%      -35%      -42%        -50%
05_unpack_natatime   5.55       15%                 --           -4%      -25%      -33%        -43%
06_unpack_map        5.34       19%                 4%            --      -22%      -31%        -40%
04_unpack            4.18       53%                33%           28%        --      -11%        -24%
03_split2            3.70       72%                50%           44%       13%        --        -14%
07_unpack_2          3.18      100%                74%           68%       31%       16%          --
01_substr            2.70      136%               105%           98%       55%       37%         18%

And the test code is added (1233 bytes, collapsed).

dsheroh told me about in-memory SQLite. Its loading time is apparently faster than any of the above tests. I will report on it later.
Re^5: Loading 283600 records (substr alias) by Anonymous Monk on Sep 24, 2012 at 03:57 UTC
Re^4: Loading 283600 records (Updated) by remiah (Hermit) on Sep 24, 2012 at 10:31 UTC
Hello, BrowserUk. I posted some benchmarks of in-memory SQLite at the bottom of this thread. The result was a surprise to me. Please have a look, and thanks for responding to me. Regards.
