http://www.perlmonks.org?node_id=995154


in reply to Re: Loading 283600 records (Updated)
in thread Loading 283600 records (WordNet)

Thanks for the reply, BrowserUK.

I tried it; here are the results.

          s/iter 02_split1 04_unpack 03_split2 01_substr
02_split1   6.34        --      -34%      -41%      -57%
04_unpack   4.17       52%        --      -11%      -35%
03_split2   3.71       71%       12%        --      -27%
01_substr   2.70      134%       54%       37%        --
And here is the test code. I hope there are no silly mistakes.
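#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes;
use Benchmark qw/cmpthese/;

my $href;

# 01: fixed-width fields extracted with substr
sub test1 {
    $href = {};
    open( my $fh, "<", "04.txt" ) or die $!;
    while (<$fh>) {
        chomp;
        push @{ $href->{ substr( $_, 0, 10 ) } },
            [ substr( $_, 10, 10 ), substr( $_, 20 ) ];
    }
    close $fh;
}

# 02: split where each boundary follows '-' and a lowercase letter
sub test2 {
    no warnings 'uninitialized';    # at EOF split gets undef and returns (), ending the loop
    my @rec;
    $href = {};
    open( my $fh, "<", "04.txt" ) or die $!;
    # lines are not chomped, so $rec[2] keeps its trailing newline
    push @{ $href->{ $rec[0] } }, [ @rec[ 1, 2 ] ]
        while @rec = split '(?<=-[a-z])', <$fh>;
    close $fh;
}

# 03: 04-1.txt, with delimiter '|'
sub test3 {
    no warnings 'uninitialized';    # at EOF split gets undef and returns (), ending the loop
    my @rec;
    $href = {};
    open( my $fh, "<", "04-1.txt" ) or die $!;
    # lines are not chomped, so $rec[2] keeps its trailing newline
    push @{ $href->{ $rec[0] } }, [ @rec[ 1, 2 ] ]
        while @rec = split /\|/, <$fh>;
    close $fh;
}

# 04: with unpack
sub test4 {
    my @rec;
    $href = {};
    open( my $fh, "<", "04.txt" ) or die $!;
    @rec = unpack( 'a10a10a4', $_ ), push @{ $href->{ $rec[0] } }, [ @rec[ 1, 2 ] ]
        while <$fh>;
    close $fh;
}

my %tests = (
    '01_substr' => \&test1,
    '02_split1' => \&test2,
    '03_split2' => \&test3,
    '04_unpack' => \&test4,
);

cmpthese( -20,    # run each test for at least 20 CPU seconds
    \%tests );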
Seeing your unpack example, I wondered whether something like the following is possible. I know it cannot work as written, though, because unpack returns a flat list...
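open( my $fh, "<", "24length_packed.data" ) or die $!;
local $/ = undef;    # slurp mode: read the whole file at once
# Does not work as written: unpack returns a flat list of strings,
# so $_ is a plain string here and $_->[0] dies under strict refs.
map { push @{ $hash{ $_->[0] } }, [ $_->[1], $_->[2] ] }
    unpack( '(a10a10a4)*', <$fh> );
close $fh;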
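For comparison, here is a minimal sketch of one way the slurp-and-unpack idea can be made to work: keep the flat list that unpack returns and walk it three elements at a time. It assumes fixed-width 24-byte records with no separators, and the sample data below is a hypothetical in-memory stand-in for 24length_packed.data. Note that it still builds one large array, @f, so whether it beats the line-by-line versions would have to be checked with the same cmpthese harness.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for 24length_packed.data: 24-byte records
# (10-byte key, 10-byte field, 4-byte field) with no separators.
my $packed = join '', map { sprintf '%-10s%-10s%-4s', @$_ }
    [ '00001740-n', 'entity', 'n01' ],
    [ '00001740-n', 'thing',  'n02' ],
    [ '00002137-v', 'act',    'v01' ];

my %hash;
{
    open( my $fh, '<', \$packed ) or die $!;    # in-memory file handle
    local $/;                                   # slurp mode
    my $data = <$fh>;
    close $fh;

    # unpack returns one flat list: key, f1, f2, key, f1, f2, ...
    my @f = unpack '(a10a10a4)*', $data;

    # Walk the flat list three elements at a time.
    for ( my $i = 0; $i < @f; $i += 3 ) {
        push @{ $hash{ $f[$i] } }, [ @f[ $i + 1, $i + 2 ] ];
    }
}
```

The 'a' template keeps trailing spaces, so padded fields come back padded (for example 'v01 ').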
In a large loop, assigning values to a variable has a noticeable cost (BrowserUK taught me this earlier in this thread). So I think that if I can avoid using @rec, the unpack and split versions will become faster. Is there a good way to do this?
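Two possible ways to drop @rec from the per-line loops are sketched below, using hypothetical in-memory sample data in place of 04.txt and 04-1.txt. For the fixed-width layout, substr can supply the key while the 'x10' in the unpack template skips over it. For the '|'-delimited layout, split can fill one anonymous array directly, and shift removes the key from it, which also avoids copying the @rec[1, 2] slice. These only remove the named temporary; whether they are actually faster is an open question that the cmpthese harness above would have to answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data in the two layouts used by the benchmark:
# fixed-width (10 + 10 + 4 characters) and '|'-delimited.
my $fixed = join '', map { sprintf "%-10s%-10s%-4s\n", @$_ }
    [ '00001740-n', 'entity', 'n01' ],
    [ '00002137-v', 'act',    'v01' ];
my $piped = "00001740-n|entity|n01\n00002137-v|act|v01\n";

my ( %by_unpack, %by_split );

# unpack variant without @rec: substr supplies the key and 'x10'
# skips over it; bytes after the 24th (the "\n") are ignored.
open( my $fh, '<', \$fixed ) or die $!;
push @{ $by_unpack{ substr( $_, 0, 10 ) } }, [ unpack( 'x10 a10 a4', $_ ) ]
    while <$fh>;
close $fh;

# split variant without @rec: split fills one anonymous array, and
# shift removes the key so the remaining two fields are stored as-is.
open( $fh, '<', \$piped ) or die $!;
while (<$fh>) {
    chomp;
    my $r = [ split /\|/ ];
    push @{ $by_split{ shift(@$r) } }, $r;
}
close $fh;
```

To try them in the benchmark, the two loop bodies could replace the @rec lines in test4 and test3 respectively.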