Re^2: Loading 283600 records (Updated)

in reply to Re: Loading 283600 records (Updated)
in thread Loading 283600 records (WordNet)

Thanks for the reply, BrowserUK.

I tried it, and the results are below.

          s/iter 02_split1 04_unpack 03_split2 01_substr
02_split1   6.34        --      -34%      -41%      -57%
04_unpack   4.17       52%        --      -11%      -35%
03_split2   3.71       71%       12%        --      -27%
01_substr   2.70      134%       54%       37%        --

And here is the test code. I hope there are no silly mistakes.

#!/usr/bin/perl
use strict; use warnings;
use Time::HiRes;
use Benchmark qw/cmpthese/;

my $href;

sub test1{
    $href={};
    open(my $fh, "<", "04.txt") or die $!;
    while(<$fh>){
        chomp;
        push @{ $href->{ substr($_,0,10) } }, [ substr($_,10,10), substr($_,20) ];
    }
    close $fh;
}
sub test2{
    my @rec; $href={};
    open(my $fh, "<", "04.txt") or die $!;
    push @{ $href->{ $rec[0] } }, [ @rec[ 1, 2 ] ]
        while @rec = split '(?<=-[a-z])', <$fh>; 
    close $fh;
}
sub test3{ #04-1.txt, with delimiter '|'
    my @rec; $href={};
    open(my $fh, "<", "04-1.txt") or die $!;
    push @{ $href->{ $rec[0]} }, [ @rec[1, 2] ]
        while @rec = split /\|/, <$fh>;
    close $fh;
}
sub test4{ #with unpack
    my @rec; $href={};
    open(my $fh, "<", "04.txt") or die $!;
    @rec = unpack( 'a10a10a4', $_ ),
        push @{ $href->{ $rec[0] } }, [ @rec[ 1, 2 ] ]
            while <$fh>;
    close $fh;
}

my %tests = (
    '01_substr' => \&test1,
    '02_split1' => \&test2,
    '03_split2' => \&test3,
    '04_unpack' => \&test4,
); 

cmpthese(
    -20, #for 20 cpu secs
    \%tests
);

Seeing your unpack example, I wondered whether there is a way to do something like this. As written it is impossible, though, because unpack returns a flat list:

open(my $fh, "<", "24length_packed.data" ) or die $!;
local $/ = undef;
map {
    push @{ $hash{ $_->[0] } }, [ $_->[1], $_->[2] ]
     } unpack( '(a10a10a4)*', <$fh>);
close $fh;
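To make the flat-list problem concrete, here is a minimal sketch. The sample data, the variable names, and the splice-based regrouping are my own assumptions for illustration; I have not benchmarked it.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical sample data: two 24-byte records (10 + 10 + 4), no newlines.
my $data = 'AAAAAAAAAA' . 'bbbbbbbbbb' . 'cccc'
         . 'AAAAAAAAAA' . 'dddddddddd' . 'eeee';

# The (...)* group repeats once per record, but unpack still
# returns a single flat list of fields, not a list of records.
my @fields = unpack '(a10a10a4)*', $data;
my $count  = @fields;
print "$count\n";    # prints 6 (fields), not 2 (records)

# One way to regroup in threes: splice removes three fields per pass
# and returns an empty list (false) once @fields is exhausted.
my %hash;
while ( my ( $key, $f1, $f2 ) = splice @fields, 0, 3 ) {
    push @{ $hash{$key} }, [ $f1, $f2 ];
}
```

This still copies every field into temporary variables, so it may well be no faster than the @rec versions.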

In a large loop, assigning values to a variable has a noticeable cost (BrowserUK taught me this in this thread). So I think that if I could avoid using @rec, the unpack and split versions would become faster. Is there a good way to do that?
