Re: selecting columns from a tab-separated-values file

in reply to selecting columns from a tab-separated-values file

Consider slicing the split to keep only the elements you need, instead of capturing every field of the line, as it benchmarks significantly faster:

use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = "FIRST\tMIDDLE\tLAST\tSTRNO\tSTRNAME\tCITY\tSTATE\tZIP" . "\tFOO" x 42;

sub trySplit {
    my @capture = split /\t/, $line;
}

sub trySplitSlice {
    my @capture = ( split /\t/, $line )[ 0, 2, 5 ];
}

sub trySplitSliceLimit {
    my @capture = ( split /\t/, $line, 7 )[ 0, 2, 5 ];
}

cmpthese(
    -5,
    {
        trySplit           => sub { trySplit() },
        trySplitSlice      => sub { trySplitSlice() },
        trySplitSliceLimit => sub { trySplitSliceLimit() }
    }
);

Results:

                       Rate      trySplit trySplitSlice trySplitSliceLimit
trySplit           110337/s            --          -46%               -84%
trySplitSlice      204730/s           86%            --               -71%
trySplitSliceLimit 708158/s          542%          246%                 --

Update: Have added choroba's trySplitSliceLimit() option to the benchmarking.

Update II: Thanks to AnomalousMonk, have appended "\tFOO" x 42 to the original string to create a string with 50 tab-delimited fields. This effectively shows the speed increase using trySplitSliceLimit().

Update III: Changed splitting on ' ' to \t. Thanks CountZero.
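
A note on choosing the LIMIT: to slice index N cleanly, split needs a LIMIT of at least N + 2, so that field N is separated from the unsplit remainder of the line. That is why 7 works for indices 0, 2, 5 above. The following is only a minimal sketch of that rule; pick_fields() is a hypothetical helper, not part of the benchmark, and it assumes non-negative indices.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the benchmark above): return the requested
# fields of a tab-separated line, letting split stop as early as possible.
# Assumes every index is non-negative.
sub pick_fields {
    my ( $line, @idx ) = @_;

    # Highest index wanted. LIMIT must be that index + 2 so the last wanted
    # field is split off from the unsplit remainder of the line.
    my ($max) = sort { $b <=> $a } @idx;
    return ( split /\t/, $line, $max + 2 )[@idx];
}

# Same 50-field line as in the benchmark.
my $line = join "\t",
    qw(FIRST MIDDLE LAST STRNO STRNAME CITY STATE ZIP), ('FOO') x 42;

my @limited = pick_fields( $line, 0, 2, 5 );
my @full    = ( split /\t/, $line )[ 0, 2, 5 ];

print "@limited\n";    # FIRST LAST CITY
print "@full\n";       # FIRST LAST CITY

# With a LIMIT that is one too small, the last field in the slice still
# carries the rest of the line (CITY, a tab, STATE, a tab, and so on).
my @too_small = ( split /\t/, $line, 6 )[ 0, 2, 5 ];
print length( $too_small[2] ) > length('CITY') ? "remainder kept\n" : "clean\n";
```

Computing the LIMIT from the largest index keeps the two from drifting apart if the list of wanted columns changes later.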
