http://www.perlmonks.org?node_id=1014521


in reply to selecting columns from a tab-separated-values file

Consider slicing the list returned by split so you keep only the fields you need, instead of copying every field into an array; it benchmarks significantly faster:

use strict;
use warnings;
use Benchmark qw(cmpthese);

# 8 named fields plus 42 filler fields = 50 tab-delimited fields
my $line = "FIRST\tMIDDLE\tLAST\tSTRNO\tSTRNAME\tCITY\tSTATE\tZIP" . "\tFOO" x 42;

# Split every field into an array
sub trySplit {
    my @capture = split /\t/, $line;
}

# Split every field, but keep only fields 0, 2 and 5
sub trySplitSlice {
    my @capture = ( split /\t/, $line )[ 0, 2, 5 ];
}

# Stop after 7 fields, then keep fields 0, 2 and 5
sub trySplitSliceLimit {
    my @capture = ( split /\t/, $line, 7 )[ 0, 2, 5 ];
}

cmpthese(
    -5,
    {
        trySplit           => sub { trySplit() },
        trySplitSlice      => sub { trySplitSlice() },
        trySplitSliceLimit => sub { trySplitSliceLimit() },
    }
);

Results:

                       Rate trySplit trySplitSlice trySplitSliceLimit
trySplit           110337/s       --          -46%               -84%
trySplitSlice      204730/s      86%            --               -71%
trySplitSliceLimit 708158/s     542%          246%                 --
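To read the percentage columns: each cell says how much faster (positive) or slower (negative) the row is than the column. Assuming cmpthese computes a cell as 100 * row_rate / col_rate - 100, rounded to a whole percent, the sketch below recomputes a few cells from the rates in the results. The helper name pct() is made up for illustration.

```perl
use strict;
use warnings;

# Rates (iterations per second) copied from the cmpthese results.
my %rate = (
    trySplit           => 110337,
    trySplitSlice      => 204730,
    trySplitSliceLimit => 708158,
);

# Hypothetical helper: percentage by which $row is faster than $col,
# assuming the formula 100 * row_rate / col_rate - 100.
sub pct {
    my ( $row, $col ) = @_;
    return sprintf '%.0f%%', 100 * $rate{$row} / $rate{$col} - 100;
}

print pct( 'trySplitSliceLimit', 'trySplit' ), "\n";    # 542%
print pct( 'trySplit', 'trySplitSliceLimit' ), "\n";    # -84%
print pct( 'trySplitSlice', 'trySplit' ), "\n";         # 86%
```

The same formula reproduces every other cell in the table, e.g. trySplitSliceLimit vs. trySplitSlice gives 246%.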

Update: Have added choroba's trySplitSliceLimit() option to the benchmarking.

Update II: Thanks to AnomalousMonk, have appended "\tFOO" x 42 to the original string to create a string with 50 tab-delimited fields. This effectively shows the speed increase using trySplitSliceLimit().
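The gain from the LIMIT argument comes from how split treats it: with a positive LIMIT, split returns at most that many fields, and the last one holds the unsplit remainder of the line. So the limit has to be at least the highest slice index plus 2; for indices 0, 2, 5 that is 7, which is exactly the value used in trySplitSliceLimit(). A minimal sketch that derives the limit automatically; select_fields() is a hypothetical helper, not part of the original code.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: return the requested columns of a tab-separated
# line, splitting no further than the highest requested index needs.
sub select_fields {
    my ( $line, @idx ) = @_;
    die "no field indices given\n" unless @idx;

    # split returns at most $limit fields and the last one is the unsplit
    # remainder, so ask for one field beyond the highest index wanted.
    my $limit = max(@idx) + 2;
    return ( split /\t/, $line, $limit )[@idx];
}

my $line = join "\t", qw(FIRST MIDDLE LAST STRNO STRNAME CITY STATE ZIP);

# LIMIT 3: two clean fields, then everything after the second tab.
my @parts = split /\t/, $line, 3;
print scalar(@parts), "\n";    # 3
print "$parts[1]\n";           # MIDDLE

# Indices 0, 2, 5 give $limit == 7, the same limit as trySplitSliceLimit().
my @picked = select_fields( $line, 0, 2, 5 );
print "@picked\n";             # FIRST LAST CITY
```

The longer the line and the earlier the fields you want, the more work the limit saves, which is why padding the test string to 50 fields widened the gap.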

Update III: Changed splitting on ' ' to \t. Thanks CountZero.
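The switch to /\t/ matters for correctness, not just speed: a pattern given as the single-space string ' ' is a special case in split that splits on runs of any whitespace (awk-style), so on TSV data it breaks apart fields containing spaces and drops empty fields. A small illustration with a made-up record:

```perl
use strict;
use warnings;

# Made-up tab-separated record: a field with spaces and an empty field.
my $record = "Mary Ann\t\tvan der Berg\t12";

# ' ' splits on any run of whitespace, so field boundaries are lost.
my @by_space = split ' ', $record;

# /\t/ splits on each tab, keeping internal spaces and the empty field.
my @by_tab = split /\t/, $record;

print scalar(@by_space), "\n";    # 6
print scalar(@by_tab), "\n";      # 4
print "[$by_tab[2]]\n";           # [van der Berg]
```

Note that split /\t/ still discards trailing empty fields unless a negative LIMIT is given.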