http://www.perlmonks.org?node_id=448563


in reply to cut vs split (suggestions)

I'd be very surprised if a pure-Perl script beat a native utility like cut.

If cutting columns out of these large files is a bottleneck, your options include:

I like the second option best.

the lowliest monk

Re^2: cut vs split (suggestions)
by BrowserUk (Patriarch) on Apr 17, 2005 at 04:52 UTC

    Once you read each line of output from cut via the piped open, you will still have to split it into an array in order to use the fields, so I think most if not all of the performance advantage of using cut will be lost, though splitting 15 fields cut from 200, rather than all 200, may help.
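
    A minimal sketch of the piped-open-plus-split pattern being discussed (the file name, delimiter, and field list here are invented for illustration, not taken from the thread):

    ```shell
    # Create a small stand-in CSV (hypothetical data in place of the large file).
    printf 'a,b,c,d,e\nf,g,h,i,j\n' > /tmp/demo.csv

    # Read cut's output through a piped open, then split each line into @F
    # so the individual fields are usable inside Perl -- the extra split
    # that the reply above points out.
    perl -e '
      open my $in, "-|", "cut -d, -f1-3 /tmp/demo.csv" or die $!;
      while (<$in>) {
          chomp;
          my @F = split /,/;
          print join("|", @F), "\n";
      }
    '
    ```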


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. (In the language I do not grow stronger; I grow old and fade away.)
    Rule 1 has a caveat! -- Who broke the cabal?

      The internal pipe approach is about 1.5X faster than the pure Perl approach (though still a far cry from cut):

      % time perl -le 'open IN, q( cut -d, -f"1-15" numbers.csv| ); \
          print join ",", ( chomp and @F = split /,/ ) while <IN>' > /dev/null
      19.49s user 0.00s system 96% cpu 20.289 total

      Update: But keep in mind that the numbers above are for a relatively fast cut command. The improvement with sk's cut will be more modest; it'd be interesting to see the actual numbers.
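
      For comparison, a pure-Perl version does the column slicing with split plus an array slice instead of shelling out to cut (again with an invented file name and a first-three-fields slice for illustration):

      ```shell
      # Create a small stand-in CSV (hypothetical data).
      printf '1,2,3,4,5\n6,7,8,9,10\n' > /tmp/demo2.csv

      # Pure Perl: split every field, then keep only the first three
      # via a list slice, replacing the external cut entirely.
      perl -e '
        open my $in, "<", "/tmp/demo2.csv" or die $!;
        while (<$in>) {
            chomp;
            my @F = ( split /,/ )[0 .. 2];
            print join(",", @F), "\n";
        }
      '
      ```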

      the lowliest monk