http://www.perlmonks.org?node_id=448563


in reply to cut vs split (suggestions)

I'd be very surprised if a pure-Perl script beat a native utility like cut.

If cutting columns out of these large files is a bottleneck, your options include:

I like the second option best.

the lowliest monk

Re^2: cut vs split (suggestions)
by BrowserUk (Patriarch) on Apr 17, 2005 at 04:52 UTC

    Once you read each line of output from cut via the piped open, you will still have to split it into an array in order to use the fields, so I think most if not all of the performance advantage of using cut will be lost, though splitting 15 fields cut from 200, rather than all 200, may help.
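
    A minimal sketch of the piped-open-plus-split pattern being discussed (the file name, delimiter, and field list here are invented for illustration, not taken from the thread):

    ```shell
    # Create a small stand-in CSV (hypothetical data in place of the large file).
    printf 'a,b,c,d,e\nf,g,h,i,j\n' > /tmp/demo.csv

    # Read cut's output through a piped open, then split each line into @F
    # so the individual fields are usable inside Perl -- the extra split
    # that the reply above points out.
    perl -e '
      open my $in, "-|", "cut -d, -f1-3 /tmp/demo.csv" or die $!;
      while (<$in>) {
          chomp;
          my @F = split /,/;
          print join("|", @F), "\n";
      }
    '
    ```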


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. (In the language I do not grow stronger; I grow old and fade away.)
    Rule 1 has a caveat! -- Who broke the cabal?

      The internal pipe approach is about 1.5X faster than the pure Perl approach (though still a far cry from cut):

      % time perl -le 'open IN, q( cut -d, -f"1-15" numbers.csv| ); \
          print join ",", ( chomp and @F = split /,/ ) while <IN>' > /dev/null
      19.49s user 0.00s system 96% cpu 20.289 total

      Update: But keep in mind that the numbers above are for a relatively fast cut command. The improvement with sk's cut will be more modest; it'd be interesting to see the actual numbers.
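
      For comparison, a pure-Perl version does the column slicing with split plus an array slice instead of shelling out to cut (again with an invented file name and a first-three-fields slice for illustration):

      ```shell
      # Create a small stand-in CSV (hypothetical data).
      printf '1,2,3,4,5\n6,7,8,9,10\n' > /tmp/demo2.csv

      # Pure Perl: split every field, then keep only the first three
      # via a list slice, replacing the external cut entirely.
      perl -e '
        open my $in, "<", "/tmp/demo2.csv" or die $!;
        while (<$in>) {
            chomp;
            my @F = ( split /,/ )[0 .. 2];
            print join(",", @F), "\n";
        }
      '
      ```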

      the lowliest monk