Re: cut vs split (suggestions)

by tlm (Prior)
on Apr 17, 2005 at 02:51 UTC (#448563=note)

in reply to cut vs split (suggestions)

I'd be very surprised if a pure-Perl script beat a native utility like cut.

If cutting columns out of these large files is a bottleneck, your options include:

  • As you mentioned, piping the output of cut to your script;
  • Running cut as a pipe from within your script:
    open my $cutter, '-|', 'cut', '-d,', '-f1-15', $mongo
      or die "Fork failed: $!\n";
    go_to_town( $_ ) while <$cutter>;
    close $cutter;
    See perlipc.
  • Writing a Perl extension module (XS, Inline::C, Swig) to extract the columns.

I like the second option best.
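
To make the second option concrete, here is a minimal sketch of a piped open wrapped in a reusable sub. The name cut_fields and the callback interface are hypothetical, chosen only for illustration. It assumes a cut binary on the PATH and simple comma-separated data with no quoted fields. Because the list form of open bypasses the shell, the field spec is passed to cut without any shell quoting.

```perl
use strict;
use warnings;

# Hypothetical helper: stream the selected columns of $file through cut(1)
# and hand each row's fields to $callback, one row at a time.
sub cut_fields {
    my ($file, $field_spec, $callback) = @_;

    # List-form piped open: no shell is involved, so each argument reaches
    # cut exactly as written here.
    open my $cutter, '-|', 'cut', '-d,', "-f$field_spec", $file
        or die "Cannot start cut: $!\n";

    while (my $line = <$cutter>) {
        chomp $line;
        # A limit of -1 keeps trailing empty fields instead of dropping them.
        $callback->(split /,/, $line, -1);
    }

    # Closing a piped handle waits for the child and sets $?.
    close $cutter
        or die "cut exited abnormally (status " . ($? >> 8) . ")\n";
    return;
}
```

A call such as cut_fields($mongo, '1-15', sub { go_to_town(@_) }) would then process the file row by row without holding it all in memory.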

the lowliest monk

Replies are listed 'Best First'.
Re^2: cut vs split (suggestions)
by BrowserUk (Pope) on Apr 17, 2005 at 04:52 UTC

    Once you read each line of output from cut via the piped open, you are still going to have to split it into an array in order to use the fields, so I think most, if not all, of the performance advantage of using cut will be lost, though splitting 15 fields cut from 200 rather than all 200 may help.
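
For comparison, a pure-Perl way to avoid splitting all 200 fields is the LIMIT argument to split. This is only a sketch, not something benchmarked in this thread; the name first_fields is hypothetical, and it assumes plain comma-separated data with no quoted fields.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $n comma-separated fields of $line.
sub first_fields {
    my ($line, $n) = @_;
    chomp $line;

    # A LIMIT of $n + 1 makes split stop after $n separators; the final
    # element then holds the unsplit remainder of the line.
    my @fields = split /,/, $line, $n + 1;

    # Discard that remainder if the line had more than $n fields.
    $#fields = $n - 1 if @fields > $n;
    return @fields;
}
```

With this, first_fields($_, 15) does the work of cut -d, -f1-15 on a single line while leaving the other fields unsplit.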

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco.
    Rule 1 has a caveat! -- Who broke the cabal?

      The internal pipe approach is about 1.5X faster than the pure Perl approach (though still a far cry from cut):

      % time perl -le 'open IN, q( cut -d, -f"1-15" numbers.csv| ); \
          print join ",", ( chomp and @F = split /,/ ) while <IN>' > /dev/null
      19.49s user 0.00s system 96% cpu 20.289 total

      Update: But keep in mind that the numbers above are for a relatively fast cut command. The improvement with sk's cut will be more modest; it'd be interesting to see the actual numbers.

      the lowliest monk