PerlMonks  

Re: cut vs split (suggestions)

by graff (Chancellor)
on Apr 17, 2005 at 03:33 UTC ( #448566=note )


in reply to cut vs split (suggestions)

Can this script be improved so that it is comparable to the UNIX cut command in performance? If the Perl script can finish in 10 seconds that will be great (50% drop in performance)! I am happy to take this performance drop because it keeps the script clean and portable (typically I work on UNIX machines so this is not a huge requirement)
I don't think there's any way to speed up the perl approach. (I tried BrowserUK's idea -- not a rigorous benchmark, but no evidence that it made any difference.) I just have two reactions to your comments:

(1) The unix-style "cut" is portable -- you can find free ports of unix command line utils for ms-windows, and macosx is unix, and "cut" behaves the same everywhere. What more portability do you need?

(2) The reason to choose a perl approach over a common, compiled utility would be that the perl approach makes it a lot easier to provide a lot more flexibility, and the performance hit is a small price to pay for the extra power. I wrote my own perl version of cut years ago and use it all the time (as well as using the original "cut" when it seems quicker), because with perl I get to use a regex for the split, and output the columns in whatever order I choose, and have the input field separator be different from the output field separator (e.g. using "\n" to output one field per line), and insert arbitrary quoted strings between columns when this is convenient, and ... anything else I feel like doing, because perl makes it easy to do. Compared to the time it would take to work around the limitations of standard "cut", perl makes things really efficient.
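To make the flexibility argument concrete, here is a minimal sketch of the kind of thing a perl "cut" can do: split on a regex, emit fields in any order, and use a different output separator. The helper name pick_fields and its argument order are made up for illustration; this is not graff's actual script.

```perl
use strict;
use warnings;

# Hypothetical helper, not the script mentioned in the post: split a line on
# a regex, pick fields in any order (1-based, like cut), and join them with
# a separate output separator.
sub pick_fields {
    my ( $line, $split_re, $out_sep, @cols ) = @_;
    my @fields = split $split_re, $line;
    # Fields that do not exist become empty strings instead of warnings.
    return join $out_sep, map { $fields[ $_ - 1 ] // '' } @cols;
}

# Comma-separated input with optional whitespace around the commas;
# print column 3, then column 1, one field per line.
print pick_fields( "alpha, beta,gamma", qr/\s*,\s*/, "\n", 3, 1 ), "\n";
```

Standard cut always prints fields in input order and splits on a single literal delimiter, so the reordering and the regex separator are the parts that plain cut cannot do directly.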

... would you typically consider piping output from cut when the script does not require all the columns for processing? i.e. say the script only needs 3 columns instead of a possible 200 columns then would you pipe the 3 column output from cut instead of splitting the 200 columns in Perl and keeping only the 3 that are required?
Would a bear typically consider defecating in its natural habitat? If processing 3 columns out of 200 were something I intended to do with any regularity, I would probably write and save a perl script that does something like:
    open( IN, "cut -d, -f13-15 numbers.csv |" );
    while (<IN>) {
        chomp;
        @row = split /,/;
        # this is only cols 13, 14, 15 from numbers.csv
        # and now, do something
    }
(update: naturally, I would have this perl script accept command-line options to specify the field separator and column selections for running the "cut" command, assuming this sort of flexibility were useful.)
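A rough sketch of what that update could look like, assuming option names -d and -f that mirror cut and defaults of "," and "1-" (all of these are illustrative choices, as are the sub names cut_command and open_cut):

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option names (-d and -f, mirroring cut). Returns the argument
# list for running cut on whatever file names are left after the options.
sub cut_command {
    my @args = @_;
    my ( $delim, $fields ) = ( ',', '1-' );    # assumed defaults
    GetOptionsFromArray( \@args, 'd=s' => \$delim, 'f=s' => \$fields )
        or die "usage: $0 [-d delim] [-f fields] file ...\n";
    return ( 'cut', "-d$delim", "-f$fields", @args );
}

# List-form pipe open: the arguments go straight to cut without passing
# through a shell, so the delimiter needs no extra quoting.
sub open_cut {
    my @cmd = cut_command(@_);
    open( my $in, '-|', @cmd ) or die "cannot run @cmd: $!";
    return $in;
}
```

A caller would do something like `my $in = open_cut(@ARGV);` and then loop over `<$in>` as in the snippet above. With Getopt::Long's default configuration the values are given as separate arguments (`-d , -f 13-15 numbers.csv`). The list form of a pipe open may not be available on older Windows builds of perl, so treat that part as Unix-oriented.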


Re^2: cut vs split (suggestions)
by sk (Curate) on Apr 17, 2005 at 04:05 UTC
    Thanks Everyone!

    Sorry I was not very clear on a couple of things. When I meant piping, yes I had the pipe inside my open in mind. I did not realize that I used a command line example where pipe means  cut -d, ... | perl ... :)

    To graff's comment: I used the word portability very loosely, and that is my fault. I sometimes write small utilities for my co-workers and wanted these utilities to work on other OSes. Even though I could find a version of cut for Windows (actually I did not know about this until after reading your post), I will not be able to install it on others' machines for policy reasons :(

    I did not think about the print idea (even though it does not make a diff in the runtime), I feel such small improvements can help!!!

    I agree with you all that it is definitely worth the effort to have the task done in Perl, as it gives an enormous amount of flexibility.

    Thanks again all for your wonderful comments!

    cheers

    SK

      ... even though I could find a version of cut for Windows ... I will not be able to install it on others' machines for policy reasons :(
      Um, do you mean that you can give your friends perl scripts to run, but you can't give them a "cut.exe" file? That's some weird policy.
        Yes, they have activestate pre-installed :) -SK
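For machines that have perl but no cut, the same three columns can be pulled out in pure Perl with a list slice. The sketch below is only an illustration: the sub name and the 200-column sample row are made up, and whether the LIMIT argument narrows the gap with cut was not measured in this thread.

```perl
use strict;
use warnings;

# Return fields 13..15 (1-based) of a comma-separated line. The LIMIT of 16
# makes split stop after the 15th comma; the unsplit remainder lands in a
# 16th element, which the slice discards.
sub cols_13_to_15 {
    my ($line) = @_;
    chomp $line;
    return ( split /,/, $line, 16 )[ 12 .. 14 ];
}

my $line = join ',', 1 .. 200;    # made-up sample row with 200 columns
print join( ',', cols_13_to_15($line) ), "\n";
```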
Re^2: cut vs split (suggestions)
by wazoox (Prior) on Apr 17, 2005 at 14:25 UTC
    The unix-style "cut" is portable -- you can find free ports of unix command line utils for ms-windows, and macosx is unix, and "cut" behaves the same everywhere. What more portability do you need?
    I just had a silly idea: what about using something like Inline::C to integrate the part of the 'cut' utility you need into the script?
