PerlMonks  

Can this script be improved so that it is comparable to the UNIX cut command in performance? If the Perl script can finish in 10 seconds that will be great (50% drop in performance)! I am happy to take this performance drop because it keeps the script clean and portable (typically I work on UNIX machines so this is not a huge requirement).
I don't think there's any way to speed up the perl approach. (I tried BrowserUK's idea -- not a rigorous benchmark, but no evidence that it made any difference.) I just have two reactions to your comments:

(1) The unix-style "cut" is portable -- you can find free ports of unix command line utils for ms-windows, and macosx is unix, and "cut" behaves the same everywhere. What more portability do you need?

(2) The reason to choose a perl approach over a common, compiled utility would be that the perl approach makes it a lot easier to provide a lot more flexibility, and the performance hit is a small price to pay for the extra power. I wrote my own perl version of cut years ago and use it all the time (as well as using the original "cut" when it seems quicker), because with perl I get to use a regex for the split, and output the columns in whatever order I choose, and have the input field separator be different from the output field separator (e.g. using "\n" to output one field per line), and insert arbitrary quoted strings between columns when this is convenient, and ... anything else I feel like doing, because perl makes it easy to do. Compared to the time it would take to work around the limitations of standard "cut", perl makes things really efficient.
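
To illustrate the kind of flexibility described above, here is a minimal sketch. It is not the script mentioned in the post (that script is not shown in the thread); `pick_columns` is a hypothetical helper invented for this example, and the `//` operator assumes Perl 5.10 or later. It splits on a regex, emits columns in an arbitrary order, and uses an output separator different from the input one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split $line on the regex
# $sep_re, pick the 1-based columns listed in @$cols, in that order,
# and join them with $out_sep. Missing columns become empty strings.
sub pick_columns {
    my ( $line, $sep_re, $cols, $out_sep ) = @_;
    my @fields = split $sep_re, $line, -1;    # -1 keeps trailing empty fields
    return join $out_sep, map { $fields[ $_ - 1 ] // '' } @$cols;
}

# Comma-separated input with optional spaces; output column 3, then
# column 1, separated by a tab.
print pick_columns( "a, b,c", qr/\s*,\s*/, [ 3, 1 ], "\t" ), "\n";
```

Passing "\n" as the output separator would give the one-field-per-line behaviour mentioned above, which standard "cut" cannot produce on its own.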

... would you typically consider piping output from cut when the script does not require all the columns for processing? i.e. say the script only needs 3 columns instead of a possible 200 columns then would you pipe the 3 column output from cut instead of splitting the 200 columns in Perl and keeping only the 3 that are required?
Would a bear typically consider defecating in its natural habitat? If processing 3 columns out of 200 were something I intended to do with any regularity, I would probably write and save a perl script that does something like:
    open( my $in, '-|', 'cut -d, -f13-15 numbers.csv' )
        or die "can't run cut: $!";
    while (<$in>) {
        chomp;
        my @row = split /,/;    # this is only cols 13, 14, 15 from numbers.csv
        # and now, do something
    }
    close $in;
(update: naturally, I would have this perl script accept command-line options to specify the field separator and column selections for running the "cut" command, assuming this sort of flexibility were useful.)
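
A rough sketch of what the update describes might look like the following. The option names `-d` and `-f`, the defaults, the helper `cut_command`, and the script name `cutcols.pl` are all assumptions made for illustration; none of them come from the thread. `GetOptionsFromArray` is part of the core Getopt::Long module, and "cut" itself only accepts a single-character delimiter.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: turn command-line style arguments into the
# delimiter plus the argument list for running "cut".
sub cut_command {
    my @args = @_;
    my ( $delim, $fields ) = ( ',', '1-' );    # assumed defaults
    GetOptionsFromArray( \@args, 'd=s' => \$delim, 'f=s' => \$fields )
        or die "usage: $0 [-d delim] [-f fields] file ...\n";
    return ( $delim, 'cut', "-d$delim", "-f$fields", @args );
}

sub main {
    my ( $delim, @cmd ) = cut_command(@_);

    # List-form pipe open: the arguments are not passed through a shell.
    open( my $in, '-|', @cmd ) or die "can't run cut: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        my @row = split /\Q$delim\E/, $line, -1;    # only the selected columns
        print scalar(@row), " fields: @row\n";      # placeholder for real work
    }
    close $in or warn "cut exited with status $?\n";
}

# Run only when arguments are given.
main(@ARGV) if @ARGV;
```

Under these assumptions it would be invoked as `perl cutcols.pl -d , -f 13-15 numbers.csv`, with the option values given as separate arguments.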

In reply to Re: cut vs split (suggestions) by graff
in thread cut vs split (suggestions) by sk
