Can this script be improved so that it is comparable to the UNIX cut command in performance? If the Perl script can finish in 10 seconds that will be great (50% drop in performance)! I am happy to take this performance drop because it keeps the script clean and portable (typically I work on UNIX machines so this is not a huge requirement)
I don't think there's any way to speed up the perl approach. (I tried BrowserUK's idea -- not a rigorous benchmark, but no evidence that it made any difference.) I just have two reactions to your comments:
(1) The unix-style "cut" is portable -- you can find free ports of unix command line utils for ms-windows, and macosx is unix, and "cut" behaves the same everywhere. What more portability do you need?
(2) The reason to choose a perl approach over a common, compiled utility would be that the perl approach makes it a lot easier to provide a lot more flexibility, and the performance hit is a small price to pay for the extra power. I wrote my own perl version of cut years ago and use it all the time (as well as using the original "cut" when it seems quicker), because with perl I get to use a regex for the split, and output the columns in whatever order I choose, and have the input field separator be different from the output field separator (e.g. using "\n" to output one field per line), and insert arbitrary quoted strings between columns when this is convenient, and ... anything else I feel like doing, because perl makes it easy to do. Compared to the time it would take to work around the limitations of standard "cut", perl makes things really efficient.
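To make point (2) concrete, here is a minimal sketch of the kind of thing I mean. It is not my actual script; the sub name cut_fields and its arguments are made up for illustration. It splits on a regex, emits columns in any order, and uses an output separator unrelated to the input one:

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are illustrative only).
#   $line     - input line, newline already removed
#   $split_re - compiled regex used as the input field separator
#   $cols     - array ref of 1-based column numbers, in output order
#   $out_sep  - string placed between output fields
sub cut_fields {
    my ( $line, $split_re, $cols, $out_sep ) = @_;

    # A limit of -1 keeps trailing empty fields, which split drops by default.
    my @fields = split $split_re, $line, -1;

    # Columns past the end of the line come out as empty strings.
    return join $out_sep, map { $fields[ $_ - 1 ] // '' } @$cols;
}

# Separator is a comma or semicolon with optional whitespace around it;
# output is columns 4, 1, 2 joined by tabs.
print cut_fields( "a, b;c ,d", qr/\s*[,;]\s*/, [ 4, 1, 2 ], "\t" ), "\n";
```

Standard "cut" can do none of those three things (regex separator, reordered columns, a different separator on output) without extra tools in the pipeline, which is where the perl version earns its keep.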
... would you typically consider piping output from cut when the script does not require all the columns for processing? i.e. say the script only needs 3 columns instead of a possible 200 columns then would you pipe the 3 column output from cut instead of splitting the 200 columns in Perl and keeping only the 3 that are required?
Would a bear typically consider defecating in its natural habitat? If processing 3 columns out of 200 were something I intended to do with any regularity, I would probably write and save a perl script that does something like:
    open( my $in, '-|', 'cut -d, -f13-15 numbers.csv' )
        or die "can't run cut: $!";
    while (<$in>) {
        chomp;
        my @row = split /,/;  # this is only cols 13, 14, 15 from numbers.csv
        # and now, do something
    }
    close $in;
(update: naturally, I would have this perl script accept command-line options to specify the field separator and column selections for running the "cut" command, assuming this sort of flexibility were useful.)
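A rough sketch of what that update might look like, assuming a Unix-like system with "cut" on the PATH. The sub names each_cut_row and main and the option letters -d and -f are my own choices for illustration, and the delimiter has to be a single character because that is all "cut" accepts:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: run "cut" on $file and call $callback once per
# line with an array ref holding only the selected fields.
sub each_cut_row {
    my ( $file, $delim, $fields, $callback ) = @_;

    # List-form pipe open: the arguments go straight to cut, no shell involved.
    open( my $in, '-|', 'cut', "-d$delim", "-f$fields", $file )
        or die "can't run cut: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        # \Q...\E so a delimiter such as "|" or "." is taken literally.
        $callback->( [ split /\Q$delim\E/, $line, -1 ] );
    }
    close $in or die "cut failed on $file (exit status $?)\n";
    return;
}

sub main {
    my @args = @_;
    my ( $delim, $fields ) = ( ',', '1-' );    # assumed defaults
    GetOptionsFromArray( \@args, 'd=s' => \$delim, 'f=s' => \$fields )
        or die "usage: $0 [-d DELIM] [-f LIST] FILE\n";
    my $file = shift @args;
    die "usage: $0 [-d DELIM] [-f LIST] FILE\n" unless defined $file;

    # "Do something" here is just printing the kept fields tab-separated.
    each_cut_row( $file, $delim, $fields,
        sub { print join( "\t", @{ $_[0] } ), "\n" } );
    return;
}

main(@ARGV) if @ARGV;
```

Saved under some name of your choosing, it would be run with arguments like: -d , -f 13-15 numbers.csv (with the option value given as a separate argument).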