http://www.perlmonks.org?node_id=448726


in reply to cut vs split (suggestions)

Lots of examples in this thread, but everyone seems to be using a split and join of some sort. I doubt cut will do this much work just to print out the first 15 columns.

When looking for speed, the first thing I usually do is remove any regexps where index and substr will do just as well. So why not just look for the 15th ',' and print everything before it?

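use strict;
use warnings;
while (my $line = <>) {
    my $index = -1;
    for (1 .. 15) {    # locate the 15th comma on this line
        $index = index($line, ',', $index + 1);
        last if $index < 0;    # line has fewer than 15 commas
    }
    if ($index < 0) { print $line }                          # print the whole line
    else            { print substr($line, 0, $index), "\n" }  # print everything before the 15th comma
}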

This doesn't do any unnecessary string manipulation, and it avoids expensive regexps as well. It could easily be extended to start at a column other than the first.
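A minimal sketch of that extension, assuming a plain comma-separated file with no quoted or escaped commas. The helper names `nth_index` and `cut_fields` are hypothetical, not from the post. It finds the comma just before the first wanted column, keeps scanning with index from there to the comma after the last wanted column, and takes a single substr.

```perl
use strict;
use warnings;

# Position of the $n-th $sep in $str, searching after offset $pos
# (default -1, i.e. from the start); returns -1 if there are too few.
sub nth_index {
    my ($str, $sep, $n, $pos) = @_;
    $pos = -1 unless defined $pos;
    for (1 .. $n) {
        $pos = index($str, $sep, $pos + 1);
        return -1 if $pos < 0;
    }
    return $pos;
}

# Fields $from..$to (1-based, inclusive; assumes 1 <= $from <= $to) of a
# comma-separated line, found with index/substr instead of split/join.
sub cut_fields {
    my ($line, $from, $to) = @_;
    chomp $line;
    my $start = 0;                          # field 1 starts at offset 0
    if ($from > 1) {
        my $pos = nth_index($line, ',', $from - 1);
        return '' if $pos < 0;              # fewer than $from fields
        $start = $pos + 1;                  # just past the ($from-1)-th comma
    }
    # Continue from the comma before $start up to the $to-th comma overall.
    my $end = nth_index($line, ',', $to - $from + 1, $start - 1);
    $end = length $line if $end < 0;        # fewer than $to fields: take the rest
    return substr($line, $start, $end - $start);
}

my $sample = join(',', 1 .. 20) . "\n";
print cut_fields($sample, 3, 15), "\n";    # prints 3,4,5,6,7,8,9,10,11,12,13,14,15
```

Resuming the search for the end column from the start position, rather than from the beginning of the line, means each line is scanned only once, up to the last wanted comma.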

This proved faster than some of the examples above that I tried out (I didn't try them all).

$ time perl cut.pl numbers.csv > /dev/null
real    0m5.577s
user    0m4.792s
sys     0m0.055s

$ time cut -d, -f"1-15" numbers.csv > /dev/null
real    0m1.081s
user    0m0.866s
sys     0m0.042s
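To compare the two Perl approaches directly, rather than against the C cut, something like the following sketch could be used with the core Benchmark module's `cmpthese`. The sample line, the subroutine names and the split/join version are assumptions for illustration, not code from this thread. Results depend on the machine and the data, so none are shown here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical split/join version: split the whole line, keep 15 fields.
sub with_split {
    my $line = shift;
    chomp $line;
    my @fields = split /,/, $line;
    splice(@fields, 15) if @fields > 15;    # drop fields 16 and up
    return join(',', @fields);
}

# index/substr version: stop at the 15th comma, no list is built.
sub with_index {
    my $line = shift;
    chomp $line;
    my $pos = -1;
    for (1 .. 15) {
        $pos = index($line, ',', $pos + 1);
        return $line if $pos < 0;           # fewer than 15 commas
    }
    return substr($line, 0, $pos);
}

# Assumed test data: one line with 50 numeric fields.
my $line = join(',', 1 .. 50) . "\n";
die "versions disagree\n" unless with_split($line) eq with_index($line);

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    split_join   => sub { with_split($line) },
    index_substr => sub { with_index($line) },
});
```

Passing a limit to split (for example `split /,/, $line, 16`) makes it stop after 16 fields instead of splitting the whole line, so that variant is worth including in such a comparison too.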

- Cees

Re^2: cut vs split (and on usefullness of cat, too:-))
by Mabooka-Mabooka (Sexton) on Apr 18, 2005 at 16:13 UTC
    I am Perl-illiterate, but:

    How is 5 seconds faster than 1?
    And how could "index" / "substr" possibly be faster than something like:
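    // not real-life code; all the convenient assumptions hold here
    char* cut(char* pStr, int col, const char* delim = ",") {
        int n = 0;
        while (*pStr) {
            if (*pStr++ == *delim) {        // pStr now points past the delimiter
                if (++n == col) {
                    return until_next_one(pStr, delim);  // returns the field up to the next delimiter
                }
            }
        }
        return NULL;  // fewer than col delimiters
    }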
    --?
    Sorry if I missed the point.

      I wasn't implying that my Perl version was faster than the C version. I think it is safe to say that for every Perl program, there is a C program that can perform the same task in less time.

      My point was that using index and substr where you can is almost always faster than using a regexp, so my index/substr version was faster than the split/join versions listed above. In other words, I was comparing different Perl implementations.

      I probably should have timed one of the split/join examples and included that, but I didn't know which one was fastest. Instead I included the timing for the C version of cut, which gives anyone a baseline to compare against.

      - Cees