PerlMonks  

Re^2: cut vs split (suggestions)

by tlm (Prior)
on Apr 17, 2005 at 03:33 UTC (#448567)


in reply to Re: cut vs split (suggestions)
in thread cut vs split (suggestions)

Here are the numbers on my machine (the first command generates the input used):

```
% perl -le 'BEGIN{$,=","} print map int rand 1000, 1..25 for 1..500_000' \
    > numbers.csv
% time cut -d, -f"1-15" numbers.csv > /dev/null
0.80s user 0.05s system 98% cpu 0.859 total
% time perl -lanF, -e 'print join ",", @F[0..14];' numbers.csv > /dev/null
31.54s user 0.06s system 97% cpu 32.462 total
% time perl -lanF, -e 'BEGIN{ $,=","} print @F[0..14];' numbers.csv > /dev/null
31.14s user 0.05s system 99% cpu 31.463 total
```
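A variation that may be worth timing alongside these (a sketch, not benchmarked here): `-a` autosplits every line into all 25 fields even though only the first 15 are printed. Giving `split` a limit of 16 stops after 15 fields and leaves the unsplit remainder in the 16th element, which is then dropped. The helper `first_fields` and the script layout are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $n comma-separated fields of $line.
# With a limit of $n + 1, split stops early; if the line had more than $n
# fields, the last element holds the unsplit remainder and is discarded.
sub first_fields {
    my ($line, $n) = @_;
    my @f = split /,/, $line, $n + 1;
    pop @f if @f > $n;
    return @f;
}

# Only read input when file names are given on the command line.
if (@ARGV) {
    local $, = ',';     # join printed fields with commas, like BEGIN{$,=","}
    local $\ = "\n";    # newline after each print, like -l
    while (my $line = <>) {
        chomp $line;
        print first_fields($line, 15);
    }
}
```

Run it as `perl first15.pl numbers.csv > out.csv` (the file name `first15.pl` is just an example); whether the limit narrows the gap to cut is untested here.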
(I guess I have a much faster cut than sk's...)

the lowliest monk


Re^3: cut vs split (suggestions)
by BrowserUk (Pope) on Apr 17, 2005 at 03:56 UTC

    Faster than mine, too:

    ```
    [ 4:40:33.95] P:\test>cut -d, -f 1-15 data\25x500000.csv >nul
    [ 4:41:13.59] P:\test>
    [ 4:42:48.34] P:\test>perl -lanF, -e "BEGIN{ $,=','} print @F[0..14];" data\25x500000.csv >nul
    [ 4:43:25.60] P:\test>
    ```

    40 seconds for cut versus 37 for Perl.
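The 40 and 37 seconds come from subtracting consecutive prompt timestamps. A minimal sketch of that arithmetic, assuming the `[ H:MM:SS.cc]` prompt format shown above and that no run crosses midnight; `prompt_seconds` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a prompt timestamp such as "[ 4:40:33.95]"
# into seconds since midnight.
sub prompt_seconds {
    my ($stamp) = @_;
    my ($h, $m, $s) = $stamp =~ /\[\s*(\d+):(\d+):(\d+(?:\.\d+)?)\]/
        or die "unrecognised timestamp: $stamp\n";
    return $h * 3600 + $m * 60 + $s;
}

# Start and end prompts for each run, copied from the session above.
my @runs = (
    [ 'cut',  '[ 4:40:33.95]', '[ 4:41:13.59]' ],
    [ 'perl', '[ 4:42:48.34]', '[ 4:43:25.60]' ],
);

for my $run (@runs) {
    my ($label, $start, $end) = @$run;
    printf "%-4s %6.2f s\n", $label, prompt_seconds($end) - prompt_seconds($start);
}
```

This prints 39.64 s for cut and 37.26 s for Perl, which round to the figures quoted.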

    That said, your cut time seems almost too good to be true. Are you sure cut can't somehow detect that it is writing to the null device and simply skip the work, the way perl's sort detects void context and skips?

    It's probably just a very well optimised, time-honed Unix utility versus a bad Win32 emulation, but 0.80s for 500,000 records is remarkable enough to make me check.

    I just remembered something I discovered a long time ago. The Win32 nul device is slower than writing to a file!?

    ```
    [ 4:53:24.51] P:\test>cut -d, -f 1-15 data\25x500000.csv >junk
    [ 4:53:38.01] P:\test>
    ```

    Writing to a real file cuts the 40 seconds to 14. Go figure.
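To check the nul-device effect without involving cut at all, a sketch that times Perl's own writes of the same output to the null device (via `File::Spec->devnull`, which gives `nul` on Win32 and `/dev/null` on Unix) and to a temporary file. It writes one fixed 15-field record 500,000 times rather than random data, so it only roughly mimics the cut output; the helper name `time_writes` is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Hypothetical helper: write $record $lines times to $fh, close it (so the
# final flush is included) and return the elapsed wall-clock seconds.
sub time_writes {
    my ($fh, $record, $lines) = @_;
    my $t0 = time;
    print {$fh} $record for 1 .. $lines;
    close $fh or die "close failed: $!";
    return time - $t0;
}

my $record = join(',', map { int rand 1000 } 1 .. 15) . "\n";
my $lines  = 500_000;

open my $null, '>', File::Spec->devnull or die "cannot open null device: $!";
my ($file, $path) = tempfile(UNLINK => 1);   # removed again at exit

my %elapsed = (
    null => time_writes($null, $record, $lines),
    file => time_writes($file, $record, $lines),
);
printf "%-4s %.3f s\n", $_, $elapsed{$_} for sort keys %elapsed;
```

Timings will vary by machine and OS; the comparison between the two lines is what matters.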


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco.
    Rule 1 has a caveat! -- Who broke the cabal?

      That said, your cut time seems almost too good to be true. Are you sure cut can't somehow detect that it is writing to the null device and simply skip the work, the way perl's sort detects void context and skips?

      As it happens, in the very first run of cut I tried, I sent the output to a file; and yes, I was pleasantly surprised to see how fast this cut was. But what utility could there be for the optimization you describe? If there is one, I sure can't think of it. And why would a no-op take 0.9s?

      Anyway, FWIW:

      ```
      % time cut -d, -f"1-15" numbers.csv > out.csv
      0.80s user 0.11s system 100% cpu 0.906 total
      % wc out.csv
       500000  500000 29174488 out.csv
      % head -1 out.csv
      169,970,983,721,411,426,262,255,484,174,389,651,175,975,763
      % tail -1 out.csv
      936,347,232,520,436,359,208,737,788,226,731,497,755,746,812
      ```
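In the same spirit as the `wc`, `head` and `tail` checks above, a sketch that confirms every output line has exactly 15 fields and that each field is an integer from 0 to 999 (the range `int rand 1000` produces). `check_cut_output` is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: read cut output from $fh and return
# (number of lines, number of malformed lines).
sub check_cut_output {
    my ($fh, $want_fields) = @_;
    my ($lines, $bad) = (0, 0);
    while (my $line = <$fh>) {
        chomp $line;
        $lines++;
        my @f  = split /,/, $line, -1;    # -1 keeps trailing empty fields
        my $ok = @f == $want_fields
              && !grep { !/\A[0-9]{1,3}\z/ } @f;
        $bad++ unless $ok;
    }
    return ($lines, $bad);
}

# Only check a file when one is named on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "cannot open $ARGV[0]: $!";
    my ($lines, $bad) = check_cut_output($fh, 15);
    print "$lines lines, $bad malformed\n";
}
```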

      the lowliest monk

        But, what utility could there be for the optimization you describe?

        None, but then there is no point in sorting in a void context either, and I've been bitten by that often enough when benchmarking (usually publicly!) that extreme differences make me suspicious. I was just surprised by the magnitude of the difference you were showing.
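A sketch of the benchmarking trap being described, assuming (as the thread says) that perl skips a `sort` whose result is discarded in void context: the fair measurement keeps the result. `time_it`, the data size and the iteration count are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my @data = map { int rand 1000 } 1 .. 100_000;

# Hypothetical helper: call $code $count times and return elapsed seconds.
sub time_it {
    my ($code, $count) = @_;
    my $t0 = time;
    $code->() for 1 .. $count;
    return time - $t0;
}

my $kept = 0;
my %elapsed = (
    # Void context: per the thread, perl may skip this sort entirely.
    void => time_it(sub { no warnings 'void'; sort { $a <=> $b } @data; return }, 20),
    # Result kept: the sort must actually be performed.
    kept => time_it(sub { my @s = sort { $a <=> $b } @data; $kept = @s; return }, 20),
);
printf "%-4s %.3f s\n", $_, $elapsed{$_} for sort keys %elapsed;
```

If the void-context sort really is skipped, `void` should come out far smaller than `kept`; the point is only that benchmarked code has to use its result.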


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco.
        Rule 1 has a caveat! -- Who broke the cabal?
      I was curious whether cut auto-detects the null device, so I tested it on my machine again:

      ```
      [sk]% time cut -d, -f"1-15" numbers.csv > junk
      5.630u 0.260s 0:06.12 96.2%
      [sk]% time cut -d, -f"1-15" numbers.csv > /dev/null
      5.620u 0.030s 0:05.65 100.0%
      ```
      I guess Pustular must be on a really fast machine :) -SK

        Don't I wish! I just have a faster cut; my numbers for the Perl scripts are comparable to yours.

        the lowliest monk
