PerlMonks  

Re^3: cut vs split (suggestions)

by BrowserUk (Pope)
on Apr 17, 2005 at 06:01 UTC (#448598)


in reply to Re^2: cut vs split (suggestions)
in thread cut vs split (suggestions)

I don't think it is split (which also uses the regex engine), so much as it's the assignment to the (global) array.

Avoiding that gets the time down from 37 seconds to just under 7 on my system.

[ 6:55:11.07] P:\test>perl -lne"BEGIN{$,=','} print+(split',',<>)[0..14] " data\25x500000.csv >junk
[ 6:55:17.90] P:\test>

Of course, that's only really useful if you only want to print them straight out again, but I guess it gets closer to being comparable with what cut does.
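For anyone who wants to check the array-assignment claim on their own machine, here is a minimal sketch using the core Benchmark module. The sample row and the sub names `assign_then_slice` and `slice_split_list` are made up for illustration; only the two techniques (split into a global array and slice it, versus slicing the list that split returns) come from the discussion above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

our @F;    # package (global) array, standing in for the @F that -a fills

# Split into the global array, then slice it (roughly what -a plus @F[0..14] does).
sub assign_then_slice {
    my ($line) = @_;
    @F = split /,/, $line;
    return join ',', @F[0 .. 14];
}

# Slice the list returned by split directly; no named array is populated.
sub slice_split_list {
    my ($line) = @_;
    return join ',', (split /,/, $line)[0 .. 14];
}

# Hypothetical sample row: 25 comma-separated fields, like one line of the test file.
my $line = join ',', 1 .. 25;

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    assign_then_slice => sub { assign_then_slice($line) },
    slice_split_list  => sub { slice_split_list($line) },
});
```

The numbers will depend on the perl version and platform, so treat the table as a local measurement and not as a general result.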


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.
Rule 1 has a caveat! -- Who broke the cabal?


Re^4: cut vs split (suggestions)
by sk (Curate) on Apr 17, 2005 at 07:01 UTC
    Very interesting! Thanks for the new idea!

    I lost my server connection for some reason, so I tested this on my laptop, and I do see a very good improvement with your modification.

    Update: replaced <> with $_ per pijll's post.

    C:\>perl -lne "BEGIN{$,=','} print+(split',',$_)[0..14] " > junk
    This finishes in about 14 seconds (corrected timing).

    C:\>perl -lanF, -e "BEGIN{ $,=\",\"} print @F[0..14];" numbers.csv > junk
    This takes about 18 seconds.

    I don't have a timing utility on Windows, so the times are just wall-clock times.

    I guess Windows is faster because the process runs at 100% CPU (or whatever is required, I guess?). On the UNIX servers the process might be more time-shared?

    My laptop is a 1.6 GHz Centrino with 1 GB RAM, running perl v5.6.1.

    cheers

    SK

    Update: Thanks pijll, the time it takes to run your version of the code is almost the same as the one that uses -n.

      You are using both the -n switch and <> in the first line! This means you lose half of your lines...
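To see why half the lines go missing, here is a small sketch that spells out the loop that -n wraps around the one-liner. The four-line input and the variable names are hypothetical; an in-memory filehandle stands in for the data file so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical four-line input, read from an in-memory filehandle.
my $data = "a,1\nb,2\nc,3\nd,4\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";

my @seen;
# -n wraps the one-liner in a loop like this one, reading a line into $_ ...
while (<$in>) {
    # ... and the extra read inside the body consumes the *next* line,
    # so the line held in $_ is never split or printed.
    my $next = <$in>;
    push @seen, (split /,/, $next)[0];
}
print "@seen\n";    # prints "b d"
```

Only every second line reaches split, which also explains why the first timing looked so good.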

      Anyway: -n (together with -l) does an unnecessary chomp on every line, so remove that; and use a limit on split, since it doesn't actually need to split all 25 fields:

      perl -le 'BEGIN{$,=","} print+(split",",$_,16)[0..14]for <>' numbers.csv
      Update: But for <> reads all lines in at once; you may not want that with large files, so use while <> instead.
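Putting the two suggestions together (a limit on split, and while instead of for), a script version might look like the sketch below. The helper name `first_fifteen` and the two sample rows are assumptions for illustration; a real run would read from the data file in place of the in-memory handle.

```perl
use strict;
use warnings;

# Return the first 15 fields of a comma-separated line.
# The limit of 16 lets split stop after 15 separators; the 16th element
# holds the unsplit remainder (including the newline) and is dropped by the slice.
# Lines with fewer than 15 fields would yield undef entries.
sub first_fifteen {
    my ($line) = @_;
    return (split /,/, $line, 16)[0 .. 14];
}

# Hypothetical two-row input with 25 fields per row.
my $data = join(',', 1 .. 25) . "\n" . join(',', 'a' .. 'y') . "\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";

# while reads one line per iteration, so memory use stays flat,
# whereas "for <>" builds the full list of lines before the loop starts.
while (my $line = <$in>) {
    print join(',', first_fifteen($line)), "\n";
}
```

No chomp is needed here because the trailing newline ends up in the discarded remainder, which is the same reason the chomp could be dropped from the one-liner.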
Re^4: cut vs split (suggestions)
by tlm (Prior) on Apr 17, 2005 at 07:42 UTC

    Now it is my machine that is the slowpoke; still, for me, the internal pipe continues to fare best:

    % time perl -lne 'BEGIN{ $,=","}; print+(split ",")[0..14]' numbers.csv \
      > /dev/null
    12.65s user 0.01s system 98% cpu 12.890 total

    % time perl -le 'open IN, q(cut -d, -f"1-15" numbers.csv|); \
      print join ",", ( chomp and split /,/ ) while <IN>' > /dev/null
    8.17s user 0.01s system 90% cpu 9.070 total

    the lowliest monk
