PerlMonks  

Re: Performance challenges

by dragonchild (Archbishop)
on Mar 22, 2006 at 12:21 UTC [id://538488]


in reply to Performance challenges

If the regex solution provided by Melly isn't as fast as you need, you might try:
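    use strict;
    use warnings;

    open( my $in,   '<', "Your_data_here" ) or die "Can't read Your_data_here: $!";
    open( my $good, '>', "Good_file_here" ) or die "Can't write Good_file_here: $!";
    open( my $bad,  '>', "Bad_file_here" )  or die "Can't write Bad_file_here: $!";

    while (<$in>) {
        my @row = split /\t/, $_;

        # Good rows: field 14 is 5 chars and field 15 is 7 chars.
        # (If field 15 exists, field 14 does too; short rows go to the bad file.)
        if ( defined $row[15] && length( $row[14] ) == 5 && length( $row[15] ) == 7 ) {
            print {$good} $_;
            next;
        }
        print {$bad} $_;
    }

    close $bad;
    close $good;
    close $in;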
Note: awk processing a million rows per minute probably isn't that bad. I'm not sure Perl is going to be much faster, because this task is heavily I/O-bound.
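To check the filter logic without touching real files, here is a minimal sketch that wraps the test from the loop above in two subroutines and runs it over in-memory filehandles. The names `is_good_row` and `partition_rows` and the sample rows are hypothetical. One deliberate difference is an assumption on my part: the sketch chomps a copy of each line before splitting, so a trailing newline can't inflate the length of field 15 if it happens to be the last column. Whether that matters depends on the real column count.

```perl
use strict;
use warnings;

# Hypothetical helper: a row is "good" when field 14 is 5 characters
# long and field 15 is 7 characters long (fields are tab-separated).
sub is_good_row {
    my ($line) = @_;
    chomp( my $copy = $line );           # keep "\n" out of the last field
    my @row = split /\t/, $copy;
    return 0 unless defined $row[15];    # too few fields to qualify
    return length( $row[14] ) == 5 && length( $row[15] ) == 7 ? 1 : 0;
}

# Hypothetical helper: copy each line, unchanged, to $good or $bad.
sub partition_rows {
    my ( $in, $good, $bad ) = @_;
    while ( my $line = <$in> ) {
        print { is_good_row($line) ? $good : $bad } $line;
    }
}

# Made-up sample data, read and written through in-memory filehandles
my @rows = (
    [ ('x') x 14, 'abcde', 'abcdefg' ],    # good: lengths 5 and 7
    [ ('x') x 14, 'abcd',  'abcdefg' ],    # bad: field 14 is too short
    [ ('x') x 3 ],                         # bad: only 3 fields
);
my $data = join '', map { join( "\t", @{$_} ) . "\n" } @rows;

open my $in, '<', \$data or die "Can't open input: $!";
my ( $good_out, $bad_out ) = ( '', '' );
open my $good, '>', \$good_out or die "Can't open good output: $!";
open my $bad,  '>', \$bad_out  or die "Can't open bad output: $!";

partition_rows( $in, $good, $bad );
close $_ for $in, $good, $bad;

print "good:\n$good_out", "bad:\n$bad_out";
```

Swapping the in-memory handles for the three real files gives the same loop as above; keeping the test in its own sub just makes it easy to exercise on a handful of hand-made rows.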

My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Re^2: Performance challenges
by Eimi Metamorphoumai (Deacon) on Mar 22, 2006 at 18:24 UTC
    One suggestion: if each record has more than 16 fields, you might find slightly better performance with
    my @row = split /\t/, $_, 17;
    which tells perl to split into at most 17 fields (0 to 15, leaving the trailing data in 16).
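A quick way to see what the limit does, sketched on a made-up 20-column row: fields 0 to 15 come out the same either way, and field 16 keeps the unsplit remainder. The timing half uses the core `Benchmark` module's `cmpthese`; any speedup depends on how many columns follow field 15 in the real data, so treat the numbers as indicative only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up 20-column row: c0, c1, ..., c19
my $line = join "\t", map "c$_", 0 .. 19;

my @all     = split /\t/, $line;        # every tab is split: 20 fields
my @limited = split /\t/, $line, 17;    # at most 17 fields

# Fields 0..15 match; field 16 keeps the rest: "c16\tc17\tc18\tc19"
print scalar(@all), " fields vs ", scalar(@limited), " fields\n";

# Rough timing comparison; each sub runs for at least 1 CPU second
cmpthese( -1, {
    no_limit => sub { my @f = split /\t/, $line },
    limit_17 => sub { my @f = split /\t/, $line, 17 },
} );
```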
Re^2: Performance challenges
by Anonymous Monk on Mar 22, 2006 at 13:31 UTC
    Thanks very much, folks! I much appreciate the fast response.

    dragonchild: I had a script similar to the one you wrote here, but I wasn't sure it was optimal. Sounds like it is. Thanks again! :) -Kris
