Re: speed up one-line "sort|uniq -c" perl code

by tachyon (Chancellor)
on Apr 10, 2003 at 01:44 UTC [id://249514]


in reply to speed up one-line "sort|uniq -c" perl code

Essentially you have:

    #!/usr/bin/perl
    my %h;
    my @res;    # declare outside the loop
    open F, $ARGV[0] or die $!;
    while (<F>) {
        @res = split /\|/;
        $h{$res[9]}++;
    }
    close F;
    print "$h{$_}\t$_\n" for keys %h;

You may find this faster (maybe not), since @res is declared outside the loop. That avoids creating a new array to hold the split values and destroying it again on every iteration. I suspect this is what is happening in your one-liner: even on a dog-slow old PII 233 I have similar scripts that parse megabyte squid/httpd log files in a few seconds, definitely not minutes.
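Since I said "maybe not", it is worth measuring rather than guessing. Here is a minimal sketch using the core Benchmark module to compare the two declarations. The sample data and the sub names count_outer and count_inner are made up for illustration; substitute lines from your real log to get numbers that mean anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: 1000 pipe-delimited lines, 12 fields each,
# with a fake IP address in field 10 (index 9).
my @lines = map { join('|', ('x') x 9, "10.0.0." . ($_ % 50), 'y', 'z') . "\n" } 1 .. 1000;

# @res declared once, outside the loop, and reused on every iteration.
sub count_outer {
    my (%h, @res);
    for (@lines) {
        @res = split /\|/;
        $h{ $res[9] }++;
    }
    return \%h;
}

# A fresh lexical @res on every iteration.
sub count_inner {
    my %h;
    for (@lines) {
        my @res = split /\|/;
        $h{ $res[9] }++;
    }
    return \%h;
}

# Run each variant for at least one CPU second and print a comparison table.
# Skipped when the file is loaded with require/do instead of run directly.
cmpthese(-1, { outer => \&count_outer, inner => \&count_inner }) unless caller;
```

Whichever way the table comes out on your box, if the difference is only a few percent then the array declaration is not where your minutes are going.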

Alternatively, slurping the file into memory and using a regex to process it in one pass may be faster if disk I/O rather than CPU is the bottleneck. This is a spend-more-memory-to-get-more-speed approach. It is typically much faster to read data in large chunks than to iterate over a file line by line.

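    #!/usr/bin/perl
    my @ips;
    {
        local $/;    # slurp mode: read the whole file in one go
        open F, $ARGV[0] or die $!;
        my $all = <F>;
        close F;
        # capture field 10 of every line; \n is excluded so a match cannot run onto the next line
        @ips = $all =~ m/^(?:[^|\n]*\|){9}([^|\n]*)/gm;
        undef $all;    # try to reclaim the memory
    }
    my %h;
    $h{$_}++ for @ips;
    print "$h{$_}\t$_\n" for keys %h;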
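If the file is too big to slurp comfortably, a middle ground is to read it in large fixed-size chunks and carry any incomplete last line over to the next chunk. This is only a sketch under my own assumptions: the sub name count_field_chunked and the 1 MB default chunk size are invented for illustration, and I have not timed it against your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the values of field 10 in a pipe-delimited file, reading the file
# in fixed-size chunks rather than line by line or all at once.
sub count_field_chunked {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 1 << 20;    # assumed default of 1 MB, tune to taste

    open my $fh, '<', $path or die "Cannot open $path: $!";
    my (%h, $buf);
    my $tail = '';              # incomplete line left over from the previous chunk

    while (read $fh, $buf, $chunk_size) {
        $buf = $tail . $buf;

        # Hold back everything after the last newline for the next round.
        my $cut = rindex($buf, "\n");
        if ($cut < 0) { $tail = $buf; next; }
        $tail = substr($buf, $cut + 1);
        substr($buf, $cut + 1) = '';

        # Field 10 of every complete line in this chunk.
        $h{$1}++ while $buf =~ /^(?:[^|\n]*\|){9}([^|\n]*)/gm;
    }
    close $fh;

    # A final line with no trailing newline is still sitting in $tail.
    $h{$1}++ if $tail =~ /^(?:[^|\n]*\|){9}([^|\n]*)/;

    return \%h;
}

if (@ARGV) {
    my $h = count_field_chunked($ARGV[0]);
    print "$h->{$_}\t$_\n" for keys %$h;
}
```

Memory use stays at roughly one chunk plus the hash, so you can trade chunk size against RAM instead of holding the whole log at once.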

If neither of these works, code it in C, get better hardware, just make a cup of coffee while it runs, or nice it up to the max so it takes over the entire system.

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
