Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re: how to speed up dupe checking of arrays

by wind (Priest)
on Jul 31, 2007 at 09:57 UTC ( #629780=note: print w/ replies, xml ) Need Help??


in reply to how to speed up dupe checking of arrays

Most likely you're going to simply be throttled by IO speed. However, your above code could in theory be simplified by limiting the split to only 3 parts, and by adding the dup check to the while loop. Assuming that dup count is all you really care about.

if ($file =~ $spec_text){ my $file_date = (split(/\./,$file))[3]; open(IN, '<', $file) or die("open failed: $!"); my $count_uniq = 0; my %seen; while (<IN>) { chomp; my ($ele0, $ele1, undef) = split ';', $_, 3; $count_uniq++ if !$seen{"$ele0;$ele1;$file_date"}++; } print "$.\n"; # Total number of lines. print "$count_uniq\n"; close(IN); }
- Miller


Comment on Re: how to speed up dupe checking of arrays
Download Code
Re^2: how to speed up dupe checking of arrays
by ultibuzz (Monk) on Jul 31, 2007 at 10:25 UTC

    i need them in an array so i adjusted ur code like this

    my @rows; my %seen; while (<IN>) { chomp; my ($ele0, $ele1, undef) = split ';', $_, 3; push @rows,"$ele0;$ele1;$file_date" if !$seen{"$ele0;$ele1 +;$file_date"}++; } close(IN);

    and waht shoud i say AWSOME, from 203 seconds down do around 11 seconds,great so no over hours in office needed ;)
    thx alot.

    kd ultibuzz

      you already have them in array.
      $seen{"$ele0;$ele1;$file_data"}++;
      then if you need the data you can, for example:
      foreach (keys %seen) { .... }
      and the value of the hash is the number of times the string is repeated

      Oha

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://629780]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (11)
As of 2015-01-30 18:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My top resolution in 2015 is:

















    Results (253 votes), past polls