PerlMonks  

Re: how to speed up dupe checking of arrays

by wind (Priest)
on Jul 31, 2007 at 09:57 UTC (#629780)


in reply to how to speed up dupe checking of arrays

Most likely you're simply going to be throttled by I/O speed. However, the code in your post could in theory be simplified by limiting the split to only 3 parts and by moving the dup check into the while loop, assuming the dup count is all you really care about.

if ($file =~ $spec_text) {
    my $file_date = (split(/\./, $file))[3];
    open(IN, '<', $file) or die("open failed: $!");
    my $count_uniq = 0;
    my %seen;
    while (<IN>) {
        chomp;
        my ($ele0, $ele1, undef) = split ';', $_, 3;
        $count_uniq++ if !$seen{"$ele0;$ele1;$file_date"}++;
    }
    print "$.\n"; # Total number of lines.
    print "$count_uniq\n";
    close(IN);
}
- Miller
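
To see what the limit of 3 does, here is a minimal sketch on a made-up line. The sample data and the `first_two_fields` helper name are assumptions for illustration, not part of the code above. With a limit, split stops after the second separator and leaves the rest of the line unsplit in the last field:

```perl
use strict;
use warnings;

# Hypothetical helper: return only the first two ';'-separated fields.
# The limit of 3 makes split stop after the second ';', so the remainder
# of the line comes back as one unsplit third field, which is discarded.
sub first_two_fields {
    my ($line) = @_;
    my ($ele0, $ele1) = split /;/, $line, 3;
    return ($ele0, $ele1);
}

# Made-up sample line, for illustration only.
my $line    = 'a;b;c;d;e';
my @all     = split /;/, $line;       # 5 fields
my @limited = split /;/, $line, 3;    # 3 fields: 'a', 'b', 'c;d;e'

print scalar(@all), " vs ", scalar(@limited), " fields\n";
print join('|', first_two_fields($line)), "\n";
```

How much this saves depends on how many fields each line has; on short lines the difference is probably small next to the I/O cost.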


Re^2: how to speed up dupe checking of arrays
by ultibuzz (Monk) on Jul 31, 2007 at 10:25 UTC

    I need them in an array, so I adjusted your code like this:

    my @rows;
    my %seen;
    while (<IN>) {
        chomp;
        my ($ele0, $ele1, undef) = split ';', $_, 3;
        push @rows, "$ele0;$ele1;$file_date" if !$seen{"$ele0;$ele1;$file_date"}++;
    }
    close(IN);

    And what should I say: AWESOME. From 203 seconds down to around 11 seconds. Great, so no overtime in the office needed ;)
    Thanks a lot.

    kd ultibuzz

      You already have them: the keys of %seen are the unique strings.
      $seen{"$ele0;$ele1;$file_date"}++;
      Then, if you need the data, you can do, for example:
      foreach (keys %seen) { .... }
      The value for each key is the number of times that string occurred.

      Oha
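
One difference between the two approaches may matter: `keys %seen` returns the unique strings in no particular order, while the `@rows` array keeps them in first-seen order. A minimal sketch of both on made-up data (the `dedupe` helper and the sample strings are assumptions for illustration, not code from the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: dedupe a list of strings.
# Returns the unique strings in first-seen order, plus a hash of counts.
sub dedupe {
    my @rows;
    my %seen;
    for my $key (@_) {
        # Post-increment returns the old count: 0 (false) on first sight.
        push @rows, $key if !$seen{$key}++;
    }
    return (\@rows, \%seen);
}

# Made-up keys in the "$ele0;$ele1;$file_date" shape, for illustration only.
my ($rows, $seen) = dedupe('a;b;20070731', 'c;d;20070731', 'a;b;20070731');

# Array: first-seen order is preserved.
print "$_\n" for @$rows;

# Hash: sort the keys for stable output, since hash order is not defined.
print "$_ => $seen->{$_}\n" for sort keys %$seen;
```

If order does not matter and the counts are useful, the hash alone is enough; keeping the array as well costs extra memory for a second copy of every unique string.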
