Re: how to speed up dupe checking of arrays

in reply to how to speed up dupe checking of arrays

Most likely you're simply going to be throttled by IO speed. However, the code above could in theory be simplified by limiting the split to only 3 parts and by moving the dup check into the while loop, assuming the dup count is all you really care about.

if ($file =~ $spec_text) {
    my $file_date = (split(/\./, $file))[3];
    open(my $in, '<', $file) or die("open of $file failed: $!");
    my $count_uniq = 0;
    my %seen;
    while (<$in>) {
        chomp;
        # A limit of 3 means only the first two fields are split out.
        my ($ele0, $ele1, undef) = split ';', $_, 3;
        # Count a key only the first time it is seen.
        $count_uniq++ if !$seen{"$ele0;$ele1;$file_date"}++;
    }
    print "$.\n"; # Total number of lines.
    print "$count_uniq\n";
    close($in);
}
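If it helps to try the two ideas in isolation (the split limit and the %seen check), here is a minimal standalone sketch. The sub name count_unique_pairs and the sample lines are made up for illustration and are not from your code; it also assumes every line has at least two ';'-separated fields.

```perl
use strict;
use warnings;

# Hypothetical helper: counts distinct (field0, field1) pairs in a list of
# semicolon-separated lines. The name and the data below are illustrative.
sub count_unique_pairs {
    my (@lines) = @_;
    my %seen;
    my $count_uniq = 0;
    for my $line (@lines) {
        # A limit of 3 stops splitting after the second ';', so the rest
        # of the line is left as one chunk and simply ignored here.
        my ($ele0, $ele1) = split /;/, $line, 3;
        # Post-increment returns the old value, which is false only the
        # first time a key shows up.
        $count_uniq++ if !$seen{"$ele0;$ele1"}++;
    }
    return $count_uniq;
}

my @sample = ('a;b;x;y;z', 'a;b;other', 'a;c;x', 'b;b');
my $uniq   = count_unique_pairs(@sample);
printf "%d lines, %d unique, %d dupes\n",
    scalar @sample, $uniq, @sample - $uniq;
# prints "4 lines, 3 unique, 1 dupes"
```

The dupe count falls out as total lines minus unique keys, so there is no need to keep the split arrays around after the loop.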

- Miller
