I must admit that I'm a little rusty with Perl, I dug up an ancient script. I'm not proud of it... but hey it is from year 2000.

while (<FIL>) { $row = $_; ($pnr, $enamn, $fnamn, $adress, $stad, $telnr, $ar, $lon, $timmar, +$anr) = split("\t", $row); for ($i = 0; $i < $cnt; $i++) { if($pnr_array[$i] eq $pnr) { $duplicate++; $unique = 0; print "\n$pnr is not unique!\n"; } } if ($unique != 0) { $pnr_array[$cnt] = $pnr; $cnt++; # file write code omitted.. } $unique = 1; }
[download]

Now this code is terribly inefficient, it almost takes 30 minutes to run with the data set to remove the rows with non unique $pnr :s.

Today I looked up my old script, to do some serious optimization on it, and the resulting code

while (<FIL>) { $rad = $_; ($pnr, $enamn, $fnamn, $adress, $stad, $telnr, $ar, $lon, $timmar, +$anr) = split("\t", $rad); # print "$pnr"; if(exists $pnr_array{$pnr}) { $duplicate++; $unique = 0; print "\n$pnr är inte unikt!\n"; } if ($unique != 0) { $pnr_array{$pnr} = 0; } $unique = 1; }
[download]
Now this code uses a nasty construct that I don't find beatiful... $pnr_array{$pnr} = 0; is there a better way to do a set in perl?