Re: Removing duplicate entries in a file which has a time stamp on each line

by Random_Walk (Prior)
on Jan 18, 2006 at 21:37 UTC

in reply to Removing duplicate entries in a file which has a time stamp on each line

The easiest way is probably to split each line on space, tab or comma, depending on what your file uses, then take all the fields beyond the date and time and use them as a hash key; join will glue them back together for you. If the key already exists, you have seen this line before and can discard it. If not, write the line out, create the key and move on to the next line.
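Something along these lines would do it. This is only a rough sketch: it assumes whitespace-separated fields with the date and time as the first two, and the sub name dedupe_lines and the sample log lines are made up for illustration.

```perl
use strict;
use warnings;

# Keep only the first line for each distinct "everything after date and time".
# Assumption: fields are whitespace-separated, date and time come first.
sub dedupe_lines {
    my @lines = @_;
    my %seen;    # keys are the joined non-timestamp fields
    my @kept;
    for my $line (@lines) {
        my @fields = split ' ', $line;
        # Glue the fields after date and time back together as the key.
        my $key = join ' ', @fields[2 .. $#fields];
        next if exists $seen{$key};    # seen before: discard
        $seen{$key} = 1;               # first sighting: remember it
        push @kept, $line;
    }
    return @kept;
}

# Hypothetical sample data: the first two lines differ only in time stamp.
my @sample = (
    "2006-01-18 21:37:01 disk full on /var\n",
    "2006-01-18 21:37:05 disk full on /var\n",
    "2006-01-18 21:38:00 link down eth0\n",
);
print dedupe_lines(@sample);
```

For a real file you would feed it the lines from your filehandle instead of @sample. If your file is comma or tab delimited, change the split pattern to match.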

There are a few optimisations, depending on your need for speed, code simplicity or fun and games. The first is to split the line into only three parts (Date, Time, Rest), which split will do if you give it a limit of 3, so you do not have to join anything back up. Another is to test whether the hash key exists at the same time you create it:

if ($hash{$key}++) {
    # key already seen
} else {
    # key not yet seen
}
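Putting both optimisations together might look roughly like this. Again a sketch under the same assumptions (whitespace-separated, date and time first); dedupe_fast is just a name I picked.

```perl
use strict;
use warnings;

# Same idea as before, but split into at most three parts and use the
# post-increment to test and record the key in one step.
sub dedupe_fast {
    my %seen;
    my @kept;
    for my $line (@_) {
        # The limit of 3 stops splitting after date and time, so $rest
        # holds the remainder of the line untouched and no join is needed.
        my ($date, $time, $rest) = split ' ', $line, 3;
        $rest = '' unless defined $rest;    # line had fewer than three fields
        # $seen{$rest}++ returns the old count: false the first time only.
        push @kept, $line unless $seen{$rest}++;
    }
    return @kept;
}

# Reads lines on standard input and prints the de-duplicated ones.
print dedupe_fast(<STDIN>);
```

One difference to be aware of: because the rest of the line is kept exactly as it was, two lines that differ only in the amount of whitespace between later fields count as different here, whereas split followed by join would treat them as the same.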

Have a go, and post your code if you have any problems.


