
Re: Removing duplicate entries in a file which has a time stamp on each line

by Random_Walk (Prior)
on Jan 18, 2006 at 21:37 UTC (#524071)

in reply to Removing duplicate entries in a file which has a time stamp on each line

The easiest way is probably to split each line on space, tab or comma, depending on what your file uses, take all the fields beyond the date and time, and use them as a hash key; join will glue them back together for you. If the key already exists, you have seen this line before and can discard it. If not, write the line out, create the key and move on to the next line.
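A rough sketch of that idea, assuming whitespace-separated fields with the date and time as the first two (change the split pattern for tabs or commas). The sub name dedupe_lines and the sample lines are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines whose non-timestamp part
# has not been seen before, in their original order.
sub dedupe_lines {
    my @lines = @_;
    my %seen;
    my @unique;
    for my $line (@lines) {
        # Assumption: whitespace-separated, date and time come first.
        my @fields = split ' ', $line;
        # The key is everything after the date and time fields.
        my $key = join ' ', @fields[2 .. $#fields];
        next if exists $seen{$key};    # seen before: discard
        $seen{$key} = 1;               # first sighting: create the key
        push @unique, $line;
    }
    return @unique;
}

# Made-up sample data, not taken from the original question.
my @sample = (
    '2006-01-18 21:00:01 disk full on /var',
    '2006-01-18 21:05:09 disk full on /var',
    '2006-01-18 21:06:00 link down on eth0',
);
print "$_\n" for dedupe_lines(@sample);
```

In a real script the lines would more likely come from a while loop over a filehandle, with each new line printed as it is found instead of being collected in an array.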

There are a few optimisations, depending on your need for speed, code simplicity or fun and games. The first is to split the line into only three parts (date, time, rest), so you do not have to join anything back up. Another is to test whether the hash key exists at the same time as you create it:

if ($hash{$key}++) {
    # key already seen
}
else {
    # key not yet seen
}
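Putting both optimisations together might look something like this. It is only a sketch under the same assumptions as before (whitespace-separated, date and time first); dedupe_by_rest is a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper combining the three-way split with the
# test-and-create idiom.
sub dedupe_by_rest {
    my @lines = @_;
    my %seen;
    my @unique;
    for my $line (@lines) {
        # A limit of 3 yields date, time and the untouched remainder,
        # so nothing needs to be joined back together.
        my (undef, undef, $rest) = split ' ', $line, 3;
        $rest = '' unless defined $rest;    # line had no third field
        # Post-increment returns the old count, which is false the
        # first time a key is used and true on every later sighting.
        push @unique, $line unless $seen{$rest}++;
    }
    return @unique;
}

# Made-up sample data for illustration only.
print "$_\n" for dedupe_by_rest(
    '2006-01-18 21:00:01 disk full on /var',
    '2006-01-18 21:05:09 disk full on /var',
    '2006-01-18 21:06:00 link down on eth0',
);
```

Note that the remainder is compared exactly as it appears in the file, so two lines that differ only in spacing after the time field count as different here, whereas a full split and join would treat them as the same.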

Have a go, and post your code if you have any problems.


Pereant, qui ante nos nostra dixerunt!
