Re: Removing duplicate entries in a file which has a time stamp on each line

by Random_Walk (Prior)
on Jan 18, 2006 at 21:37 UTC (#524071=note)

in reply to Removing duplicate entries in a file which has a time stamp on each line

The easiest way is probably to split each line on space, tab or comma, depending on what your file uses. Take all the fields after the date and time and use them as a hash key (join will glue them back together for you). If the key already exists, you have seen this line before and can discard it; if not, write the line out, create the key and move on to the next line.
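A minimal sketch of that approach, assuming whitespace-separated fields where the first two are the date and the time. The sub name `dedupe_lines` and the sample log lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first line for each distinct payload,
# where the payload is everything after the date and time fields.
# Assumes fields are separated by whitespace.
sub dedupe_lines {
    my @lines = @_;
    my %seen;
    my @kept;
    for my $line (@lines) {
        my @fields = split ' ', $line;                 # split on runs of whitespace
        my $key = join ' ', @fields[2 .. $#fields];    # drop date and time, glue the rest back
        next if exists $seen{$key};                    # seen this payload before: discard
        $seen{$key} = 1;                               # remember the payload
        push @kept, $line;                             # first occurrence: keep it
    }
    return @kept;
}

# Made-up sample data: the first two lines differ only in their timestamps.
my @sample = (
    "2006-01-18 21:37:01 disk full on /var",
    "2006-01-18 21:38:15 disk full on /var",
    "2006-01-18 21:39:42 login failed for bob",
);
print "$_\n" for dedupe_lines(@sample);    # prints the first and third sample lines
```

To filter a file instead, something like `print dedupe_lines(<$fh>);` should work, since `split ' '` ignores the trailing newline when building the key; note that this reads the whole file into memory.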

There are a few optimisations, depending on your need for speed, code simplicity or fun and games. The first is to split the line into only three fields (date, time and the rest) by giving split a limit of 3, so that you do not have to join anything back together. Another is to test whether the hash key exists at the same time as you create it:

if ($hash{$key}++) {
    # key already seen
}
else {
    # key not yet seen
}
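Putting the two optimisations together might look roughly like this. It is a sketch that assumes the date and time are the first two whitespace-separated fields; the sub name `dedupe_by_rest` and the sample lines are hypothetical. The post-increment returns the old value, which is undef (false) the first time a key is seen and true afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: split into at most three fields (date, time, rest)
# so the payload never has to be joined back together, and use the
# post-increment idiom to test and record the key in one step.
sub dedupe_by_rest {
    my %seen;
    my @kept;
    for my $line (@_) {
        my $copy = $line;
        chomp $copy;                                  # keep the newline out of the key
        my (undef, undef, $rest) = split ' ', $copy, 3;
        $rest = '' unless defined $rest;              # line with no payload at all
        push @kept, $line unless $seen{$rest}++;      # false only on first sighting
    }
    return @kept;
}

# Made-up sample data: the first two lines differ only in their timestamps.
my @sample = (
    "2006-01-18 21:37:01 disk full on /var",
    "2006-01-18 21:38:15 disk full on /var",
    "2006-01-18 21:39:42 login failed for bob",
);
print "$_\n" for dedupe_by_rest(@sample);    # prints the first and third sample lines
```

One difference from the join version: with a limit of 3 the rest of the line is kept verbatim, so "disk  full" and "disk full" (different internal spacing) count as different payloads.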

Have a go, and post your code if you have any problems.


Pereant, qui ante nos nostra dixerunt! (Confound those who have said our remarks before us!)