
Re: Removing duplicate entries in a file which has a time stamp on each line

by Random_Walk (Parson)
on Jan 18, 2006 at 21:37 UTC (#524071=note)


in reply to Removing duplicate entries in a file which has a time stamp on each line

The easiest way is probably to split each line on space, tab or comma (depending on what your file uses), take all the fields after the date and time, and use them as a hash key; join will glue them back together for you. If the key already exists, you have seen this line before and can discard it. If not, write the line out, create the key, and move on to the next line.
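A minimal sketch of the approach above, assuming whitespace-separated fields with the date and time as the first two. The helper name dedupe_lines and the sample log lines are hypothetical; swap the split pattern for /,/ or /\t/ if your file uses commas or tabs.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the lines whose content after the
# first two fields (assumed to be date and time) has not been seen before.
# Assumes whitespace-separated fields; adjust the split pattern as needed.
sub dedupe_lines {
    my @lines = @_;
    my %seen;
    my @kept;
    for my $line (@lines) {
        my @fields = split /\s+/, $line;
        # Glue everything after the date and time back together as the key.
        my $key = join ' ', @fields[2 .. $#fields];
        next if exists $seen{$key};   # seen before: discard the line
        $seen{$key} = 1;              # first time: remember the key
        push @kept, $line;            # ... and keep the line
    }
    return @kept;
}

# Hypothetical sample input: the second line repeats the first apart
# from its time stamp.
my @input = (
    "2006-01-18 21:37:01 disk full on /var\n",
    "2006-01-18 21:38:15 disk full on /var\n",
    "2006-01-18 21:39:42 backup finished\n",
);
print for dedupe_lines(@input);
```

To filter a real file you could open it and call print dedupe_lines(<$fh>), although that reads the whole file into memory first.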

There are a few optimisations, depending on your need for speed, code simplicity or fun and games. The first is to split the line into only three fields (date, time and the rest), so you do not have to join anything back up. Another is to test whether the hash key exists at the same time as you create it; this works because the post-increment returns the old value, which is false the first time a key is seen:

    if ($hash{$key}++) {
        # key already seen
    } else {
        # key not yet seen
    }
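Both optimisations can be combined into a streaming filter. This is a sketch under the same assumption (date and time are the first two whitespace-separated fields); the helper name dedupe_stream and the in-memory sample data are hypothetical, used only so the example runs without a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, skipping any line whose
# text after the date and time has already been seen.
sub dedupe_stream {
    my ($in, $out) = @_;
    my %seen;
    while (my $line = <$in>) {
        chomp $line;    # so a final line without "\n" still matches
        # A limit of 3 stops splitting after the time field, so $rest holds
        # the remainder of the line and needs no join.
        my (undef, undef, $rest) = split ' ', $line, 3;
        $rest = '' unless defined $rest;
        # Post-increment returns the old count: false (undef) the first
        # time a key appears, true on every later occurrence.
        next if $seen{$rest}++;
        print {$out} "$line\n";
    }
}

# Hypothetical sample data read from and written to in-memory filehandles.
my $log = "2006-01-18 21:37:01 disk full on /var\n"
        . "2006-01-18 21:38:15 disk full on /var\n"
        . "2006-01-18 21:39:42 backup finished\n";
open my $in, '<', \$log or die "cannot open in-memory input: $!";
my $result = '';
open my $out, '>', \$result or die "cannot open in-memory output: $!";
dedupe_stream($in, $out);
close $out;
print $result;
```

With real files you would pass handles opened on the input file and on STDOUT or an output file. One difference from splitting on every field: here the rest of the line is compared exactly, so a change in spacing inside the message makes it count as a different line.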

Have a go, and post your code if you have any problems.

Cheers,
R.

Pereant, qui ante nos nostra dixerunt! (May they perish who said our things before us!)

