PerlMonks  

Re: Delete duplicate data in file

by ptum (Priest)
on Nov 21, 2005 at 04:51 UTC ( #510362=note )


in reply to Delete duplicate data in file

I don't really like having to depend on the files being sorted. One alternative way to remove duplicate data is to use a hash to hold your data temporarily: read the data in from the files, place it in the hash, and then (eventually) write it back out. Since hash keys are unique, each duplicate row overwrites the earlier one, and you end up with exactly one copy of each unique element.
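A minimal sketch of that hash approach (reading from whatever files are passed on the command line; using the whole line as the key is an assumption — substitute whatever field makes a row unique):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Key the hash on the whole line; a later duplicate simply
# overwrites the same slot, so each row survives exactly once.
my %seen;
while (my $line = <>) {
    chomp $line;
    $seen{$line} = 1;
}

# Write one copy of each unique row back out.
print "$_\n" for keys %seen;
```

Note that keys() returns in no particular order; if the original order matters, record a first-seen counter as the value and sort on it before printing.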


Re^2: Delete duplicate data in file
by pg (Canon) on Nov 21, 2005 at 05:05 UTC

    The question is not whether you "depend on the files being sorted", but whether the file is in fact sorted.

    If it is (for example, it could be some sort of log), then holding the entire file in memory is a waste.
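For an already-sorted file, a streaming sketch along these lines needs only the previous line in memory, regardless of file size (again assuming whole-line comparison):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In a sorted file, duplicates are always adjacent, so it is
# enough to compare each line against the one before it.
my $prev;
while (my $line = <>) {
    chomp $line;
    print "$line\n" unless defined $prev && $line eq $prev;
    $prev = $line;
}
```

This is essentially what uniq(1) does, and it also preserves the file's order for free.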
