PerlMonks  

Re: comparing multiple files

by Laurent_R (Canon)
on Jun 17, 2016 at 17:57 UTC ( [id://1165998] )


in reply to comparing multiple files

I'm not entirely sure whether this fits your specific problem, but I very regularly deal with huge files that are too large to fit in an in-memory hash, for purposes such as removing duplicates or comparing two files. I have found that the fastest way is very often to sort the files on the comparison key with the Unix sort utility and then read them sequentially. For what I am doing, this is consistently more than an order of magnitude faster than loading the data into a database.

The algorithm is then more complicated than just using a %seen hash, but nothing really hard.
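Here is a minimal sketch of the sequential part in Perl. It assumes both files were first sorted with something like `LC_ALL=C sort -k1,1 file > file.sorted`, that the comparison key is the first whitespace-separated field of each line, and that we only want to know which keys appear in one file, the other, or both. The names `compare_sorted`, `next_key` and `key_of` are hypothetical, chosen for illustration. The `LC_ALL=C` part matters: without `use locale`, Perl's `cmp` compares strings in ordinal order, so sort must use the same byte ordering or the merge can miss matches. The sketch dies if it sees keys out of order.

```perl
use strict;
use warnings;

# Hypothetical helper: the key is assumed to be the first
# whitespace-separated field of the line.
sub key_of {
    my ($line) = @_;
    my ($key) = split ' ', $line;
    return $key;
}

# Return the next distinct key from a sorted filehandle, or undef at EOF.
# In sorted input, duplicates are adjacent, so comparing with the
# previous key replaces the %seen hash.
sub next_key {
    my ($fh, $prev_ref) = @_;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        my $key = key_of($line);
        next unless defined $key;                 # skip blank lines
        if (defined $$prev_ref) {
            die "input not sorted by key near '$key'\n" if $key lt $$prev_ref;
            next if $key eq $$prev_ref;           # duplicate key: skip it
        }
        $$prev_ref = $key;
        return $key;
    }
    return;
}

# Walk both sorted files in step, holding only one key per file in memory.
# Callbacks (assumed interface): only_a, only_b, both.
sub compare_sorted {
    my ($fh_a, $fh_b, %cb) = @_;
    my ($prev_a, $prev_b);
    my $ka = next_key($fh_a, \$prev_a);
    my $kb = next_key($fh_b, \$prev_b);
    while (defined $ka && defined $kb) {
        my $cmp = $ka cmp $kb;
        if ($cmp < 0) {          # $ka cannot appear later in B
            $cb{only_a}->($ka);
            $ka = next_key($fh_a, \$prev_a);
        }
        elsif ($cmp > 0) {       # $kb cannot appear later in A
            $cb{only_b}->($kb);
            $kb = next_key($fh_b, \$prev_b);
        }
        else {                   # same key in both files
            $cb{both}->($ka);
            $ka = next_key($fh_a, \$prev_a);
            $kb = next_key($fh_b, \$prev_b);
        }
    }
    # Whatever is left in either file has no counterpart in the other.
    while (defined $ka) { $cb{only_a}->($ka); $ka = next_key($fh_a, \$prev_a) }
    while (defined $kb) { $cb{only_b}->($kb); $kb = next_key($fh_b, \$prev_b) }
    return;
}

# Demo with in-memory "files" that are already sorted by key.
my $file_a = "apple 1\nbanana 2\nbanana 3\ncherry 4\n";
my $file_b = "banana 9\ndate 7\n";
open my $fh_a, '<', \$file_a or die $!;
open my $fh_b, '<', \$file_b or die $!;

my (@only_a, @only_b, @both);
compare_sorted($fh_a, $fh_b,
    only_a => sub { push @only_a, $_[0] },
    only_b => sub { push @only_b, $_[0] },
    both   => sub { push @both,   $_[0] },
);
print "only in A: @only_a\n";    # only in A: apple cherry
print "only in B: @only_b\n";    # only in B: date
print "in both:   @both\n";      # in both:   banana
```

With real files, you would open the sorted outputs instead of the in-memory strings and let the callbacks print to output files rather than push onto arrays, so memory use stays constant however big the inputs are.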
