Re: Removing duplicates in large files

by lestrrat (Deacon)
on Jan 30, 2004 at 20:29 UTC [id://325381]


in reply to Removing duplicates in large files

I suppose that if you must use Perl for this, you could use DB_File (or one of the other *DB_File modules) and keep writing each email address into a hash tied to a file. Since a hash key can only exist once, duplicates are weeded out automatically.

Some code fragments...

use DB_File;

# 'addresses.db' and 'addresses.txt' are placeholder names for the
# tied DB file and the input list
tie my %hash, 'DB_File', 'addresses.db'
    or die "Cannot tie addresses.db: $!";

open my $fh, '<', 'addresses.txt'
    or die "Cannot open addresses.txt: $!";

while (my $addr = <$fh>) {
    chomp $addr;
    $hash{ lc $addr } = 1;    # a given key can only be stored once
}

close $fh;
untie %hash;

Then you can reopen the db that DB_File created and dump it to a file, whatever. Something like the sketch below.
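A rough sketch of that dump step, reusing the placeholder filenames from above:

use DB_File;

tie my %hash, 'DB_File', 'addresses.db'
    or die "Cannot tie addresses.db: $!";

open my $out, '>', 'unique.txt'
    or die "Cannot open unique.txt: $!";

# each() walks the tied DB one record at a time, so even a huge
# address list never has to fit in memory all at once
while (my ($addr) = each %hash) {
    print {$out} "$addr\n";
}

close $out;
untie %hash;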

However, if you've got that much data, I would use SQL ;)
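For completeness, a rough sketch of that route using DBI with DBD::SQLite (the driver and filenames are my choice for illustration, not something the original suggestion specifies):

use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=addresses.sqlite', '', '',
    { RaiseError => 1, AutoCommit => 0 });

# the PRIMARY KEY makes the database enforce uniqueness for us
$dbh->do('CREATE TABLE IF NOT EXISTS addr (email TEXT PRIMARY KEY)');

my $sth = $dbh->prepare('INSERT OR IGNORE INTO addr (email) VALUES (?)');

open my $fh, '<', 'addresses.txt'
    or die "Cannot open addresses.txt: $!";

while (my $addr = <$fh>) {
    chomp $addr;
    $sth->execute(lc $addr);    # INSERT OR IGNORE silently skips duplicates
}

close $fh;
$dbh->commit;

The nice part is that the dedup work happens on disk inside the database rather than in RAM, which is exactly what you want for large files.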
