Re: Removing duplicates in large files

by lestrrat (Deacon)
on Jan 30, 2004 at 20:29 UTC [id://325381]


in reply to Removing duplicates in large files

I suppose that if you must use Perl for this, you could use DB_File (or one of the other *DB_File modules) and keep writing each email address into a hash tied to a file. Since a hash key can only exist once, duplicates are weeded out automatically.

Some code fragments...

use DB_File;

# 'addresses.db' and 'addresses.txt' are placeholder names for the
# tied DB file and the input list
tie my %hash, 'DB_File', 'addresses.db'
    or die "Cannot tie addresses.db: $!";

open my $fh, '<', 'addresses.txt'
    or die "Cannot open addresses.txt: $!";

while (my $addr = <$fh>) {
    chomp $addr;
    $hash{ lc $addr } = 1;    # a given key can only be stored once
}

close $fh;
untie %hash;

Then you can reopen the db that DB_File created and dump it to a file, whatever. Something like the sketch below.
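A rough sketch of that dump step, reusing the placeholder filenames from above:

use DB_File;

tie my %hash, 'DB_File', 'addresses.db'
    or die "Cannot tie addresses.db: $!";

open my $out, '>', 'unique.txt'
    or die "Cannot open unique.txt: $!";

# each() walks the tied DB one record at a time, so even a huge
# address list never has to fit in memory all at once
while (my ($addr) = each %hash) {
    print {$out} "$addr\n";
}

close $out;
untie %hash;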

However, if you've got that much data, I would use SQL ;)
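For completeness, a rough sketch of that route using DBI with DBD::SQLite (the driver and filenames are my choice for illustration, not something the original suggestion specifies):

use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=addresses.sqlite', '', '',
    { RaiseError => 1, AutoCommit => 0 });

# the PRIMARY KEY makes the database enforce uniqueness for us
$dbh->do('CREATE TABLE IF NOT EXISTS addr (email TEXT PRIMARY KEY)');

my $sth = $dbh->prepare('INSERT OR IGNORE INTO addr (email) VALUES (?)');

open my $fh, '<', 'addresses.txt'
    or die "Cannot open addresses.txt: $!";

while (my $addr = <$fh>) {
    chomp $addr;
    $sth->execute(lc $addr);    # INSERT OR IGNORE silently skips duplicates
}

close $fh;
$dbh->commit;

The nice part is that the dedup work happens on disk inside the database rather than in RAM, which is exactly what you want for large files.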
