PerlMonks  

Re: comparing two files for duplicate entries

by Melly (Chaplain)
on Sep 27, 2006 at 16:34 UTC ([id://575190])


in reply to comparing two files for duplicate entries

I don't know how well it would scale, but I'd build a hash from the first file ($objects{'object1'} = 23.12;), then scan the second file. If an object in the second file already has an entry in the hash, print the hash key and both values.

Untested code, and I'm assuming a space delimiter, as per your examples:

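# Pass 1: store each key/value pair from the first file.
open(FILE, "file1") or die "Cannot open file1: $!";
while(<FILE>){
  if(/(\S+)\s+(\S+)/){
    $hash{$1} = $2;
  }
}
close FILE;

# Pass 2: print keys from the second file that were also in the first.
open(FILE, "file2") or die "Cannot open file2: $!";
while(<FILE>){
  if(/(\S+)\s+(\S+)/){
    print "$1: $hash{$1} $2\n" if defined $hash{$1};
  }
}
close FILE;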
Tom Melly, tom@tomandlu.co.uk
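
For anyone wanting a version that runs under strict and warnings, here is a minimal sketch of the same two-pass idea. The subroutine name common_entries and the command-line handling are illustrative assumptions, not part of the original post, and it still assumes whitespace-delimited "key value" lines.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): returns one
# "key: value1 value2" string for each key found in both files.
sub common_entries {
    my ($first_path, $second_path) = @_;

    # Pass 1: remember every key/value pair from the first file.
    my %seen;
    open(my $first, '<', $first_path) or die "Cannot open $first_path: $!";
    while (my $line = <$first>) {
        $seen{$1} = $2 if $line =~ /^\s*(\S+)\s+(\S+)/;
    }
    close $first;

    # Pass 2: collect keys from the second file that were seen in the first.
    my @matches;
    open(my $second, '<', $second_path) or die "Cannot open $second_path: $!";
    while (my $line = <$second>) {
        next unless $line =~ /^\s*(\S+)\s+(\S+)/;
        push @matches, "$1: $seen{$1} $2" if exists $seen{$1};
    }
    close $second;

    return @matches;
}

# Only runs when two file names are given on the command line.
if (@ARGV == 2) {
    print "$_\n" for common_entries(@ARGV);
}
```

If a key occurs more than once in the first file, the last value wins, which matches the behaviour of the one-liner above.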

Replies are listed 'Best First'.
Re^2: comparing two files for duplicate entries
by Fletch (Bishop) on Sep 27, 2006 at 16:50 UTC

    Yup, it's really just that simple (well, maybe exists rather than defined; but that's a minor nit). If your files are really, really big you probably want to use something like Berkeley_DB or one of the other DBM modules rather than reading everything into memory, but that's just an implementation detail; the basic algorithm remains the same.
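
    As a rough sketch of the DBM idea, the lookup hash can be tied to an on-disk file so the first file's entries need not all sit in memory. This uses SDBM_File only because it ships with Perl; the subroutine name common_entries_dbm and the scratch path argument are assumptions for illustration. SDBM limits each key/value pair to roughly 1 KB, so another DBM module may suit larger records.

```perl
use strict;
use warnings;
use Fcntl;        # O_RDWR, O_CREAT
use SDBM_File;    # DBM implementation bundled with Perl

# Hypothetical helper (name is an assumption): same two-pass comparison,
# but the lookup table lives in a DBM file at $dbm_path (a scratch location).
sub common_entries_dbm {
    my ($first_path, $second_path, $dbm_path) = @_;

    # Remove leftovers from an earlier run so stale keys cannot match.
    # SDBM normally stores its data in "$dbm_path.pag" and "$dbm_path.dir".
    unlink "$dbm_path.pag", "$dbm_path.dir";

    tie(my %seen, 'SDBM_File', $dbm_path, O_RDWR | O_CREAT, 0666)
        or die "Cannot tie $dbm_path: $!";

    # Pass 1: store each key/value pair from the first file on disk.
    open(my $first, '<', $first_path) or die "Cannot open $first_path: $!";
    while (my $line = <$first>) {
        $seen{$1} = $2 if $line =~ /^\s*(\S+)\s+(\S+)/;
    }
    close $first;

    # Pass 2: the basic algorithm is unchanged; only the hash storage differs.
    my @matches;
    open(my $second, '<', $second_path) or die "Cannot open $second_path: $!";
    while (my $line = <$second>) {
        next unless $line =~ /^\s*(\S+)\s+(\S+)/;
        push @matches, "$1: $seen{$1} $2" if exists $seen{$1};
    }
    close $second;

    untie %seen;
    return @matches;
}

# Only runs when two file names and a scratch path are given.
if (@ARGV == 3) {
    print "$_\n" for common_entries_dbm(@ARGV);
}
```

    Using exists here also means a key is reported whenever it is present in the first file, regardless of what value was stored for it.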

Re^2: comparing two files for duplicate entries
by mk. (Friar) on Sep 27, 2006 at 16:54 UTC
    or, just slightly different:

    open(FILE1, "file1");
    open(FILE2, "file2");
    while(<FILE1>){
      /(\S*)\s+(\S*)/;
      $hash{$1}=$2;
    }
    while(<FILE2>){
      /(\S*)\s+(\S*)/;
      print "$1 $hash{$1} $2\n" if $hash{$1}
    }


    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    "one who asks a question is a fool for five minutes; one who does not ask a question remains a fool forever."

    mk at perl dot org dot br
