PerlMonks  

comparing two sets of data

by mimiandi (Novice)
on Nov 28, 2012 at 05:36 UTC (#1005963)
mimiandi has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have two files containing data that I would like to compare. Both files contain a common key that I would like to match and print out. What would be the most efficient method of doing the comparison?

e.g.,
file1 contains:
    some_data key some_data
file2 contains:
    different_data different_data key

I opened both files, added one file to an array and the other to a hash, then did the comparison based on the key using nested for loops. Would there be a better method of comparing? Thank you in advance.

Sorry for the lack of clarity in my question. I have a list of all the employees' names in one file, and the other file has some names from some department.

Input, all-employee file:
    John Smith
    Terry Smith

Input, dept file:
    HR John Smith 6th floor
    R&D Terry Smith 5th floor

I would like the output to be: if John Smith in the dept file exists in the all-employee file, print out his info.

Hope that helps clear the question a bit. Thanks for your advice. Will definitely read the how not to ask question :)

Re: comparing two sets of data
by CountZero (Bishop) on Nov 28, 2012 at 07:15 UTC
    Your specification is not entirely clear to me.

    Do you want to check:

    • Which keys are different in both files; or
    • Which values are different (for corresponding keys); or
    • A combination of both.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: comparing two sets of data
by bitingduck (Friar) on Nov 28, 2012 at 07:30 UTC

    To expand a little on what CountZero said: it's not entirely clear what you're trying to do, but since you have already started working on this, you could make it easier to help you if you did a few things:

    • Post your complete code so far
    • Post a short set of example input
    • Show the output you want
    • Show the output you're getting

    If you wrap it all in <code> tags it will be easy to read and you'll probably get one or more solutions pretty quickly.

Re: comparing two sets of data
by marquezc329 (Scribe) on Nov 28, 2012 at 08:12 UTC

    Hello mimiandi,

    While I agree that your OP needs clarification and supplemental code, here is a simple method using grep to gather keys that differ.

        #!/usr/bin/perl
        use strict;
        use warnings;

        my %file1 = (
            samp1 => 'stuff',
            samp2 => 'data',
            samp3 => 'info',
            samp4 => 'foo',
            samp5 => 'bar',
            blah  => 'blah',
            brood => 'other'
        );
        my %file2 = (
            samp1 => 'stuff',
            samp2 => 'diff',
            samp3 => 'blah',
            samp4 => 'foo',
            junk  => 'stuff',
            trash => 'other',
            samp5 => 'bar'
        );

        my (@diff1, @diff2);
        # Test with exists so keys holding false values (0, '') are not reported as missing.
        push @diff1, grep { !exists $file2{$_} } keys %file1;    # keys only in %file1
        push @diff2, grep { !exists $file1{$_} } keys %file2;    # keys only in %file2
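    The grep approach reports keys that are missing from one side. The other case CountZero mentions, a key present in both hashes but with different values, can be picked out in the same style. A minimal sketch, assuming both files have already been read into hashes and all values are defined; `compare_hashes` and the sample data are hypothetical and not part of any module:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the contents of the two files.
my %file1 = ( samp1 => 'stuff', samp2 => 'data', blah => 'blah' );
my %file2 = ( samp1 => 'stuff', samp2 => 'diff', junk => 'stuff' );

# Illustrative helper: returns three array refs, each sorted by key:
#   keys only in the first hash, keys only in the second hash,
#   and keys present in both whose values differ (string comparison).
sub compare_hashes {
    my ($h1, $h2) = @_;
    my @only1   = sort grep { !exists $h2->{$_} } keys %$h1;
    my @only2   = sort grep { !exists $h1->{$_} } keys %$h2;
    my @changed = sort grep { exists $h2->{$_} && $h1->{$_} ne $h2->{$_} } keys %$h1;
    return ( \@only1, \@only2, \@changed );
}

my ($only1, $only2, $changed) = compare_hashes( \%file1, \%file2 );
print "Only in file1: @$only1\n";
print "Only in file2: @$only2\n";
print "Values differ: @$changed\n";
```

    With the sample data, blah is only in the first hash, junk is only in the second, and samp2 has differing values.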

    Depending on the complexity of your data, you may want to have a look at Data::Compare. I believe Test::More also offers some tools for comparing data structures.

    Post some code or at least some sample input data and I'm sure you'll rope some better answers.

    Suggested Reading: How (Not) To Ask A Question

Re: comparing two sets of data
by ColonelPanic (Friar) on Nov 28, 2012 at 09:20 UTC

    It appears that the "key" is the entire line of file1. If so, then this is a pretty easy task.

    Here is an example:

        use Modern::Perl;

        my %keys;

        open my $file1, '<', 'filename1.txt' or die "Can't open file 1: $!";
        open my $file2, '<', 'filename2.txt' or die "Can't open file 2: $!";

        # Read in the first file.
        while (<$file1>) {
            chomp;          # remove newline
            $keys{$_}++;    # add the key to your hash
        }

        # Find keys in the second file.
        while (<$file2>) {
            foreach my $key (keys %keys) {
                say "$key matches" if /\b\Q$key\E\b/;
            }
        }

    Regex notes:

    • \b matches a word boundary. This is because you presumably wouldn't want "John Smith" to match "John Smithfield".
    • \Q...\E quotes regex metacharacters. If your key contains characters with a special meaning in regexes (such as the dot in "Mr. Smith"), you want to match only the literal characters.

    Yes, you do end up needing a nested loop...that is intrinsic to the nature of your task.
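    If scanning every key for every line turns out to be slow, one possible variation (not from the post above) is to compile all the keys into a single alternation, so that the inner loop runs inside the regex engine instead of in Perl code. A minimal sketch, assuming the list of names fits comfortably in memory; `build_matcher` and the sample lines are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: compile all names into one regex.
# Longer names are placed first so they are tried before any shorter
# name that happens to be a prefix of them.
sub build_matcher {
    my @names = @_;
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @names;
    return qr/\b($alternation)\b/;
}

# Sample data taken from the question; in practice these come from the files.
my @employees  = ( 'John Smith', 'Terry Smith' );
my @dept_lines = (
    'HR John Smith 6th floor',
    'R&D Terry Smith 5th floor',
    'R&D Jane Doe 4th floor',
);

my $matcher = build_matcher(@employees);
for my $line (@dept_lines) {
    # $1 holds the matched name; here the whole dept line is printed.
    print "$line\n" if $line =~ $matcher;
}
```

    This prints the first two department lines and skips the third. The word boundaries and quotemeta play the same role as \b and \Q...\E in the example above.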



    When's the last time you used duct tape on a duct? --Larry Wall

Node Type: perlquestion [id://1005963]
Approved by davido