
comparing two sets of data

by mimiandi (Novice)
on Nov 28, 2012 at 05:36 UTC (#1005963=perlquestion)
mimiandi has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have two files containing data that I would like to compare. Both files contain a common key that I would like to match on and print out. What would be the most efficient method of doing the comparison? For example:

file1 contains:
    some_data key some_data

file2 contains:
    different_data different_data key

I opened both files, read one into an array and the other into a hash, then compared them by key using nested for loops. Is there a better method of comparing? Thank you in advance.

Sorry for the lack of clarity in my question. I have a list of all the employees' names in one file, and the other file has some names from some department.

Input, all-employee file:
    John Smith
    Terry Smith

Input, dept file:
    HR John Smith 6th floor
    R&D Terry Smith 5th floor

The output I would like: if John Smith in the dept file exists in the all-employee file, print out his info. Hope that helps clear up the question a bit.

Thanks for your advice. Will definitely read the how not to ask a question node :)

Replies are listed 'Best First'.
Re: comparing two sets of data
by CountZero (Bishop) on Nov 28, 2012 at 07:15 UTC
    Your specification is not entirely clear to me.

    Do you want to check:

    • Which keys are different in both files; or
    • Which values are different (for corresponding keys); or
    • A combination of both.


    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: comparing two sets of data
by bitingduck (Chaplain) on Nov 28, 2012 at 07:30 UTC

    To expand a little on what CountZero said: it's not entirely clear what you're trying to do, but since you have already started working on this, you could make it easier for us to help if you did a few things:

    • Post your complete code so far
    • Post a short set of example input
    • Show the output you want
    • Show the output you're getting

    If you wrap it all in <code> tags it will be easy to read and you'll probably get one or more solutions pretty quickly.

Re: comparing two sets of data
by ColonelPanic (Friar) on Nov 28, 2012 at 09:20 UTC

    It appears that the "key" is the entire line of file1. If so, then this is a pretty easy task.

    Here is an example:

    use Modern::Perl;

    my %keys;
    open my $file1, '<', 'filename1.txt' or die "Can't open file 1: $!";
    open my $file2, '<', 'filename2.txt' or die "Can't open file 2: $!";

    # read in the first file.
    while (<$file1>) {
        chomp;          # remove newline.
        $keys{$_}++;    # add the key to your hash
    }

    # find keys in the second file.
    while (<$file2>) {
        foreach my $key (keys %keys) {
            say "$key matches" if /\b\Q$key\E\b/;
        }
    }

    Regex Notes: \b matches a word boundary. It is there because you presumably wouldn't want "John Smith" to match "John Smithfield". \Q...\E is the quote literal modifier. If your key contains characters with a special meaning in regexes (such as the dot in "Mr. Smith"), it makes them match only as literal characters.
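
    As a rough illustration of those two regex notes, here is a minimal sketch. The key and the three sample lines are made up for the demonstration and are not from the original poster's data; the point is only to show what the match does with and without \Q...\E.

```perl
use strict;
use warnings;

# Hypothetical key and sample lines, chosen only to exercise \b and \Q...\E.
my $key   = 'Mr. Smith';
my @lines = (
    'HR Mr. Smith 6th floor',         # the literal key
    'HR Mr. Smithfield 6th floor',    # longer surname, should not match
    'HR MrX Smith 6th floor',         # matches only if the dot is a wildcard
);

# With \Q...\E the dot is escaped, and the trailing \b rejects "Smithfield".
my @quoted = grep { /\b\Q$key\E\b/ } @lines;

# Without \Q...\E the dot matches any character, so "MrX Smith" slips through.
my @unquoted = grep { /\b$key\b/ } @lines;

print "quoted:   $_\n" for @quoted;
print "unquoted: $_\n" for @unquoted;
```

    With these sample lines, the quoted pattern should keep only the first line, while the unquoted one also lets the "MrX Smith" line through.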

    Yes, you do end up needing a nested loop...that is intrinsic to the nature of your task.

    When's the last time you used duct tape on a duct? --Larry Wall
Re: comparing two sets of data
by marquezc329 (Scribe) on Nov 28, 2012 at 08:12 UTC

    Hello mimiandi,

    While I agree that your OP needs clarification and supplemental code, here is a simple method using grep to gather keys that differ.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %file1 = (
        samp1 => 'stuff',
        samp2 => 'data',
        samp3 => 'info',
        samp4 => 'foo',
        samp5 => 'bar',
        blah  => 'blah',
        brood => 'other',
    );
    my %file2 = (
        samp1 => 'stuff',
        samp2 => 'diff',
        samp3 => 'blah',
        samp4 => 'foo',
        junk  => 'stuff',
        trash => 'other',
        samp5 => 'bar',
    );

    my (@diff1, @diff2);
    # Keys found in only one of the two hashes. exists is used rather than
    # a truth test so that values such as 0 or '' are not misreported.
    push @diff1, grep { !exists $file2{$_} } keys %file1;
    push @diff2, grep { !exists $file1{$_} } keys %file2;
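
    The grep calls above only pick up keys that are missing from one hash or the other. If what is wanted is the keys present in both but with different values, the same idiom can be stretched a little. This is only a sketch, assuming the values are plain strings; the hashes are trimmed-down copies of the samples above and @changed is a name invented here.

```perl
use strict;
use warnings;

# Trimmed-down copies of the sample hashes, for illustration only.
my %file1 = ( samp1 => 'stuff', samp2 => 'data', samp3 => 'info', blah => 'blah' );
my %file2 = ( samp1 => 'stuff', samp2 => 'diff', samp3 => 'blah', junk => 'stuff' );

# Keys present in both hashes whose values are not string-equal.
# Sorting first keeps the report order stable between runs.
my @changed = grep { exists $file2{$_} && $file1{$_} ne $file2{$_} }
              sort keys %file1;

print "$_: '$file1{$_}' vs '$file2{$_}'\n" for @changed;
```

    For nested structures a plain ne comparison is not enough, which is where the modules mentioned below come in.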

    Depending on the complexity of your data, you may want to have a look at Data::Compare. I believe Test::More also offers some tools for comparing data structures.

    Post some code or at least some sample input data and I'm sure you'll rope some better answers.

    Suggested Reading: How (Not) To Ask A Question

Node Type: perlquestion [id://1005963]
Approved by davido