
Comparing hash data

by FubarPA (Monk)
on Oct 01, 2004 at 19:57 UTC ([id://395733])

FubarPA has asked for the wisdom of the Perl Monks concerning the following question:

I have a unique (or so it seems) problem. I have two hashes which in theory contain the same data, but their keys have different names. As a bit of background, I'm writing a script to compare a CSV file against a SQL database.

In the %sql hash, I have the exact field name as the key, but in the %csv hash, I have the field name minus its prefix (example: EMP_NAME vs. NAME). The data should be the same, which is what I'm checking for. Is there a simple way of comparing these two hashes, or am I going to have to check each field individually? Keep in mind that I do need to show the differences, if any. Any thoughts?

Re: Comparing hash data
by cLive ;-) (Prior) on Oct 01, 2004 at 20:06 UTC
    Something like this?
    for (keys %csv) {
        # skip fields whose values agree
        $sql{"EMP_$_"} eq $csv{$_} and next;
        print "$_ values are different:\n",
              "  SQL: " . $sql{"EMP_$_"} . "\n",
              "  CSV: $csv{$_}\n\n";
    }
    cLive ;-)
Re: Comparing hash data
by jZed (Prior) on Oct 01, 2004 at 20:08 UTC
Re: Comparing hash data
by diotalevi (Canon) on Oct 01, 2004 at 20:15 UTC
    The easiest thing to do is to create a mapping between the two data sources and then do your comparisons as normal.
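A minimal sketch of that idea, assuming the mapping is written out by hand. The field names EMP_NAME/NAME and EMP_JOB/JOB, the sample values, and the `compare_mapped` helper are hypothetical illustrations, not taken from the original post. It reports fields missing on either side as well as differing values, since the question asks to show differences.

```perl
use strict;
use warnings;

# Hypothetical mapping from SQL column names to CSV column names.
# In practice you would list every column pair you care about.
my %sql_to_csv = (
    EMP_NAME => 'NAME',
    EMP_JOB  => 'JOB',
);

# Compare two hashes through the mapping and return a list of
# human-readable difference messages (empty list = everything matched).
sub compare_mapped {
    my ($sql, $csv, $map) = @_;
    my @diffs;
    for my $sql_key (sort keys %$map) {
        my $csv_key = $map->{$sql_key};
        if (!exists $sql->{$sql_key}) {
            push @diffs, "$sql_key missing from SQL";
        }
        elsif (!exists $csv->{$csv_key}) {
            push @diffs, "$csv_key missing from CSV";
        }
        elsif ($sql->{$sql_key} ne $csv->{$csv_key}) {
            push @diffs, "$sql_key/$csv_key differ: "
                       . "SQL='$sql->{$sql_key}' CSV='$csv->{$csv_key}'";
        }
    }
    return @diffs;
}

# Hypothetical sample rows
my %sql = (EMP_NAME => 'Smith', EMP_JOB => 'Clerk');
my %csv = (NAME     => 'Smith', JOB     => 'Manager');
print "$_\n" for compare_mapped(\%sql, \%csv, \%sql_to_csv);
```

Once the mapping exists, each field is compared exactly once, so the work is linear in the number of mapped fields.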
Re: Comparing hash data
by periapt (Hermit) on Oct 01, 2004 at 20:16 UTC
    It seems that the first problem is to match the keys in the separate hashes and then compare the contents, so something like this might work (untested):
    foreach my $sqlkey (keys %sql) {
        foreach my $csvkey (keys %csv) {
            # the SQL key is the longer one, so it must END with "_$csvkey"
            next unless $sqlkey =~ /_\Q$csvkey\E\z/;
            # code to compare $csv{$csvkey} & $sql{$sqlkey} and do other things
        }
    }
    Of course, this is an O(m*n) operation for m keys in %sql and n keys in %csv. If %sql and %csv are very large, this might be prohibitive. If so, you might want to map the keys of %csv to the keys of %sql (assuming they are one-to-one) and then do a straight comparison.
    my %sqltocsv = qw(EMP_NAME NAME EMP_JOB JOB ...);
    foreach my $sqlkey (keys %sql) {
        if ($sql{$sqlkey} eq $csv{$sqltocsv{$sqlkey}}) {
            # do things
        }
    }
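If typing out %sqltocsv by hand is tedious, the map could be derived from the SQL keys themselves. This is a sketch under an assumption the thread has not confirmed: that every SQL column is named "<prefix>_<csv name>" with the prefix ending at the first underscore. The `build_key_map` helper and the sample data are hypothetical. Because the straight comparison relies on the map being one-to-one, the sketch dies if two SQL keys would collapse onto the same CSV key.

```perl
use strict;
use warnings;

# Build the SQL-key => CSV-key map automatically, ASSUMING every SQL
# column name is "<prefix>_<csv name>" with the prefix ending at the
# first underscore (e.g. EMP_NAME => NAME). Keys with no underscore
# map to themselves. Dies if two SQL keys would map to the same CSV
# key, since the straight comparison needs a one-to-one map.
sub build_key_map {
    my ($sql) = @_;
    my (%sql_to_csv, %seen);
    for my $sql_key (sort keys %$sql) {
        (my $csv_key = $sql_key) =~ s/^[^_]+_//;   # strip e.g. "EMP_"
        die "Both $seen{$csv_key} and $sql_key map to $csv_key\n"
            if exists $seen{$csv_key};
        $seen{$csv_key}       = $sql_key;
        $sql_to_csv{$sql_key} = $csv_key;
    }
    return \%sql_to_csv;
}

# Hypothetical sample rows
my %sql = (EMP_NAME => 'Smith', EMP_JOB => 'Clerk');
my %csv = (NAME     => 'Smith', JOB     => 'Clerk');

my $map = build_key_map(\%sql);
for my $sql_key (sort keys %$map) {
    my $csv_key = $map->{$sql_key};
    next if exists $csv{$csv_key} && $sql{$sql_key} eq $csv{$csv_key};
    print "$sql_key differs or is missing from the CSV\n";
}
```

Each lookup is a single hash access, so this avoids the O(m*n) nested loop.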

    use strict; use warnings; use diagnostics;
Re: Comparing hash data
by Prior Nacre V (Hermit) on Oct 02, 2004 at 08:33 UTC

    Here's another method using map & grep functions. I believe this should capture all the data you need.

    It makes only a single pass over the keys of each hash; there are no nested loops.

    # hash_key_cmp
    use strict;
    use warnings;

    # Test data
    our %CVS = ( b => 1, c => 0, d => 1, e => 1, f => 1, g => 0,
                 h => 1, i => 1, j => 1, k => 1, l => 1, );
    our %SQL = ( x_a => 1, x_b => 1, x_d => 1, x_e => 1, x_f => 0,
                 x_g => 1, x_h => 0, x_i => 1, x_j => 1, x_l => 1,
                 x__k => 1, x_m => 0, );

    # Compare hashes
    # Prefix = everything up to and including the FIRST "_" of an
    # arbitrary SQL key ([^_]+ rather than a greedy .+, so a key such
    # as x__k still yields "x_")
    (my $prefix = (each %SQL)[0]) =~ s/^([^_]+_)?.*/$1/;
    my $prefix_len = length $prefix;

    my (@valid_in_both, @bad_in_both, @not_in_sql, @not_in_cvs);

    map {
        exists $SQL{join '', $prefix, $_}
            ? $CVS{$_} eq $SQL{join '', $prefix, $_}
                ? push(@valid_in_both, $_)
                : push(@bad_in_both, $_)
            : push(@not_in_sql, $_)
    } keys %CVS;

    @not_in_cvs = grep { ! exists($CVS{substr $_, $prefix_len}) } keys %SQL;

    # Output results
    print 'Not in SQL: ', "@not_in_sql", "\n";
    print 'Not in CVS: ', "@not_in_cvs", "\n";
    print 'Valid in both: ', "@valid_in_both", "\n";
    print 'Bad in both: ', "@bad_in_both", "\n";

    Here's the output:

    [ ~/tmp ] $ perl hash_key_cmp
    Not in SQL: c k
    Not in CVS: x_a x__k x_m
    Valid in both: b d e i j l
    Bad in both: f g h
    [ ~/tmp ] $

    Update: Changed second map to a grep. Line used to read:

    map { exists($CVS{substr $_, $prefix_len}) || push(@not_in_cvs, $_) } keys %SQL;

    Brain stuck in push() mode, I think :-)
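A variant of the same single-pass approach, sketched under two assumptions of my own: the caller already knows the prefix (here "x_"), so nothing is guessed from whichever key each() happens to return, and the lists are sorted so repeated runs print the same order (Perl's hash order is not stable between runs). The `classify_keys` name and its result keys are hypothetical; the test data is the same as above.

```perl
use strict;
use warnings;

# Classify keys of two hashes whose keys differ only by a known prefix.
# Returns a hashref of sorted key lists: valid, bad, not_in_sql, not_in_csv.
sub classify_keys {
    my ($csv, $sql, $prefix) = @_;
    my %r = (valid => [], bad => [], not_in_sql => [], not_in_csv => []);
    my $len = length $prefix;

    for my $k (sort keys %$csv) {
        my $sk = $prefix . $k;
        if    (!exists $sql->{$sk})       { push @{ $r{not_in_sql} }, $k }
        elsif ($csv->{$k} eq $sql->{$sk}) { push @{ $r{valid} },      $k }
        else                              { push @{ $r{bad} },        $k }
    }
    # A SQL key is matched only if it really starts with the prefix
    # and the remainder exists as a CSV key.
    $r{not_in_csv} = [ grep {
        substr($_, 0, $len) ne $prefix || !exists $csv->{ substr $_, $len }
    } sort keys %$sql ];
    return \%r;
}

my %csv = (b => 1, c => 0, d => 1, e => 1, f => 1, g => 0,
           h => 1, i => 1, j => 1, k => 1, l => 1);
my %sql = (x_a => 1, x_b => 1, x_d => 1, x_e => 1, x_f => 0, x_g => 1,
           x_h => 0, x_i => 1, x_j => 1, x_l => 1, x__k => 1, x_m => 0);

my $r = classify_keys(\%csv, \%sql, 'x_');
printf "%-12s %s\n", "$_:", "@{ $r->{$_} }"
    for qw(not_in_sql not_in_csv valid bad);
```

Note that in ASCII order "_" sorts before lowercase letters, so x__k is listed ahead of x_a.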



Re: Comparing hash data
by Grygonos (Chaplain) on Oct 01, 2004 at 20:05 UTC

    Why can't they have the same keys? What is preventing that? Since we don't know what constitutes the "beginning" of the field name, we can't really help you with a solution. Post your field names and then maybe we can help. However, I suspect there is a way to give them the same keys, which would solve the problem for you.

Re: Comparing hash data
by The_Rabbit (Acolyte) on Oct 01, 2004 at 20:05 UTC
    If you just want to compare whether the values of each hash are the same, you could do:
    foreach my $v1 (values %sql) {
        foreach my $v2 (values %csv) {
            if ($v1 ne $v2) {
                # do something
            }
        }
    }
    There is probably a more efficient way, but I think this would do the trick.

    EDIT: Thanks to cLive for pointing out that I'm totally on crack. Look at his post below for the good way to do this.

      This is wrong. Very wrong. Testing each value against every value in the other hash? For hashes with n keys each, that's n*n comparisons, and even if every field matches you'll still get n(n-1) cases where $v1 ne $v2 (assuming the values are distinct). NAME may not match EMP_NAME, but it definitely won't match EMP_PHONE, EMP_AGE, etc., so why test it?

      Only n matches need to be made - assuming you can map the keys across the hashes correctly (my guess is below on this one).

      I think you misunderstood the question.

      cLive ;-)

        You are totally right; I'm not sure what I was thinking when I wrote that. When I saw your post I smashed my head against the keyboard. I've had a long day...

        Correct, but without making assumptions about the "beginning" of a key, we can't truly know how to map key to key properly. I agree it's inefficient; however, it was the only way I saw (well, agreed with, actually) to do it without making assumptions.

      That's true. You could take the values, sort them, and report any mismatches along with their field names. Of course, you may run into a cascading problem: if field 2 doesn't match, then depending on what's causing the problem, the rest of them may throw errors due to improper ordering, or they may not. This is a decent way to do it without fixing your data structure; however, fixing the data structure is still my recommendation.
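A small illustration of the cascading problem described above, using made-up values (none of them come from the thread). When sorted value lists are compared position by position, a single missing value shifts every later comparison, so one real difference shows up as several reported mismatches.

```perl
use strict;
use warnings;

# Hypothetical values: the CSV side is missing "Clerk".
my @sql_values = sort qw(Clerk Smith Springfield 555-1234);
my @csv_values = sort qw(Smith Springfield 555-1234);

# Positional comparison of the two sorted lists
my @mismatch_positions;
for my $i (0 .. $#sql_values) {
    my $other = defined $csv_values[$i] ? $csv_values[$i] : '(none)';
    push @mismatch_positions, $i if $sql_values[$i] ne $other;
}
print "mismatches at positions: @mismatch_positions\n";
```

Here one missing value produces three positional mismatches, which is why comparing by (mapped) key is more reliable than comparing by sorted position.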

Re: Comparing hash data
by TedPride (Priest) on Oct 02, 2004 at 05:53 UTC
    Assuming $sql and $csv are both references to the hashes:
    # Strip the 4-character "EMP_" prefix from the SQL keys (the longer
    # ones) so they line up with the CSV keys.
    my %ssql;
    foreach (keys %$sql) { $ssql{substr($_, 4)} = $sql->{$_}; }
    hash_match(\%ssql, $csv, 'sql', 'csv');

    sub hash_match {
        my ($p1, $p2, $n1, $n2) = @_;
        my ($i, $j) = (0, 0);    # cursors (avoid lexical $a/$b, which sort uses)
        my @k1 = sort keys %$p1;
        my @k2 = sort keys %$p2;
        # Merge-style walk over both sorted key lists
        while ($i <= $#k1 && $j <= $#k2) {
            if ($k1[$i] eq $k2[$j]) {
                print "\$$n1\{'$k1[$i]'\} = $p1->{$k1[$i]} ne ",
                      "\$$n2\{'$k2[$j]'\} = $p2->{$k2[$j]}\n"
                    if $p1->{$k1[$i]} ne $p2->{$k2[$j]};
                $i++; $j++;
            }
            elsif ($k1[$i] lt $k2[$j]) {
                print "Unique \$$n1\{'$k1[$i]'\} = $p1->{$k1[$i]}\n";
                $i++;
            }
            else {
                print "Unique \$$n2\{'$k2[$j]'\} = $p2->{$k2[$j]}\n";
                $j++;
            }
        }
        # Anything left over exists on one side only
        print "Unique \$$n1\{'$k1[$_]'\} = $p1->{$k1[$_]}\n" for $i .. $#k1;
        print "Unique \$$n2\{'$k2[$_]'\} = $p2->{$k2[$_]}\n" for $j .. $#k2;
    }
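    You might want to change the output format somewhat, and this isn't necessarily the prettiest chunk of code possible, but it does what you need. After sorting, it walks both key lists only once, so the total cost is O(n log n + m log m), dominated by the two sorts, rather than O(n*m).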

Node Type: perlquestion [id://395733]
Approved by Arunbear