PerlMonks  

Checksum on Multidimensional Array - how does it work

by udvk009 (Novice)
on Mar 26, 2015 at 11:31 UTC

udvk009 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, is it possible to compare the contents of arrays using a checksum algorithm? Objective: let's say I have two multidimensional arrays that may have a few hundred thousand rows (say 500,000), and I want to compare these two arrays using a checksum to find out whether they are different, e.g. array-1 may have a row-x which is missing in array-2. Conceptually, I want to find out why my checksum function returns the same result for two different arrays. Please advise how to go about it. I have tried some sample code with a very small pair of arrays; please advise why the checksum shows the same result for both arrays.

#!C:\Perl5.16\bin\perl.exe
use Data::Dumper;
use Digest::MD5 qw(md5 md5_hex md5_base64);

my @array1 = (
    [1,'John','ABXC12132328'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322']
);
my @array2 = (
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322'],
    [0,'John','ABXC12132322']
);
#print Dumper(\@array1);
my $ref_array1 = @array1;
my $ref_array2 = @array2;
my $str = md5($ref_array1);
my $str2 = md5($ref_array2);
print "md-check-sum for array1 :: ".unpack('L', $str)."\n";
print "md-check-sum for array2 :: ".unpack('L', $str2)."\n";

The output is:

md-check-sum for array1 :: 2134629092
md-check-sum for array2 :: 2134629092

Re: Checksum on Multidimensional Array - how does it work
by BrowserUk (Pope) on Mar 26, 2015 at 11:55 UTC

    Because you're not checksumming the arrays. You are checksumming the lengths of the arrays (which are the same):

    my $ref_array1 = @array1; ## Assigns the length of @array1 to $ref_array1!!!
    my $ref_array2 = @array2; ## Ditto!
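    For anyone unsure why that line yields the length: assigning an array to a scalar evaluates it in scalar context, which gives the element count, whereas a backslash gives a reference and parentheses on the left force a list assignment. A minimal sketch using a cut-down, two-row copy of the OP's data ($count, $ref and $first are illustrative names only):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cut-down, two-row copy of the OP's data.
my @array1 = ( [1, 'John', 'ABXC12132328'], [0, 'John', 'ABXC12132322'] );

# Scalar assignment evaluates @array1 in scalar context: the element count.
my $count = @array1;

# A backslash takes a reference to the array instead.
my $ref = \@array1;

# Parentheses on the left make it a list assignment: $first gets element 0.
my ($first) = @array1;

print "count:        $count\n";                  # count:        2
print "rows via ref: ", scalar(@$ref), "\n";     # rows via ref: 2
print "first row:    @$first\n";                 # first row:    1 John ABXC12132328
```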

    To checksum the contents of the arrays, one way would be to serialise them (convert to a string representation):

    #!C:\Perl5.16\bin\perl.exe
    use Data::Dumper;
    use Digest::MD5 qw(md5 md5_hex md5_base64);

    my @array1 = (
        [1,'John','ABXC12132328'],
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322']
    );
    my @array2 = (
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322']
    );
    #print Dumper(\@array1);
    my $ref_array1 = Dumper( \@array1 );
    my $ref_array2 = Dumper( \@array2 );
    my $str = md5_hex($ref_array1);
    my $str2 = md5_hex($ref_array2);
    print "md-check-sum for array1 :: " . $str . "\n";
    print "md-check-sum for array2 :: " . $str2 . "\n";

    __END__
    C:\test>1121378
    md-check-sum for array1 :: b636a47153af27317478e3bca3632602
    md-check-sum for array2 :: a4882627a89775602ab2e33762a70e81
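    The Dumper() approach works, but at the OP's scale (500,000 rows) it builds one large string in memory before hashing it. A possible alternative, sketched under assumptions: Digest::MD5's object interface (new, add, hexdigest) lets you feed the data to the digest one row at a time. digest_rows is a hypothetical helper, and the length-prefixed row encoding is an illustrative choice, not something from this thread. Like the Dumper version, the result depends on row order, and it assumes every field is a defined byte string or number.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: MD5 over an array of rows, fed to the digest one
# row at a time instead of building one big Dumper() string first.
# Each row is encoded as its field count followed by length-prefixed
# fields, so e.g. ['ab','c'] and ['a','bc'] produce different byte streams.
# Assumes every field is a defined byte string or number (no wide characters).
sub digest_rows {
    my ($rows) = @_;
    my $ctx = Digest::MD5->new;
    for my $row (@$rows) {
        $ctx->add( pack( 'N', scalar @$row ) );     # field count
        $ctx->add( pack( 'N/a*', $_ ) ) for @$row;  # length + bytes per field
    }
    return $ctx->hexdigest;                         # all 128 bits, as 32 hex chars
}

my @array1 = ( [1, 'John', 'ABXC12132328'],
               map { [0, 'John', 'ABXC12132322'] } 1 .. 4 );
my @array2 = map { [0, 'John', 'ABXC12132322'] } 1 .. 5;

print "array1: ", digest_rows( \@array1 ), "\n";
print "array2: ", digest_rows( \@array2 ), "\n";
```

    If the rows come from a file or a database query, the same add() call can be made as each row is read, so the full array never has to be held in memory.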

    Note also that I've nixed your unpack('L', ...) stuff, which throws away 3/4 of the information in the 128-bit checksum by converting only the first 32 bits to a number.
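    To see the 3/4 figure concretely, a small sketch (the input string is arbitrary): md5() returns 16 raw bytes, md5_hex() returns the same digest as 32 hex characters, and unpack('L', ...) reads only the first 4 of those bytes. Since 'L' uses native byte order, the number it produces can also differ between machines.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex);

my $data   = 'any serialized array';   # arbitrary example input
my $raw    = md5($data);               # binary digest: 16 bytes = 128 bits
my $hex    = md5_hex($data);           # same digest as 32 hex characters
my $first4 = unpack( 'L', $raw );      # reads only the first 4 bytes (32 bits)

printf "raw digest: %d bytes (%d bits)\n", length($raw), 8 * length($raw);
print  "md5_hex:    $hex\n";
print  "unpack 'L': $first4 (4 of the 16 bytes)\n";

# The hex form keeps everything: it converts back to exactly the raw bytes.
print  "round-trip: ", ( pack( 'H*', $hex ) eq $raw ? "ok" : "mismatch" ), "\n";
```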


Re: Checksum on Multidimensional Array - how does it work
by monkey_boy (Priest) on Mar 26, 2015 at 12:04 UTC
    my $ref_array1 = @array1;
    my $ref_array2 = @array2;
    my $str = md5($ref_array1);
    my $str2 = md5($ref_array2);
    Both times you are digesting the count of items in the array, i.e.:
    print "$ref_array1\n";
    print "$ref_array2\n";
    outputs:
    5
    5


    Even if you actually took references to these arrays (my $ref_array1 = \@array1;), your solution would never work, as you would just be digesting the stringified reference (something like ARRAY(0x...)), i.e. a memory address for each named array, not its contents.
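    A small sketch of that point, using two one-row arrays with identical contents. The references are stringified explicitly ("$ref1") to show exactly what md5 would end up hashing; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my @array1 = ( [0, 'John', 'ABXC12132322'] );
my @array2 = ( [0, 'John', 'ABXC12132322'] );    # same contents as @array1

my $ref1 = \@array1;
my $ref2 = \@array2;

# A reference used as a string is just its type and address,
# e.g. ARRAY(0x55d0c8a1e2f8); none of the contents are in it.
print "ref1 as a string: $ref1\n";

# Identical contents, different addresses, so different digests...
print md5_hex("$ref1") eq md5_hex("$ref2") ? "same\n" : "different\n";

# ...while changing the contents leaves the digest untouched.
my $before = md5_hex("$ref1");
$array1[0][0] = 1;
print md5_hex("$ref1") eq $before ? "unchanged\n" : "changed\n";
```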
    The solution here, I suspect, is to serialize the arrays and digest the stringified values, e.g.:
    #!/usr/bin/env perl
    use Modern::Perl;
    use Data::Dumper;
    use Digest::MD5 qw(md5 md5_hex md5_base64);

    my @array1 = (
        [1,'John','ABXC12132328'],
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322']
    );
    my @array2 = (
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322'],
        [0,'John','ABXC12132322']
    );
    my @array3 = @array2;
    #print Dumper(\@array1);
    my $md5_1 = md5_hex(Dumper(\@array1));
    my $md5_2 = md5_hex(Dumper(\@array2));
    my $md5_3 = md5_hex(Dumper(\@array3));
    say 1,' ',$md5_1;
    say 2,' ',$md5_2;
    say 3,' ',$md5_3;
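    A single checksum can only tell you whether the two arrays differ, not which rows are responsible. Since the OP's goal included spotting a row-x that is in array-1 but missing from array-2, here is a sketch of one way to do that with a hash of row counts instead of a digest. row_key and rows_missing_from are hypothetical helper names; the sketch assumes row order does not matter, duplicate rows count individually, and every field is a defined byte string or number.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: an unambiguous string key for one row
# (field count, then each field length-prefixed). Assumes defined byte strings.
sub row_key {
    my ($row) = @_;
    return pack( 'N', scalar @$row ) . join( '', map { pack( 'N/a*', $_ ) } @$row );
}

# Hypothetical helper: rows of @$have with no matching row in @$other.
# Duplicates are counted, so it is a multiset difference; order is ignored.
sub rows_missing_from {
    my ( $have, $other ) = @_;
    my %count;
    $count{ row_key($_) }++ for @$other;
    my @missing;
    for my $row (@$have) {
        my $key = row_key($row);
        if ( $count{$key} ) { $count{$key}-- }
        else                { push @missing, $row }
    }
    return @missing;
}

my @array1 = ( [1, 'John', 'ABXC12132328'],
               map { [0, 'John', 'ABXC12132322'] } 1 .. 4 );
my @array2 = map { [0, 'John', 'ABXC12132322'] } 1 .. 5;

print "in array1, not array2: @$_\n" for rows_missing_from( \@array1, \@array2 );
print "in array2, not array1: @$_\n" for rows_missing_from( \@array2, \@array1 );
```

    With the OP's sample data this reports the '1' row as only in array1 and one surplus '0' row as only in array2. For very large arrays, the hash could store an md5 of each key instead of the key itself if memory becomes a concern.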



      Thanks, folks, for the quick replies and for explaining the implementation! Appreciate the help... cheers!!

Node Type: perlquestion [id://1121378]
Approved by jellisii2