PerlMonks  

comparing nested data structures

by LanX (Canon)
on Nov 02, 2011 at 09:54 UTC
LanX has asked for the wisdom of the Perl Monks concerning the following question:

Honorable Monks

Is there a fast trick to diff deeply nested data structures, besides the obvious recursive comparison?

I was thinking about dumping and comparing the text output ...

I'm mining large amounts of data into nested data structures and would like to join identical sub-trees.

Cheers Rolf

Re: comparing nested data structures
by GrandFather (Cardinal) on Nov 02, 2011 at 10:09 UTC

    So how is dumping the structures as text (which has to traverse the structures in any case) then parsing the text to recreate some internal representation to allow comparison of the structures going to be faster than a direct comparison?

    True laziness is hard work
      >...going to be faster than a direct comparison?

      Well, dumping and diffing are done by XS modules, and the dumps can be stored in advance. (BTW: the dumping is meant for persistence anyway.)

      So what are the alternatives you offer?

      Cheers Rolf
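A minimal sketch of the "dump once, compare the stored dumps" idea, using the core modules Storable and Digest::MD5. The helper name `fingerprint` and the sample data are hypothetical, and this is only one way to do it, not necessarily what the poster had in mind. Setting `$Storable::canonical` makes hash keys serialize in sorted order, so two hashes with the same content yield the same bytes. One caveat to keep in mind: Storable may serialize a number and its string form differently, so values that look equal can still produce different fingerprints.

```perl
use strict;
use warnings;
use Storable qw(freeze);
use Digest::MD5 qw(md5_hex);

# Serialize hash keys in sorted order so that equal hashes
# produce identical byte strings regardless of insertion order.
$Storable::canonical = 1;

# Hypothetical helper: a short fingerprint of a nested structure.
# The fingerprint can be stored next to the persisted dump and
# compared later without walking the structure again.
sub fingerprint {
    my ($data) = @_;
    return md5_hex( freeze($data) );
}

my $left  = { name => 'a', list => [ 1, 2, { x => 1, y => 2 } ] };
my $right = { list => [ 1, 2, { y => 2, x => 1 } ], name => 'a' };
my $other = { name => 'a', list => [ 1, 2, { x => 1, y => 3 } ] };

print fingerprint($left) eq fingerprint($right) ? "same\n" : "different\n";
print fingerprint($left) eq fingerprint($other) ? "same\n" : "different\n";
```

This only answers yes/no; it does not say where two structures differ.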

        If the dumping and diffing can be done using XS, then so can the recursive comparison.
Re: comparing nested data structures
by keszler (Priest) on Nov 02, 2011 at 10:15 UTC

    Let someone else write the recursive comparison? Hash::Diff

      Or if you want to compare arrays as well (and hashes of arrays, and arrays of hashes) then Data::Compare. It won't tell you what the differences are though.
      Thx, but the docs say:

      >Plans for a future version include incorporate deep recursion protection. And support for ARRAY.

      Cheers Rolf

Re: comparing nested data structures
by JavaFan (Canon) on Nov 02, 2011 at 10:17 UTC
    If the nested data structures contain hashes, you'll have to sort the keys, which makes me think there is no "fast trick" in the general case.

    I usually go for Test::More::is_deeply. But I don't know whether that's fast enough for you.
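For reference, a small sketch of how `is_deeply` from the core Test::More module can be called; the variable names and data are made up for illustration. It compares the two structures recursively, ignores hash key order, and prints a diagnostic for the first mismatch it finds. Note that it is a test function, so it writes TAP output rather than quietly returning a result.

```perl
use strict;
use warnings;
use Test::More;

# Two structures with the same content, built in a different key order.
my $got      = { id => 1, tags => [ 'a', 'b' ], meta => { depth => 2 } };
my $expected = { meta => { depth => 2 }, tags => [ 'a', 'b' ], id => 1 };

# is_deeply returns a true value when the structures match,
# in addition to emitting the usual "ok"/"not ok" test line.
my $ok = is_deeply( $got, $expected, 'structures match regardless of key order' );

done_testing();
```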

      Great!

      It might be implemented in perl, but it's a core module which will give me a quick start!

      Thx! :)

      Cheers Rolf

Re: comparing nested data structures
by CountZero (Bishop) on Nov 02, 2011 at 10:19 UTC
    I think Test::Deep might solve your problems, or at the very least its code could give you some ideas to implement it yourself.

    Deep::Hash::Utils also gives some "easier" access to deeply nested hashes and arrays, without the need to write the recursive routines yourself.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Very good, thanks a lot! :)

      Cheers Rolf

Re: comparing nested data structures
by BrowserUk (Pope) on Nov 02, 2011 at 11:33 UTC
    would like to join identical sub-trees

    It kind of depends what you mean by "identical subtrees".

    If by identical, you mean that any substructure references below the top level are references to the same hash or array, then simply stringifying both hashes (or both arrays) and comparing the strings for equality will tell you if they are identical or not. And it will be considerably faster than full tree traversal methods.

    AFAIK (and as far as I was able to quickly verify), any hash that contains the same keys and values will stringify(*) to the same string, regardless of either the order in which the hash was constructed or whether it has at any time contained other keys that were subsequently removed. So there should be no need to sort the keys.

    If you would consider two different arrays or hashes that contain the same data as "identical", then it would be necessary to recursively stringify contained references bottom up. It might well still be quicker than an element-by-element recursive comparison if all you need is a boolean yes/no rather than a blow-by-blow list of the differences found.

    (*using suitable delimiters.)
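The bottom-up stringification described above could be sketched roughly as follows. The names `signature`, `share` and `%seen` are hypothetical, and two choices are assumptions of this sketch rather than part of the post: keys are sorted as a precaution, in case iteration order differs between hashes or Perl versions, and every key and leaf is length-prefixed so that delimiters cannot be confused with data. `share` then uses the signatures to make identical sub-trees point at a single copy.

```perl
use strict;
use warnings;

# Build a canonical string for a nested structure, bottom up.
sub signature {
    my ($node) = @_;
    my $type = ref $node;
    if ( $type eq 'HASH' ) {
        return 'H{' . join( ',',
            map { length($_) . ":$_=" . signature( $node->{$_} ) }
            sort keys %$node ) . '}';
    }
    if ( $type eq 'ARRAY' ) {
        return 'A[' . join( ',', map { signature($_) } @$node ) . ']';
    }
    return 'U' unless defined $node;
    # Plain scalars; any other kind of reference stringifies by address.
    return 'S' . length($node) . ":$node";
}

# Replace identical sub-trees with one shared copy (the first one seen).
my %seen;
sub share {
    my ($node) = @_;
    my $type = ref $node;
    return $node unless $type eq 'HASH' || $type eq 'ARRAY';
    # The loop variable aliases each element, so assigning to it
    # updates the container in place.
    for my $child ( $type eq 'HASH' ? values %$node : @$node ) {
        $child = share($child);
    }
    return $seen{ signature($node) } ||= $node;
}

my $tree = {
    a => { x => 1, y => [ 1, 2 ] },
    b => { y => [ 1, 2 ], x => 1 },
    c => { x => 2 },
};
share($tree);
print $tree->{a} == $tree->{b} ? "a and b now share one hash\n" : "still distinct\n";
```

This sketch recomputes the signature of a sub-tree at every level above it, so a real implementation would probably cache signatures per node. It also treats the number 1 and the string "1" as the same leaf.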


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
