sciurius has asked for the wisdom of the Perl Monks concerning the following question:

Assume you have a (big) perl data structure, a hash, of which all keys have values that can be scalars, arrays or hashes. Arrays and hashes can contain scalars, arrays or hashes, and so on.

For example:

$orig = { a => { b => [ 'c', 'd' ], e => [ [ 'f' ] ] } };

Assume there is an operation 'augment' that takes this data structure and another (smaller) structure. The original data structure will be modified so that all the data from the smaller structure is now in the original structure as well.

For example, augmenting the structure above with

{ a => { e => [ [ 'g' ] ] } }

will yield

{ a => { b => [ 'c', 'd' ], e => [ [ 'g' ] ] } }

What I'm looking for is the reversed operation, let's call it 'reduce', that takes the original and augmented structures as arguments, and returns the (minimal) structure that is needed to augment the original.

I'm sure that there are modules that already implement this but I haven't been able to identify them...

A cut/paste ready test program:

use strict;
use Test::More tests => 2;
use Storable qw(dclone);

my $orig = { a => { b => [ 'c', 'd' ], e => [ [ 'f' ] ] } };
my $new  = dclone($orig);

my $delta = { a => { e => [ [ 'g' ] ] } };
augment( $new, $delta );

my $augmented = { a => { b => [ 'c', 'd' ], e => [ [ 'g' ] ] } };
is_deeply( $new, $augmented, "augment" );

reduce( $new, $orig );
is_deeply( $new, $delta, "reduce" );
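The `augment` and `reduce` subs are deliberately left undefined above; they are what is being asked for. For concreteness, here is a minimal sketch of an `augment` that satisfies the first test. The semantics are my own assumption from the examples: hashes merge by key, arrays merge by index, and plain scalars overwrite.

```perl
use strict;
use warnings;

# Recursively copy everything from $delta into $orig.
# Hashes merge by key, arrays by index; anything else
# (plain scalars, or a container replacing a scalar)
# overwrites the original value.
sub augment {
    my ( $orig, $delta ) = @_;
    if ( ref($delta) eq 'HASH' && ref($orig) eq 'HASH' ) {
        for my $k ( keys %$delta ) {
            if ( ref( $delta->{$k} ) && exists $orig->{$k} ) {
                augment( $orig->{$k}, $delta->{$k} );
            }
            else {
                $orig->{$k} = $delta->{$k};
            }
        }
    }
    elsif ( ref($delta) eq 'ARRAY' && ref($orig) eq 'ARRAY' ) {
        for my $i ( 0 .. $#$delta ) {
            if ( ref( $delta->[$i] ) && defined $orig->[$i] ) {
                augment( $orig->[$i], $delta->[$i] );
            }
            else {
                $orig->[$i] = $delta->[$i];
            }
        }
    }
}

my $orig  = { a => { b => [ 'c', 'd' ], e => [ ['f'] ] } };
my $delta = { a => { e => [ ['g'] ] } };
augment( $orig, $delta );
# $orig is now { a => { b => [ 'c', 'd' ], e => [ ['g'] ] } }
```

How mismatched types (scalar on one side, container on the other) should be handled is not specified by the test; this sketch simply lets the delta win.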

Replies are listed 'Best First'.
Re: Augmenting and reducing data structures
by hippo (Chancellor) on Apr 22, 2021 at 13:04 UTC

    If you are happy to amend your delta to fit its structure then Struct::Diff offers a solution.

    use strict;
    use Test::More tests => 2;
    use Storable qw(dclone);
    use Struct::Diff qw/diff patch/;

    my $orig = { a => { b => [ 'c', 'd' ], e => [ [ 'f' ] ] } };
    my $new  = dclone($orig);

    my $delta = {
        D => {
            a => {
                D => {
                    e => {
                        D => [ { D => [ { O => 'f', N => 'g' } ] } ],
                    },
                },
            },
        },
    };
    patch( $new, $delta );

    my $augmented = { a => { b => [ 'c', 'd' ], e => [ [ 'g' ] ] } };
    is_deeply( $new, $augmented, "augment" );

    my $diff = diff( $orig, $new, noU => 1 );
    is_deeply( $diff, $delta, "reduce" );

    🦛

Re: Augmenting and reducing data structures
by tybalt89 (Prior) on Apr 22, 2021 at 23:55 UTC

    Just because it looked like it might be fun...

    #!/usr/bin/perl
    # https://perlmonks.org/?node_id=11131585
    use strict;
    use warnings;
    use Data::Dump 'dd';

    my $orig = { a => { b => ["c", "d"], e => [["f"]] } };
    dd 'orig', $orig;

    my $augment = { a => { b => [ 'c', 'd' ], e => [ [ 'g' ] ] } };
    dd 'augment', $augment;

    my $delta = reduce( $orig, $augment );
    dd 'delta', $delta;

    sub reduce {
        my %augment = map { $_ => '' } buildstatements( pop );
        delete @augment{ buildstatements( shift ) };
        eval join '', %augment for my $delta; # HAHAHAHAHAHA
        return $delta;
    }

    my @buildcode;

    sub buildstatements {
        @buildcode = ();
        creators( '$_->', shift() );
        return @buildcode;
    }

    sub creators {
        my ($have, $ref) = @_;
        if ( 'HASH' eq ref $ref ) {
            creators( $have . "{$_}", $ref->{$_} ) for keys %$ref;
        }
        elsif ( 'ARRAY' eq ref $ref ) {
            creators( $have . "[$_]", $ref->[$_] ) for 0 .. $#$ref;
        }
        else {
            push @buildcode, qq($have = "$ref";);
        }
    }

    Outputs:

    ("orig", { a => { b => ["c", "d"], e => [["f"]] } })
    ("augment", { a => { b => ["c", "d"], e => [["g"]] } })
    ("delta", { a => { e => [["g"]] } })

    Hey, it passes all the provided test cases :)

Re: Augmenting and reducing data structures
by LanX (Cardinal) on Apr 22, 2021 at 12:36 UTC
    If you want to implement this on your own, you need to

    • walk the "delta" structure bottom-up, and
    • delete the corresponding entries in the "augmented" structure by these rules:
    • Rule 1: delete simple scalars
    • Rule 2: delete only empty containers (hashes & arrays)

    For Rule 2 it's crucial that you walk bottom-up: if there are children left after deleting all the delta children, you must not delete the container.

    The walking can be done with a recursive function which does the deleting after recursing, just before returning.

    Hence all children at the "bottom" have already been processed.
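    The rules above can be sketched as follows. This is a sketch under simplifying assumptions of my own: only hash containers are walked recursively, arrays are compared as whole values (so a changed array is kept wholesale), and elements present on only one side are ignored. It follows the original post's calling convention of `reduce( $augmented, $orig )` modifying the first argument in place.

```perl
use strict;
use warnings;
use Storable qw(freeze);

$Storable::canonical = 1;    # stable hash-key order, so equal structures freeze equally
sub _same { freeze( \$_[0] ) eq freeze( \$_[1] ) }

# Bottom-up walker: recurse into matching hash entries first, then
# delete what is unchanged (Rule 1), and delete sub-hashes only once
# they have become empty (Rule 2).
sub reduce {
    my ( $aug, $orig ) = @_;
    for my $k ( keys %$aug ) {
        next unless exists $orig->{$k};
        if ( ref( $aug->{$k} ) eq 'HASH' && ref( $orig->{$k} ) eq 'HASH' ) {
            reduce( $aug->{$k}, $orig->{$k} );
            delete $aug->{$k} unless keys %{ $aug->{$k} };    # Rule 2
        }
        elsif ( _same( $aug->{$k}, $orig->{$k} ) ) {
            delete $aug->{$k};                                # Rule 1
        }
    }
}

my $orig = { a => { b => [ 'c', 'd' ], e => [ ['f'] ] } };
my $new  = { a => { b => [ 'c', 'd' ], e => [ ['g'] ] } };
reduce( $new, $orig );
# $new is now { a => { e => [ ['g'] ] } }
```

    Comparing via canonical `Storable::freeze` is just one way to get a deep equality test without pulling in another module.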

    HTH! :)

    edit

    FWIW: for augmenting you can use the same walker for the delta, just top-down.

    I.e. you "augment" new elements just before recursing.

    (actually thanks to auto-vivification this can be done more easily by only adding the leaves of the tree, but not much wrong with a clean symmetric algorithm? :)
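    To illustrate the auto-vivification point (a toy example, not part of the post above): assigning to a deep leaf creates every intermediate container along the path automatically.

```perl
use strict;
use warnings;

my $h = {};
# One leaf assignment autovivifies the intermediate hash
# and both nested arrays along the path:
$h->{a}{e}[0][0] = 'g';
# $h is now { a => { e => [ [ 'g' ] ] } }
```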

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      The devil is, as so often, in the details of the edge cases:

      • What if an element is a scalar on one side and a container on the other?
      • What if an element is in the delta but not in the augmented data?

      Ignore? Warn? Throw an exception and/or die?

      The combination of all these error cases will make it hard to find a ready-to-use module whose customization is easier than a new implementation.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        My question originates from a large software project that has a complex configuration structure.

        Something similar to:

        # cfg1
        $config = {
            "common" => {
                debug   => 1,
                logfile => "/var/log/project.log",
                path    => "/opt/projects/bin",
            },
            "project_1" => {
                name  => "Project 1",
                tasks => [
                    {
                        name      => "Prepare",
                        program   => "p1prep",
                        arguments => [ "--init", '--default' ],
                        logfile   => "/var/log/p1prep.log",
                    },
                    {
                        name      => "Work",
                        program   => "p1",
                        arguments => [ "--collect" ],
                        logfile   => "/var/log/p1.log",
                        env       => { DISPLAY => undef },
                    },
                    {
                        name      => "Wrapup",
                        program   => "signal",
                        arguments => [ "p1", "finished" ],
                    },
                ],
            },
            "project_2" => {
                "and" => [ "so", "on", "..." ],
            },
        };

        A particular customer could supply her own config, e.g.

        # cfg2
        $config = {
            "common" => {
                debug   => 0,
                logfile => "/var/log/project.log",
                path    => "/opt/projects/bin",
            },
            "project_1" => {
                name  => "Project 1",
                tasks => [
                    {
                        name      => "Prepare",
                        program   => "p1prep",
                        arguments => [ "--init", '--default' ],
                        logfile   => "/var/log/p1prep.log",
                    },
                    {
                        name      => "Work",
                        program   => "p1",
                        arguments => [ "--collect", "--brief" ],
                        logfile   => "/var/log/p1.log",
                        env       => { DISPLAY => undef },
                    },
                    {
                        name      => "Wrapup",
                        program   => "signal",
                        arguments => [ "p1", "finished" ],
                    },
                ],
            },
            "project_2" => {
                "and" => [ "so", "on", "..." ],
            },
        };

        But instead of supplying a full config, it suffices to supply a smaller set of modifications:

        # cfgaug
        $config = {
            "common" => { debug => 0 },
            "project_1" => {
                tasks => [
                    { name => "Prepare" },
                    {
                        name      => "Work",
                        arguments => [ "--collect", "--brief" ],
                    },
                    { name => "Wrapup" },
                ],
            },
        };

        Applying #cfgaug to #cfg1 will result in #cfg2.

        Since several of these augmentations are possible, the need arises to produce a single 'delta' that augments the base configuration #cfg1 to the actual state.

Re: Augmenting and reducing data structures
by bliako (Prior) on Apr 22, 2021 at 13:00 UTC

    And if you want to take advantage of existing algorithms, you could try converting your hash to an N-ary tree (or a cyclic graph, if you have loops) and see whether there are existing algorithms for finding specific chains/branches. The challenge will be how you map arrays and hashes onto the branches of a tree, given that one is ordered and the other unordered. Perhaps by sorting the hashes first. (Just thinking out loud.)

    Edit: I took "augment" to mean "add/incorporate" (which it does) but it seems you want to modify, unless you want both (but this is not obvious from your test case). So the above is probably irrelevant.

    bw, bliako

      For some reason this comment reminded me of zippers in functional languages. I've forgotten the minimal amount I knew of them but it vaguely tickled some recollection.

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

Re: Augmenting and reducing data structures
by perlfan (Vicar) on Apr 22, 2021 at 17:13 UTC
    This is certainly an interesting question. Your data structure is similar to a finite automaton: basically, a "state" (or node) is a key in the graph, the nesting represents the edges, and the values of the keys represent the labels of the edges (transition predicates). A DFA is just a graph, although many applications represent them as adjacency matrices.

    But if you admit some constraints on what it means to "minimize" or "reduce" your data structure, you may well be able to apply some technique of DFA minimization, and in fact leverage all of the other neat concepts attached to FAs (again, they are just graphs with labeled edges and some notion of a start state and zero or more end states).

    (update) And yes, this is a "compression"-type algorithm, using the structure as a truth table. But an interesting and maybe counter-intuitive effect of this compression or minimization is that the denser a DFA is, the more unwieldy the result of reducing it to an equivalent "regular expression" becomes (not the same as the Perl kind, but a language description nonetheless). There are algorithms to do this, but they're far above my pay grade :-)