http://www.perlmonks.org?node_id=1031281

RecursionBane has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, Monks!
I'm attempting to write a subroutine to perform regexp-based string substitution on the deepest elements of a complex, non-deterministic hash structure (i.e., it could be a HoAoHoA or a HoHoA or a HoA, or any other combination, to any depth). Note that we have no control over this structure. This operation will be performed infrequently enough that I've optimized for readability rather than runtime.
This is the (untested!) code I'm attempting; please let me know if there's an easier way to do this:
sub search_and_replace_in_hash { # Given a hash, perform a perl regexp operation on every value, re +gardless of how deep the hash is # Usage: search_and_replace_in_hash <\%hash> <expression> # Returns: New hash reference with replacement performed on deepes +t elements my %hash = %{shift()}; my $op = shift(); foreach my $key (sort keys %hash) { if ( reftype( $hash{$key} ) eq "HASH" ) { if (scalar( keys( %{$hash{$key}} ) ) > 0) { %hash = %{search_and_replace_in_hash( $hash{$key}, $op + )}; } next; } if ( reftype( $hash{$key} ) eq "ARRAY" ) { if (scalar( @{$hash{$key}} ) > 0) { foreach my $list_element (@{$hash{$key}}) { if (reftype($list_element) eq "HASH") { %hash = %{search_and_replace_in_hash( $hash{$k +ey}{$list_element}, $op )}; next; } } } else { next; } } # If you get here, you're a scalar %hash{$key} =~ $op; } return \%hash; }

I've looked at Hash::Util, but it didn't seem to handle N-deep structures.

Update: One thing I've just thought of is stringifying the entire structure, performing the search and replace, then converting the structure back. Would this be easier/more efficient? Is Data::Dumper the way to go for this?

Any ideas? Thanks in advance!
~RecursionBane

Replies are listed 'Best First'.
Re: Manipulate deepest values in complex hash structures
by LanX (Saint) on Apr 29, 2013 at 20:28 UTC
    • First of all operate on refs, don't do copies like my %hash = %{shift()}; just take $h_ref=shift

    • then allow the recursion to accept all ref types not only hashes

    • this seems pretty redundant:

      if (scalar( @{$hash{$key}} ) > 0) { foreach my $list_element (@{$hash{$key}}) {

    • I'd use if/elsif/else clauses if I were you for different types

    • you should handle unexpected types ( like objects) and warn

    • This recent post is an incomplete example only for HoHs, but pretty short.

    • why are you limiting yourself to a regex at the leaves? Better pass and execute a coderef to have a generic solution.

    But I'm pretty sure you are reinventing the wheel, better search the archives or CPAN. (Maybe Data::Diver or Data::Walker already do what you want.)

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    updates

    various ...

      enough theory, now practical code:

      use strict; use warnings; use Data::Dump qw/pp/; use Scalar::Util qw/reftype/; sub walk { my ($entry,$code) =@_; my $type = reftype($entry); $type //= "SCALAR"; if ($type eq "HASH") { walk($_,$code) for values %$entry; } elsif ($type eq "ARRAY") { walk($_,$code) for @$entry; } elsif ($type eq "SCALAR" ) { $code->($_[0]); # alias of entry } else { warn "unknown type $type"; } } my $test = { a => [2, 3, [4, { b => 42 }]], c => 5 }; pp $test; walk $test, sub { $_[0]+=100 }; pp $test;

      Output

      /usr/bin/perl -w /tmp/walker.pl { a => [2, 3, [4, { b => 42 }]], c => 5 } { a => [102, 103, [104, { b => 142 }]], c => 105 }

      Cheers Rolf

      ( addicted to the Perl Programming Language)

      UPDATE

      of course you could extend walk to call optional $coderefs on every node and specify them as named parameters:

       walk $ref , SCALAR => sub { print $_[0] } , ARRAY => sub { Dump $_[0] }, ...

      But I think the original code is so small thats its easier to copy and manipulate it in place for different needs.

        Great post, LanX. I have two questions:

        For the line:

        $type //= "SCALAR";

        am I correct in thinking that this is using the or operator as you would like += or .=? So that $type takes the value "SCALAR" if it doesn't already have a value? I looked in perlop, but I wasn't aware you could use logical operators like that...

        My second question is how

        $code->($_[0]);

        performs the operation on $_[0]. Currently I'm trying to become more comfortable with references (and they're very confusing), so I'd appreciate it if you could point me in the correct direction. I've already read http://perldoc.perl.org/perlreftut.html, but I'm trying to make it stick now.

        Thanks in advance!

      Thanks for the feedback, Rolf!
      I'll clean up my code and look into Data::Diver. It looks promising.

      ~RecursionBane
Re: Manipulate deepest values in complex hash structures
by hdb (Monsignor) on Apr 29, 2013 at 20:48 UTC

    One method is to use recursion in combination with dispatch tables:

    use strict; use warnings; my %dispatch = ( # scalar '' => sub { my ( $element, $op ) = @_; $op->( $element ); }, 'HASH' => sub { my ( $element, $op ) = @_; for my $key (keys %{$element}) { search_and_replace_in_hash( $element->{$key}, $op ); } }, 'ARRAY' => sub { my ( $element, $op ) = @_; search_and_replace_in_hash( $_, $op) for @{$element}; }, 'default' => sub { my ( $element, $op ) = @_; print STDERR ref($element).": unknown type of reference\ +n" }, ); # operates on references sub search_and_replace_in_hash { &{ $dispatch{ ref($_[0]) } // $dispatch{'default'} }(@_); } my $var = { "0x55555555" => { "0x55555555" => [ ["0xAAAAAAAA", "0x9"], + ], "0xAAAAAAAA" => [ ["0xAAAAAAAA", "0x8"], ], }, "0xAAAAAAAA" => { "0x55555555" => [ ["0xFFFFFFFF", "0x8"], ], "0xAAAAAAAA" => [ ["0x55555554", "0x3"], ], }, }; search_and_replace_in_hash( $var, sub { $_ =~ s/0x//; } ); search_and_replace_in_hash( $var, sub { print "$_\n"; } ); # no need f +or Data::Dumper

    For other types of references, you need to expand the dispatch table accordingly.

    Thanks to Rolf for his helpful comments above.

      Thank you, hdb! This is the first time I'm seeing dispatch tables; that's really neat!

      ~RecursionBane
Re: Manipulate deepest values in complex hash structures
by Laurent_R (Canon) on Apr 29, 2013 at 22:26 UTC

    Hi,

    even though the code below compiles cleanly and does the expected work for the example chosen, please consider it as a pseudocode example (rather than a real solution) of a recursive approach to the traversing of your data structure. I haven't tested any more complicated examples, I have considered that the structure could only include hash and array ref's and made a number of other simplifications.

    I agree with Rolf that it is better to use a coderef, so that's what I did (simply printing the values in my code below) rather than a hardcoded regexp, so as to make the walk_data_struct function usable for many other purposes.

    my %hash = (jan => [january, '01'], feb => [february, '02'], mar => [m +arch, '03']); walk_data_struct (\%hash, sub { my $leaf = shift; print "$leaf \n";}); sub walk_data_struct { my ($data_ref, $code_ref) = @_; if (ref $data_ref) { if (ref ($data_ref) eq "HASH") { walk_data_struct ($data_ref->{$_}, $code_ref) foreach keys + %$data_ref; } else { walk_data_struct ($_, $code_ref) foreach @$data_ref ; } } else { &{$code_ref} ($data_ref); } }
    This prints the leaves of my simple HoA:
    $ perl walk_HoA.pl february 02 january 01 march 03
    EDIT: Ooops, I had not seen that some actual solutions had been proposed meanwhile. The last post I had seen when I wrote the above was Rolf's post giving guidelines on how to solve the problem, but without any code. I had not seen any of the later posts with actual code. For some reason, I had missed the later posts. I still leave my solution, hoping that it might still help a little bit.
Re: Manipulate deepest values in complex hash structures
by jakeease (Friar) on Apr 30, 2013 at 09:32 UTC
Re: Manipulate deepest values in complex hash structures
by Anonymous Monk on Apr 29, 2013 at 22:47 UTC
      IMHO Data::Leaf::Walker can't be used in this particular task of changing leave values.

      Or can you see a possibility to pass an absolute key-path which selects only the deepest leaves?

      And the each-method will only return copies which can't be manipulated.

      OTOH generating an iterator is for sure an interesting alternative.

      Cheers Rolf

      ( addicted to the Perl Programming Language)

        No arguments that it may or may not be appropriate in this case.   I mention it mainly to point out (as other Monks have, since) that “canned” walkers also exist, which might (or might not) be suitable.

Re: Manipulate deepest values in complex hash structures
by gnosti (Chaplain) on Apr 30, 2013 at 16:39 UTC
    Another CPAN candidate that I've had good experience with is Data::Rmap. The API is modeled after perl's map function.
      > Another CPAN candidate that I've had good experience with is Data::Rmap.

      Yep! Looks good, at least from the documentation!

      IMHO the CPAN module for this special task till now! =)

      Cheers Rolf

      ( addicted to the Perl Programming Language)