Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re: A fix for merge keys in YAML::XS, YAML::Syck

by bv (Friar)
on Dec 21, 2009 at 16:34 UTC ( #813726=note: print w/replies, xml ) Need Help??


in reply to A fix for merge keys in YAML::XS, YAML::Syck

Well, I spent some more time thinking about the problem, and I came up with a slightly better algorithm. Instead of a recursive, depth-first approach, I used an iterative, breadth-first algorithm that I arrived at while trying to optimize my previous attempt into a tail-recursive scheme. Code and benchmarks follow.

#!/usr/bin/perl use strict; use warnings; use YAML::XS; use Benchmark qw(cmpthese); my $data; {local $/;$data = <DATA>}; cmpthese( -5, { recursive => sub {my $d = Load $data; mergekeys_recursive( $d +); }, tail_call => sub {my $d = Load $data; mergekeys_tail( $d ); }, loop => sub {my $d = Load $data; mergekeys_loop( $d ); }, } ); sub mergekeys_recursive { my ($ref) = @_; my $type = ref $ref; if ($type eq 'HASH') { my $tmphref = $ref->{'<<'}; if ($tmphref) { die "Merge key does not support merging non-hashmaps" unless (ref $tmphref eq 'HASH'); my %tmphash = %$tmphref; delete $ref->{'<<'}; %$ref = (%tmphash, %$ref); } mergekeys_recursive($_) for (values %$ref); } elsif ($type eq 'ARRAY') { mergekeys_recursive($_) for (@$ref); } return $ref; } sub mergekeys_tail { my ($ref) = (@_); _mergekeys($ref); return $ref; } sub _mergekeys { my $ref = shift or return; my $type = ref $ref; if ($type eq 'HASH') { my $tmphref = $ref->{'<<'}; if ($tmphref) { die "Merge key does not support merging non-hashmaps" unless (ref $tmphref eq 'HASH'); my %tmphash = %$tmphref; delete $ref->{'<<'}; %$ref = (%tmphash, %$ref); } push @_, grep {ref eq 'HASH' or ref eq 'ARRAY'} values %$ref; } elsif ($type eq 'ARRAY') { push @_, grep {ref eq 'HASH' or ref eq 'ARRAY'} @$ref; } goto &_mergekeys; } sub mergekeys_loop { my ($orig) = @_; while (my $ref = shift) { my $type = ref $ref; if ($type eq 'HASH') { my $tmphref = $ref->{'<<'}; if ($tmphref) { die "Merge key does not support merging non-hashmaps" unless (ref $tmphref eq 'HASH'); my %tmphash = %$tmphref; delete $ref->{'<<'}; %$ref = (%tmphash, %$ref); } push @_, grep {ref eq 'HASH' or ref eq 'ARRAY'} values %$r +ef; } elsif ($type eq 'ARRAY') { push @_, grep {ref eq 'HASH' or ref eq 'ARRAY'} @$ref; } } return $orig; } __DATA__

Data I use for benchmark:

--- key1: &id05 - &id02 name: Curly lastname: Howard hair: no occupation: stooge - <<: *id02 name: Larry lastname: Fine hair: curly - <<: *id02 name: Moe hair: bowl - <<: *id02 name: Shemp hair: yes on_again: off_again key2: subkey_a: - foo - phoo - ghoo - - [[[[[[[[[[[[[[asdf]]]]]]],1],3]],{whatever: works}]]]] - sounds: - &id03 voice: hollow says: plugh colors: - red - green - blue - <<: *id03 voice: wind says: do you hear what i hear - <<: *id03 voice: stooge says: - nyuk - nyuk - nyuk characters: *id05 - zxcv subkey_b: - &id01 name: bar type: variable weather: sunny - <<: *id01 name: baz hometown: Perth ...

And the results of the benchmark:

Rate recursive tail_call loop recursive 951/s -- -6% -14% tail_call 1016/s 7% -- -9% loop 1111/s 17% 9% --

I was surprised that the loop was that much faster than the tail-recursion, since it amounts to about the same thing (go to top of loop, check condition, execute or return). It is important to note, though, that I was only able to replace the goto with a while loop because my function was intended to be run for its side-effects on the data structure. If I needed to accumulate a return value, the goto would have been the better solution.


print map{(split//,'hark, suPerJacent other l')[$_]}(11,7,6,16,5,1,15,18..23,8..10,24,17,0,12,13,3,14,2,4);

Replies are listed 'Best First'.
Re^2: A fix for merge keys in YAML::XS, YAML::Syck
by Anonymous Monk on Oct 10, 2015 at 16:48 UTC
    The original version may fail to merge keys if the merged hash is a sibling of the merging hash and itself merges a sibling. Success or failure then depends on the order in which hash values are returned by keys()!
    key1: &key1 a: a b: b c: c key2: &key2 <<: *key1 d: d e: e f: f key3: <<: *key2 g: g h: h i: i ...
    If key2 is pushed onto @_ before key3 all is well, but if not there will still be a merge key pointing at key1 after key2 has been merged into key3. Replacing the check for the existence of a mergekey with a while loop fixes that, because then a new merge will be performed and the outer while loop traversing @_ will not move on while there is still/again a merge key.
    sub mergekeys_loop { my ( $orig ) = @_; while ( my $ref = shift ) { my $type = ref $ref; if ( $type eq 'HASH' ) { # my $tmphref = $ref->{'<<'}; # if ( $tmphref ) { while ( my $tmphref = $ref->{'<<'} ) { die "Merge key does not support merging non-hashmaps" unless ( ref $tmphref eq 'HASH' ); my %tmphash = %$tmphref; delete $ref->{'<<'}; %$ref = ( %tmphash, %$ref ); } push @_, grep { ref eq 'HASH' or ref eq 'ARRAY' } values % +$ref; } elsif ( $type eq 'ARRAY' ) { push @_, grep { ref eq 'HASH' or ref eq 'ARRAY' } @$ref; } } return $orig; }
      I wrote:

      The original version may fail to merge keys if the merged hash is a sibling of the merging hash and itself merges a sibling. Success or failure then depends on the order in which hash values are returned by keys()!

      That should have read:

      The original version of mergekeys_loop() may fail to merge keys if the merged hash is a sibling of the merging hash and itself merges a sibling. Success or failure then depends on the order in which hash values are returned by values()!

      Sorry!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://813726]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (6)
As of 2019-12-06 06:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Strict and warnings: which comes first?



    Results (154 votes). Check out past polls.

    Notices?