joe has asked for the wisdom of the Perl Monks concerning the following question:

This one's got me a bit stumped: I'd like to do something like the following:
while (<>) {
    ($a, $b, $c, $d) = /(...)(...)(...)(...)/;
    $hash{$a}{$b}{$c}{$d}++;
}
Basically, I'm splitting the line into 3-character chunks and using them as the keys of a multi-dimensional hash. However, the line could be of arbitrary length, and I'd want to capture all of its 3-character sequences, so the hash could look like $hash{$a}{$b}{$c}{$d}{$e}{$f}++ or just $hash{$a}++.
Conversely, how would I read such a data structure? If I know the depth of the hash, I can write:
foreach $key (keys %hash) {
    foreach $key1 (keys %{$hash{$key}}) {
        foreach $key2 (keys %{$hash{$key}{$key1}}) {
            # do something
        }
    }
}
But I do not know the depth of the hash.

Replies are listed 'Best First'.
Re: varying length multi-dimensional hashes
by chromatic (Archbishop) on Oct 04, 2000 at 22:20 UTC
    So what you need is an arbitrarily nested hash searcher? Yikes, those can get tricky. See ref, but I'd start out with (pseudo) code like the following:
    sub find_value {
        my $hash_ref = shift;
        my $level = '';    # start empty so an empty hash returns '' rather than undef

        foreach my $key (keys %$hash_ref) {
            $level .= "$key\n";
            my $value = $hash_ref->{$key};
            if (ref($value) eq 'HASH') {
                $level .= find_value($value);    # recurse into the sub-hash
            } else {
                $level .= "\t$value\n";          # leaf value
            }
        }
        return $level;
    }
    That's untested, and it doesn't do anything that Data::Dumper doesn't do better... but it's a place to start.
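A variation on the same recursion, as a rough sketch only: instead of building one string, it hands each leaf to a callback together with the list of keys that led to it, so the caller never needs to know the depth. The names walk_hash and $visit are made up for this example and the sample data is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: call $visit->(\@path, $value) for every leaf of a
# nested hash, whatever its depth.
sub walk_hash {
    my ($node, $visit, @path) = @_;
    for my $key (sort keys %$node) {
        my $value = $node->{$key};
        if (ref $value eq 'HASH') {
            walk_hash($value, $visit, @path, $key);   # descend one level
        }
        else {
            $visit->([@path, $key], $value);          # reached a leaf
        }
    }
    return;
}

# Invented sample data with leaves at three different depths.
my %hash = (
    aaa => { bbb => { ccc => 1 }, bba => 2 },
    aab => 3,
);

my @seen;
walk_hash(\%hash, sub {
    my ($path, $value) = @_;
    push @seen, join('/', @$path) . "=$value";
});
print "$_\n" for @seen;
```

The keys are sorted only to make the output order predictable; drop the sort if order doesn't matter.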
      Thanks Chromatic! That looks like the exact answer for traversing the hash. Now all I need to do is figure out how to create the hash. My partner suggests using eval. What do you think?
(tye)Re: varying length multi-dimensional hashes
by tye (Sage) on Oct 04, 2000 at 23:29 UTC

    Okay, here is one way to populate the hash:

    #!/usr/bin/perl -w
    use strict;

    my %hash;
    while( <> ) {
        chomp $_;
        my @trib= grep { "" ne $_ } split /(...)/, $_;
        my $ref= \\%hash;
        while( @trib ) {
            $ref= \$$ref->{shift @trib};
        }
        $$ref++;
    }

    Note how I avoided having to save shift @trib to index into the next subhash by keeping a reference to the slot for the hash value instead of a reference to the hash.
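For anyone who wants to poke at the slot-reference trick in isolation, here is a small sketch that wraps it in a function. The name bump and the sample keys are invented for illustration; the mechanism is the same as in the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: increment the counter reached by following @keys
# down from the hash that $root points to.
sub bump {
    my ($root, @keys) = @_;
    return unless @keys;                # nothing to index with (e.g. an empty line)
    my $slot = \$root;                  # reference to a scalar holding the hash ref
    $slot = \$$slot->{$_} for @keys;    # step to the next slot, autovivifying as needed
    $$slot++;                           # the final slot holds the count
    return;
}

my %h;
bump(\%h, qw(aaa bbb ccc));
bump(\%h, qw(aaa bbb ccc));
bump(\%h, 'aab');
print "$h{aaa}{bbb}{ccc} $h{aab}\n";
```

Because $slot always points at the place where the next value lives, there is no special case for the last key.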

    If that code makes sense to you right away, then you are either sick, overlooking something, or do way too much work with Perl references. (:

    P.S. I was seriously disappointed that

    my @trib= split /(?<=\G...)/, $_;
    doesn't work because split doesn't set pos($_) the way /(?<=\G...)/g would.
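Two other ways to get the chunks without the grep, offered as a sketch rather than a recommendation. The sample string is made up, and the two forms differ in how they treat a trailing partial chunk.

```perl
use strict;
use warnings;

my $line = "aaabbbcccdd";    # invented sample with two leftover characters

# Global match in list context: one element per complete 3-character chunk.
# The trailing "dd" is silently dropped.
my @full = $line =~ /(...)/g;

# unpack with a repeated group: the short tail is kept as a final chunk.
my @all = unpack '(a3)*', $line;

print "@full\n";
print "@all\n";
```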

            - tye (but my friends call me "Tye")
Re: varying length multi-dimensional hashes
by Fastolfe (Vicar) on Oct 04, 2000 at 22:53 UTC
    Since you're dealing with references, you can traverse your hash with a pointer-type of reference.
    my %hash;
    while (<STDIN>) {
        chomp;
        my $pointer = \%hash;            # start from root
        while (s/^(...)//) {             # keep eating 1st 3
            if ($_) {                    # if there's more to go,
                                         # we go deeper
                $pointer->{$1} = {} unless ref($pointer->{$1});
                $pointer = $pointer->{$1};
            } else {
                $pointer->{$1}++;        # otherwise, set it to 1
            }
        }
    }
    Note that the following data will mess things up slightly:
    111222333444555
    111222333
    $HASH{111}->{222}->{333} == 1   # lost 444 and 555

    111222333
    111222333444555
    $HASH{111}->{222}->{333} != 1   # now a hash reference
    These are unavoidable though with the criteria you've given. This code also assumes that your string will always be of a length that is divisible by 3. If you have two trailing characters at the end, the substitution will fail, but the test for $_ will have succeeded in the previous loop, giving you a dangling empty hash reference instead of 1.
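One possible way around the prefix collision, if the structure is allowed to change a little: make every node a hash and keep the count under a reserved key that can never be a real chunk. This is only a sketch under that assumption. The empty-string sentinel and the name add_line are invented here, and the original question did not ask for this layout.

```perl
use strict;
use warnings;

# Assumed sentinel key: real chunks are always 3 characters long,
# so the empty string can never clash with one.
my $COUNT = '';

# Hypothetical helper: record one line in the tree rooted at $root.
sub add_line {
    my ($root, $line) = @_;
    my $node = $root;
    $node = $node->{$_} ||= {} for $line =~ /(...)/g;   # every level stays a hash
    $node->{$COUNT}++;                                  # count lives beside the children
    return;
}

my %h;
add_line(\%h, $_) for qw(111222333444555 111222333);

my %g;    # same data, opposite order
add_line(\%g, $_) for qw(111222333 111222333444555);

print "$h{111}{222}{333}{$COUNT} $h{111}{222}{333}{444}{555}{$COUNT}\n";
```

Either input order now gives the same tree, at the cost of one extra key per counted node.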
Re: varying length multi-dimensional hashes
by c-era (Curate) on Oct 04, 2000 at 23:05 UTC
    I don't think this is the best way to do it, but it sure was fun to write:
    my $eval;
    my %hash;
    # Find all three-letter occurrences, and combine them into one eval statement.
    while (<>){
        $eval = '$hash ';
        while (/(...)/g){
            $eval .= "{'$1'}";
        }
        $eval .= "++;";
        eval ($eval); # You should probably do some error checking at this point.
    }

    for (keys %hash){
        hashValue ($hash{$_});
    }

    # You got to love recursive functions ;)
    sub hashValue {
        my $hash = $_[0];
        if (ref $hash eq 'HASH'){    # defined %$hash is an error in modern Perl
            for (keys %$hash){
                hashValue ($$hash{$_});
            }
        } else {
            print $hash; # Or do whatever you like with the value
        }
    }
(jcwren) RE: varying length multi-dimensional hashes
by jcwren (Prior) on Oct 04, 2000 at 23:16 UTC
    I'll leave the reading of the hash as an exercise to the user, but here's a fun way to populate the hash:

    Update: Apparently, here's a really slow, dangerous, stupid, fraught with peril, performancing dehancing way to do it. Oh well, some code just sucks, don't it?
    #!/usr/local/bin/perl -w

    use strict;
    use Data::Dumper;

    {
        my %hash = ();

        while (<DATA>) {
            my $s;
            $s .= "{'$_'}" foreach (/(...)/g);
            eval ('$hash ' . $s . '++');
            die $@ if $@;
        }

        print Dumper ([\%hash]);
    }

    __DATA__
    aaabbbcccddd
    aab
    aacbbb
    aaabbacca
    aaabbaccb
    ababbbddd

    $VAR1 = [
              {
                'aab' => 1,
                'aba' => {
                           'bbb' => {
                                      'ddd' => 1
                                    }
                         },
                'aac' => {
                           'bbb' => 1
                         },
                'aaa' => {
                           'bba' => {
                                      'cca' => 1,
                                      'ccb' => 1
                                    },
                           'bbb' => {
                                      'ccc' => {
                                                 'ddd' => 1
                                               }
                                    }
                         }
              }
            ];

      I voted this one down because
      • it's an unnecessary use of runtime-string-compilation eval
      • it breaks on general data, so it's not a good general pattern, and you didn't add that disclaimer to your text. Hint: what if your data contained single quotes?
      Others have posted faster and more general solutions, so I won't include one of mine here. But avoid this eval solution, please.
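To make the single-quote hint concrete, here is a small sketch with an invented chunk that contains a quote. It builds the same kind of string the eval approach builds, shows that it no longer compiles, and then uses the same keys as ordinary hash subscripts, where no quoting is involved at all.

```perl
use strict;
use warnings;

my %hash;
my @parts = ("a'b", "ccc");    # made-up chunks; the first contains a single quote

# String-eval approach: the quote ends the key early, so the generated
# code is not valid Perl. $code is: $hash{'a'b'}{'ccc'}++;
my $code = '$hash' . join('', map { "{'$_'}" } @parts) . '++;';
my $ok = eval "$code 1";
my $eval_error = $ok ? '' : $@;

# Using the chunks directly as subscripts works for any characters.
$hash{ $parts[0] }{ $parts[1] }++;

print "eval failed\n" if $eval_error;
print "direct count: $hash{q{a'b}}{ccc}\n";
```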

      -- Randal L. Schwartz, Perl hacker

Re: varying length multi-dimensional hashes
by joe (Acolyte) on Oct 04, 2000 at 23:17 UTC
    Thanks, c-era. Here's the one that we came up with for creating the hash:
    while(<>) {
        @parts = ($_ =~ /.{3}/g);
        $assign = "\$hash" . join('', map { "{\'$_\'}" } @parts) . "++;";
        eval $assign;
    }
    Not as fast as I'd like, but I guess it's doing a lot of work...
Re: varying length multi-dimensional hashes
by fundflow (Chaplain) on Oct 04, 2000 at 23:15 UTC
    Just out of curiosity, I'd like to know what it is used for.

    Also, if you want to count how many times aaaBBBccc appears, why not do $hash{'aaaBBBccc'}++?

    This has no problems with common prefixes (such as those Fastolfe pointed out) and needs no complex data structures or pointers.
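A minimal sketch of the flat-key idea, with invented sample lines. The helper count_with_prefix is hypothetical and is only there to show that prefix questions can still be answered, though by scanning every key rather than by walking down a tree.

```perl
use strict;
use warnings;

# One counter per whole line; no nesting at all.
my %count;
$count{$_}++ for qw(aaabbbccc aaabbb aaabbbccc aabccc);

# Hypothetical helper: total count of all lines that start with $prefix.
sub count_with_prefix {
    my ($counts, $prefix) = @_;
    my $total = 0;
    for my $key (keys %$counts) {
        $total += $counts->{$key} if index($key, $prefix) == 0;
    }
    return $total;
}

print "$count{aaabbbccc} $count{aaabbb}\n";
print count_with_prefix(\%count, 'aaabbb'), "\n";
```

Whether this is good enough depends on what the counts are for: exact-line lookups are a single hash access, while prefix lookups cost a full scan.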
      This smells like homework to me.