
varying length multi-dimensional hashes

by joe (Acolyte)
on Oct 04, 2000 at 22:05 UTC (#35303=perlquestion)

joe has asked for the wisdom of the Perl Monks concerning the following question:

This one's got me a bit stumped: I'd like to do something like the following:
while (<>) {
    ($a, $b, $c, $d) = /(...)(...)(...)(...)/;
    $hash{$a}{$b}{$c}{$d}++;
}
Basically, I want to split each line into 3-character chunks and use them as the keys of a multi-dimensional hash. However, the line could be of arbitrary length, and I want to capture all of the 3-character sequences, so the hash could look like $hash{$a}{$b}{$c}{$d}{$e}{$f}++ or just $hash{$a}++.

Conversely, how would I read such a data structure? If I know the depth of the hash, I can write:
foreach $key (keys %hash) {
    foreach $key1 (keys %{$hash{$key}}) {
        foreach $key2 (keys %{$hash{$key}{$key1}}) {
            # do something
        }
    }
}
But I do not know the depth of the hash.

Replies are listed 'Best First'.
Re: varying length multi-dimensional hashes
by chromatic (Archbishop) on Oct 04, 2000 at 22:20 UTC
    So what you need is an arbitrarily nested hash searcher? Yikes, those can get tricky. See ref, but I'd start out with (pseudo) code like the following:
    sub find_value {
        my $hash_ref = shift;
        my $level = '';    # start with an empty string so .= never sees undef
        foreach my $key (keys %$hash_ref) {
            $level .= "$key\n";
            my $value = $hash_ref->{$key};
            if (ref($value) eq 'HASH') {
                $level .= find_value($value);
            } else {
                $level .= "\t$value\n";
            }
        }
        return $level;
    }
    That's untested, and it doesn't do anything that Data::Dumper doesn't do better... but it's a place to start.
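    A minimal sketch of the same recursive idea, written so that it can be run as-is. The names walk_hash and the callback interface are illustrative assumptions, not something from the post above; the key path is carried along so each leaf can be reported together with the keys that lead to it.

```perl
use strict;
use warnings;

# Walk a hash of unknown depth. For every leaf, call $callback with
# an array ref holding the key path and the value stored at the leaf.
# walk_hash is a hypothetical helper name.
sub walk_hash {
    my ($node, $callback, @path) = @_;
    for my $key (sort keys %$node) {
        my $value = $node->{$key};
        if (ref $value eq 'HASH') {
            walk_hash($value, $callback, @path, $key);   # go one level deeper
        }
        else {
            $callback->([@path, $key], $value);          # reached a leaf
        }
    }
    return;
}

# Sample data, made up for the demonstration.
my %hash = (
    aaa => { bbb => { ccc => 2 }, bba => 1 },
    aab => 1,
);

my @lines;
walk_hash(\%hash, sub {
    my ($path, $count) = @_;
    push @lines, join('/', @$path) . " = $count";
});
print "$_\n" for @lines;
```

    Sorting the keys only makes the output order predictable; drop the sort if order does not matter.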
      Thanks, chromatic! That looks like the exact answer for traversing the hash. Now all I need to do is figure out how to create the hash. My partner suggests using eval. What do you think?
(tye)Re: varying length multi-dimensional hashes
by tye (Sage) on Oct 04, 2000 at 23:29 UTC

    Okay, here is one way to populate the hash:

    #!/usr/bin/perl -w
    use strict;
    my %hash;
    while( <> ) {
        chomp $_;
        my @trib= grep { "" ne $_ } split /(...)/, $_;
        my $ref= \\%hash;
        while( @trib ) {
            $ref= \$$ref->{shift @trib};
        }
        $$ref++;
    }

    Note how I avoided having to save shift @trib to index into the next subhash by keeping a reference to the slot for the hash value instead of a reference to the hash.
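    The slot-reference trick can be isolated in a small helper to make each step visible. This is only a sketch: the name count_path and the sample keys are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: increment the counter at the end of a key path.
sub count_path {
    my ($root, @keys) = @_;
    my $slot = \$root;              # reference to a scalar that holds a hash ref
    for my $key (@keys) {
        # $$slot is the current hash ref (autovivified when still undef);
        # take a reference to the slot for $key inside that hash.
        $slot = \$$slot->{$key};
    }
    $$slot++;                       # the final slot holds the count
    return;
}

my %hash;
count_path(\%hash, qw(aaa bbb ccc));
count_path(\%hash, qw(aaa bbb ccc));
count_path(\%hash, qw(aab));

print "aaa/bbb/ccc: $hash{aaa}{bbb}{ccc}\n";
print "aab: $hash{aab}\n";
```

    Because $slot always refers to the place where the next value lives, the loop never needs to know whether it is on the last key.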

    If that code makes sense to you right away, then you are either sick, overlooking something, or do way too much work with Perl references. (:

    P.S. I was seriously disappointed that

    my @trib= split /(?<=\G...)/, $_;
    doesn't work because split doesn't set pos($_) the way /(?<=\G...)/g would.
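    For comparison, two other ways to cut a line into 3-character chunks without split, sketched on a made-up sample string. Both keep a short trailing chunk; whether that is wanted depends on the data.

```perl
use strict;
use warnings;

my $line = 'aaabbbcccdd';    # sample input, assumed to be plain ASCII

# Global match in list context returns every match,
# including a final chunk of fewer than three characters.
my @by_regex = $line =~ /.{1,3}/g;

# unpack with a repeated group: 'a3' takes three characters,
# and '(...)*' repeats the group until the string is used up.
my @by_unpack = unpack '(a3)*', $line;

print "@by_regex\n";
print "@by_unpack\n";
```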

            - tye (but my friends call me "Tye")
Re: varying length multi-dimensional hashes
by Fastolfe (Vicar) on Oct 04, 2000 at 22:53 UTC
    Since you're dealing with references, you can traverse your hash with a pointer-type of reference.
    my %hash;
    while (<STDIN>) {
        chomp;
        my $pointer = \%hash;        # start from root
        while (s/^(...)//) {         # keep eating 1st 3
            if ($_) {                # if there's more to go,
                                     # we go deeper
                $pointer->{$1} = {} unless ref($pointer->{$1});
                $pointer = $pointer->{$1};
            } else {
                $pointer->{$1}++;    # otherwise, set it to 1
            }
        }
    }
    Note that the following data will mess things up slightly:
    111222333444555
    111222333
    $HASH{111}->{222}->{333} == 1    # lost 444 and 555

    111222333
    111222333444555
    $HASH{111}->{222}->{333} != 1    # now a hash reference
    These are unavoidable though with the criteria you've given. This code also assumes that your string will always be of a length that is divisible by 3. If you have two trailing characters at the end, the substitution will fail, but the test for $_ will have succeeded in the previous loop, giving you a dangling empty hash reference instead of 1.
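    One possible way around the prefix clash, if the data layout may be changed, is to give every chunk its own sub-hash and keep the count under a reserved key. This is a different layout from the one asked for, and the choice of the empty string as the reserved key is an assumption; it works only because a chunk is never empty.

```perl
use strict;
use warnings;

my %hash;
my $COUNT = '';    # assumption: the empty string never occurs as a chunk

for my $line ('111222333444555', '111222333', '111222333') {
    my $node = \%hash;
    for my $chunk ($line =~ /.{1,3}/g) {
        $node->{$chunk} ||= {};     # every chunk gets a sub-hash, even the last
        $node = $node->{$chunk};
    }
    $node->{$COUNT}++;              # the count sits beside any deeper keys
}

print "111222333: $hash{111}{222}{333}{$COUNT}\n";
print "111222333444555: $hash{111}{222}{333}{444}{555}{$COUNT}\n";
```

    The cost is one extra hash level per line, and any code that walks the structure has to treat the reserved key as a count and not as a chunk.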
Re: varying length multi-dimensional hashes
by c-era (Curate) on Oct 04, 2000 at 23:05 UTC
    I don't think this is the best way to do it, but it sure was fun to write:
    my $eval;
    my %hash;
    # Find all three-letter occurrences and combine them into one eval statement.
    while (<>) {
        $eval = '$hash ';
        while (/(...)/g) {
            $eval .= "{'$1'}";
        }
        $eval .= "++;";
        eval ($eval);    # You should probably do some error checking at this point.
    }
    for (keys %hash) {
        hashValue ($hash{$_});
    }
    # You got to love recursive functions ;)
    sub hashValue {
        my $hash = $_[0];
        if (ref $hash eq 'HASH') {    # recurse only into hash references
            for (keys %$hash) {
                hashValue ($$hash{$_});
            }
        } else {
            print $hash;    # Or do whatever you like with the value
        }
    }
(jcwren) RE: varying length multi-dimensional hashes
by jcwren (Prior) on Oct 04, 2000 at 23:16 UTC
    I'll leave the reading of the hash as an exercise to the user, but here's a fun way to populate the hash:

    Update: Apparently, here's a really slow, dangerous, stupid, fraught with peril, performancing dehancing way to do it. Oh well, some code just sucks, don't it?
    #!/usr/local/bin/perl -w
    use strict;
    use Data::Dumper;
    {
        my %hash = ();
        while (<DATA>) {
            my $s;
            $s .= "{'$_'}" foreach (/(...)/g);
            eval ('$hash ' . $s . '++');
            die $@ if $@;
        }
        print Dumper ([\%hash]);
    }
    __DATA__
    aaabbbcccddd
    aab
    aacbbb
    aaabbacca
    aaabbaccb
    ababbbddd

    $VAR1 = [
      {
        'aab' => 1,
        'aba' => {
          'bbb' => {
            'ddd' => 1
          }
        },
        'aac' => {
          'bbb' => 1
        },
        'aaa' => {
          'bba' => {
            'cca' => 1,
            'ccb' => 1
          },
          'bbb' => {
            'ccc' => {
              'ddd' => 1
            }
          }
        }
      }
    ];

      I voted this one down because
      • it's an unnecessary use of runtime-string-compilation eval
      • it breaks on general data, so it's not a good general pattern, and you didn't add that disclaimer to your text. Hint: what if your data contained single quotes?
      Others have posted faster and more general solutions, so I won't include one of mine here. But avoid this eval solution, please.

      -- Randal L. Schwartz, Perl hacker
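      To make the single-quote hint concrete, here is a small sketch on a made-up input line. It builds the same kind of eval string as the posts above, records the error, and then counts the same line by walking references, which never treats the data as code.

```perl
use strict;
use warnings;

my %hash;
my $line = "it's";    # sample input containing a single quote
my @chunks = $line =~ /.{1,3}/g;

# String-eval version: the quote in the data ends the key literal early,
# so the generated code is no longer what was intended.
my $code = '$hash' . join('', map { "{'$_'}" } @chunks) . '++;';
eval $code;
my $eval_error = $@;
print "eval version failed\n" if $eval_error;

# Reference version: the chunks are only ever used as hash keys.
my $node = \%hash;
my $last = pop @chunks;
for my $chunk (@chunks) {
    $node->{$chunk} ||= {};
    $node = $node->{$chunk};
}
$node->{$last}++;

print "reference version count: ", $hash{"it'"}{'s'}, "\n";
```

      A compile error is the mild outcome; input that happens to form valid Perl would be executed by the eval.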

Re: varying length multi-dimensional hashes
by joe (Acolyte) on Oct 04, 2000 at 23:17 UTC
    Thanks, c-era. Here's the one that we came up with for creating the hash:
    while (<>) {
        @parts = ($_ =~ /.{3}/g);
        $assign = "\$hash" . join('', map { "{\'$_\'}" } @parts) . "++;";
        eval $assign;
    }
    Not as fast as I'd like, but I guess it's doing a lot of work...
Re: varying length multi-dimensional hashes
by fundflow (Chaplain) on Oct 04, 2000 at 23:15 UTC
    Just out of curiosity, I'd like to know what this is used for.

    Also, if you want to count how many times aaaBBBccc appears, why not do $hash{'aaaBBBccc'}++?

    This has no problems with common prefixes (such as those Fastolfe pointed out) and needs no complex data structures or pointers.
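    A minimal sketch of the flat-key approach, using made-up sample lines. The helper count_with_prefix is a hypothetical addition showing one way to answer "how many lines start with these chunks" without nesting; it scans every key, so it trades lookup speed for simplicity.

```perl
use strict;
use warnings;

my %count;
$count{$_}++ for qw(111222333444555 111222333 111222333 aaabbbccc);

print "$_ => $count{$_}\n" for sort keys %count;

# Hypothetical helper: total count of all lines that begin with $prefix.
sub count_with_prefix {
    my ($counts, $prefix) = @_;
    my $total = 0;
    for my $key (keys %$counts) {
        $total += $counts->{$key} if index($key, $prefix) == 0;
    }
    return $total;
}

print "lines starting with 111222: ", count_with_prefix(\%count, '111222'), "\n";
```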
      This smells like homework to me.
