http://www.perlmonks.org?node_id=442415

Miguel has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Monks,
I need some light on creating a Hash of Hashes, since all the results I'm getting with the code below are wrong.
#!/usr/bin/perl -w
use strict;
#use IO::All;
use Data::Dumper;

my %result;
for my $l (<DATA>) {
    my ($cod,$desc) = ($l =~/(\d*) (.*)/);
    my ($c1,$c2,$c3,$c4,$c5) = unpack ("A1 A1 A1 A1 A1",$cod);
    $result{$c1}{$c2}{$c3}{$c4}{$c5} .= $desc;
}
print Dumper \%result;

__DATA__
1 ITEM
11 Sub Item 1
111 Item X
112 Item
1121 Another Item
11212 And Another Item
12 Sub Item 2

This returns:

$VAR1 = {
  '1' => {
    '' => { '' => { '' => { '' => 'ITEM' } } },
    '1' => {
      '' => { '' => { '' => 'Sub Item 1' } },
      '1' => { '' => { '' => 'Item X' } },
      '2' => {
        '' => { '' => 'Item' },
        '1' => {
          '' => 'Another Item',
          '2' => 'And Another Item'
        }
      }
    },
    '2' => { '' => { '' => { '' => 'Sub Item 2' } } }
  }
};

But what I need is a structure like:

$VAR1 = {
  '1' => {
    '0' => 'ITEM',
    '11' => {
      '0' => 'Sub Item 1',
      111 => 'Item X',
      112 => {
        '0' => 'Item',
        '1121' => {
          '0' => 'Another Item',
          '11212' => 'And Another Item'
        }
      }
    },
    '12' => 'Sub Item 2'
  }
}

What am I missing here?
Thanks,
Miguel

2005-03-25 Janitored by Arunbear - added readmore tags, as per Monastery guidelines

Replies are listed 'Best First'.
Re: Generation of a Hash of Hashes
by tilly (Archbishop) on Mar 25, 2005 at 21:51 UTC
    Your existing code hardcodes all of the possible levels. You need to traverse the hash to whatever the appropriate depth is and then place your entry there. Try the following to see how to do it:
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %result;
    for my $l (<DATA>) {
        my ($cod,$desc) = ($l =~/(\d*) (.*)/);
        my $hashref = \%result;
        my $partial = '';
        foreach my $layer (split //, $cod) {
            $partial .= $layer;
            $hashref = ($hashref->{$partial} ||= {});
        }
        $hashref->{0} = $desc;
    }
    print Dumper \%result;

    __DATA__
    1 ITEM
    11 Sub Item 1
    111 Item X
    112 Item
    1121 Another Item
    11212 And Another Item
    12 Sub Item 2
    Note that before you nail down what you want your representation to be, you should try to write sample code that does the tasks you need to do using this representation. I'm trying to figure out for what task this data structure would be convenient, and I'm having no luck. Combined with your inexperience, I'd take this as a sign that you probably need a different representation but don't yet know which.
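One way to try out a representation before committing to it is to write the consumer first. The sketch below is only an illustration: it rebuilds the structure from the reply above (cumulative prefixes as keys, key 0 for the description) and walks it to print an indented outline. The names `build`, `outline`, and `@sample` are made up for this example.

```perl
use strict;
use warnings;

# Sample lines copied from the question's __DATA__ section.
my @sample = (
    '1 ITEM', '11 Sub Item 1', '111 Item X', '112 Item',
    '1121 Another Item', '11212 And Another Item', '12 Sub Item 2',
);

# Build the structure: keys are cumulative code prefixes, and key 0 of
# each node holds that node's description.
sub build {
    my %result;
    for my $line (@_) {
        my ($cod, $desc) = $line =~ /^(\d+) (.*)$/ or next;
        my $hashref = \%result;
        my $partial = '';
        for my $layer (split //, $cod) {
            $partial .= $layer;
            $hashref = ($hashref->{$partial} ||= {});
        }
        $hashref->{0} = $desc;
    }
    return \%result;
}

# Return one indented "code description" line per node, depth first.
sub outline {
    my ($node, $depth) = @_;
    $depth ||= 0;
    my @out;
    my @keys = grep { $_ ne '0' } keys %$node;   # skip the description slot
    for my $key (sort @keys) {
        my $child = $node->{$key};
        my $name  = defined $child->{0} ? $child->{0} : '';
        push @out, ('  ' x $depth) . "$key $name";
        push @out, outline($child, $depth + 1);
    }
    return @out;
}

print "$_\n" for outline(build(@sample));
```

If a walk like this feels awkward for the real task, that is a hint the representation should change.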
      "I'm trying to figure out for what task this data structure would be convenient"

      Thanks for your reply.

      For this case, I have a file with information formatted as I've posted above.

      That is:
      1 to 5 digits, 1 space, 1 to 65 characters

      Each line represents classes and subclasses (and sub-sub-classes...) and their names.
      Each digit represents one level.

      For example:

      11234 Some Text
            ^^^^^^^^^---- Name of this class (level)
          ^------- 5th class
         ^-------- 4th class
        ^--------- 3rd class
       ^---------- 2nd class
      ^----------- Top (main class)

      The class number at each level varies between 1 and 9.

      I'm trying to use this Hash to create an XML document.

        I'd suggest trying to write code to produce the XML document and only then figure out what data structure you need to do it. (There are so many variations in what XML can look like that your task doesn't tell me what a good data structure would be for your problem.)
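As a hedged illustration of that suggestion: if the target were nested elements, the XML could be produced straight from the sorted lines with a stack of open codes and no intermediate hash at all. The element name `class`, the attributes `code` and `name`, and the helper names are assumptions made for this sketch, since the thread does not say what the XML should look like. It also assumes every parent line appears before its children, as in the sample data.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML attribute values.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# True when $outer is a proper prefix of $inner (e.g. '11' and '112').
sub is_ancestor {
    my ($outer, $inner) = @_;
    return length($outer) < length($inner)
        && substr($inner, 0, length $outer) eq $outer;
}

# Turn "code description" lines into nested <class> elements.
sub lines_to_xml {
    my @lines = @_;
    my @open;          # codes of the elements that are currently open
    my $xml = '';
    for my $line (@lines) {
        my ($cod, $desc) = $line =~ /^(\d+) (.*)$/ or next;
        # Close open elements until the innermost one is an ancestor.
        while (@open && !is_ancestor($open[-1], $cod)) {
            pop @open;
            $xml .= ('  ' x scalar(@open)) . "</class>\n";
        }
        $xml .= ('  ' x scalar(@open))
              . '<class code="' . $cod . '" name="' . xml_escape($desc) . qq{">\n};
        push @open, $cod;
    }
    # Close whatever is still open at the end of the input.
    while (@open) {
        pop @open;
        $xml .= ('  ' x scalar(@open)) . "</class>\n";
    }
    return $xml;
}

print lines_to_xml('1 ITEM', '11 Sub Item 1', '111 Item X', '12 Sub Item 2');
```

A different XML shape would call for different code, which is the point of writing the output step first.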
Re: Generation of a Hash of Hashes
by ambs (Pilgrim) on Mar 25, 2005 at 21:57 UTC

    Basically, you can't do it that way. Using the following line

        $result{$c1}{$c2}{$c3}{$c4}{$c5} .= $desc;
    
    you always create a 5 level hash, and that's not what you want.
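To make the point concrete, here is a small sketch (the helper names `five_slots` and `depth` are invented for this example). It feeds a two-digit code through the same `unpack` template and the same five-key assignment, then counts how many hash levels were created.

```perl
use strict;
use warnings;

# Split a code into exactly five one-character slots using the question's
# unpack template; slots past the end of the code are normalised to ''.
sub five_slots {
    my ($cod) = @_;
    my @slots = unpack("A1 A1 A1 A1 A1", $cod);
    return map { defined $slots[$_] ? $slots[$_] : '' } 0 .. 4;
}

# Count how many hash levels sit above the first non-hash value.
sub depth {
    my ($node) = @_;
    my $levels = 0;
    while (ref $node eq 'HASH') {
        $levels++;
        ($node) = values %$node;   # each level has a single key in this demo
    }
    return $levels;
}

my %result;
my ($c1, $c2, $c3, $c4, $c5) = five_slots('11');
$result{$c1}{$c2}{$c3}{$c4}{$c5} .= 'Sub Item 1';

# Five levels are created even though the code has only two digits;
# the missing digits show up as '' keys.
print depth(\%result), "\n";
```

Writing out five subscripts autovivifies five levels no matter how short the code is, which is where the `''` keys in the dumped output come from.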

    Also, your output example is not consistent. You have some cases where you attach a leaf to a '0' key in the hash and other cases where you attach it directly to the full number. Or do you want the '0' only in cases where the element has children?

    Now, how to help you solve your problem... First, I would use pattern matching instead of unpack. Second, you can't know whether an element has children until you have processed the remaining entries. So the structure you are using, with a '0' in some cases and without the '0' in others, is complicated.

    I hope this helps you to think a little more on your problem.

    Alberto Simões

    Python's syntax succeeds in combining the mistakes of Lisp and Fortran. I do not construe that as progress.

    -- Larry Wall
Re: Generation of a Hash of Hashes
by tlm (Prior) on Mar 25, 2005 at 23:56 UTC

    Here's a start. It works with your sample input, but it includes no error checking, so it's not very robust. The internal case analysis can probably be tightened.

    the lowliest monk

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %result;
    for my $l (<DATA>) {
        chomp $l;
        my ($cod, $desc) = split ' ', $l, 2;
        my @parts = split //, $cod;
        my $key = '';
        my $hash = \%result;
        while (1) {
            $key .= shift @parts;
            if (@parts) {
                if ( ref $hash->{ $key } ) {
                    $hash = $hash->{ $key };
                }
                else {
                    $hash = ( $hash->{ $key } = +{ 0 => $hash->{ $key } } );
                }
            }
            else {
                if ( defined $hash->{ $key } ) {
                    $hash->{ $key }->{ 0 } = $desc;
                }
                else {
                    $hash->{ $key } = $desc;
                }
                last;
            }
        }
    }
    print Dumper \%result;

    __DATA__
    1 ITEM
    11 Sub Item 1
    111 Item X
    112 Item
    1121 Another Item
    11212 And Another Item
    12 Sub Item 2
Re: Generation of a Hash of Hashes
by cog (Parson) on Mar 25, 2005 at 21:51 UTC
    What am I missing here?

    Maybe I'm the one missing something here, but why is "Sub Item 2" in $result{1}{12} and not in $result{1}{12}{0}? Because that's what it seems you want to do by looking at "ITEM", "Sub Item 1"... OTOH, "Item X" and "And Another Item" aren't placed after a 0 either...

    Could you elaborate a little more (in words) on what you're trying to do?

    Update: And why do you use IO::All; if you're not doing anything that requires it? :-\

      $result{1}{12}{0}? Because that's what it seems you want to do by looking at "ITEM", "Sub Item 1"... OTOH, "Item X" and "And Another Item" aren't placed after a 0 either...

      Trees are a PITA, plain and simple. What you're not seeing in the example is that there's some extra logic being assumed. It makes sense to someone who's dealing with a variable-depth structure, but it actually makes the result more difficult to parse overall. (When an entry is a leaf and not a branch, it uses a different syntax, which is annoying when you have to add a branch there and move the leaf.)

      Of course, I don't know if there's something already in place to deal with the structure. If I were starting from scratch, I'd probably use something like:
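A minimal sketch of one structure that fits the description in this reply (an illustration written for this purpose, not the reply's own code): each level is keyed by a single digit, and key 0 of every node holds that node's name, so leaves and branches look the same. The names `build_tree` and `name_of` are hypothetical.

```perl
use strict;
use warnings;

# Build a tree keyed by one digit per level; key 0 holds the node's name.
sub build_tree {
    my @lines = @_;
    my %tree;
    for my $line (@lines) {
        my ($cod, $desc) = $line =~ /^(\d+) (.*)$/ or next;
        my $node = \%tree;
        # Walk down one digit at a time, creating levels as needed.
        $node = ($node->{$_} ||= {}) for split //, $cod;
        $node->{0} = $desc;
    }
    return \%tree;
}

# Look up the name stored for a full code; returns nothing if it is absent.
sub name_of {
    my ($tree, $cod) = @_;
    my $node = $tree;
    for my $digit (split //, $cod) {
        $node = $node->{$digit} or return;
    }
    return $node->{0};
}

my $tree = build_tree(
    '1 ITEM', '11 Sub Item 1', '111 Item X', '112 Item',
    '1121 Another Item', '11212 And Another Item', '12 Sub Item 2',
);
print name_of($tree, '112'), "\n";
```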

      Of course, this assumes that 0 is never used as an index, and the drawback of this structure is that individual items need to know their path up the tree to recover their whole code. If the indexes are always numbers, the following may be useful:
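A hedged sketch of what a numeric-index variant could look like (again an illustration, not the reply's own code): nested arrays in which element 0 of each node holds the name and elements 1 to 9 hold the children. The names `build_tree_array` and `children` are hypothetical, and the pattern deliberately accepts only the digits 1 to 9 so that slot 0 stays reserved.

```perl
use strict;
use warnings;

# Build nested arrays: element 0 is the node's name, 1..9 are children.
sub build_tree_array {
    my @lines = @_;
    my @tree;
    for my $line (@lines) {
        my ($cod, $desc) = $line =~ /^([1-9]+) (.*)$/ or next;
        my $node = \@tree;
        # Walk down one digit at a time, creating levels as needed.
        $node = ($node->[$_] ||= []) for split //, $cod;
        $node->[0] = $desc;
    }
    return \@tree;
}

# List the digits under which a node has children.
sub children {
    my ($node) = @_;
    return grep { defined $node->[$_] } 1 .. $#$node;
}

my $tree = build_tree_array(
    '1 ITEM', '11 Sub Item 1', '111 Item X', '112 Item',
    '1121 Another Item', '11212 And Another Item', '12 Sub Item 2',
);
print join(',', children($tree->[1][1])), "\n";
```

With arrays the children come back in numeric order without sorting, at the cost of unused slots wherever a digit is skipped.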

      (and I'm surprised that tilly and I both use the same method for walking trees ... anyone else have another way to do it?)

      "And why do you use IO::All; if ..." Oops... my mistake :-) That's not used here, of course.