Parsing text file into a hash with multiple values per key

dayton has asked for the wisdom of the Perl Monks concerning the following question:

I know this is somewhat simple, but I've been unable to wrap my head around it. I have a large txt file in the format:

*******************Folder North***************
Folder : North
   Folder : Lab
      Host : host42016
         VM : host42229
         VM : host42235
         VM : host42236
         VM : host42237
         VM : host42238
         VM : host42239
         VM : host42240
         VM : host42241
         VM : host42242
         VM : host42252
      Host : host42049
         VM : host42361
         VM : host42362
         VM : host42363
         VM : host42364
         VM : host42365
         VM : host42366
         VM : host42367
         VM : host42368
         VM : host42369
         VM : host42370

*******************Folder South***************
Folder : South
   Folder : Staging
      Host : host42124
         VM : host42248
         VM : host42249
         VM : host42250
         VM : host42230
         VM : host42251
         VM : host42243
         VM : host42244
         VM : host42245
         VM : host42246
         VM : host42247
      Host : host42125
         VM : host42299
         VM : host42300
         VM : host42301
         VM : host42302
         VM : host42303
         VM : host42304
         VM : host42305
         VM : host42306
         VM : host42307
         VM : host42308

I want to read the file and populate a hash with each "Host" as a key and its various "VM" entries as that key's values. Seems easy enough, and I have that with:

open (FILE, 'data.txt') || die "Unable to open file";
while (<FILE>) {
        $line=$_;
        if ($line =~ /Folder/ || $line eq ""){
            next;
        }
        ($type,$name)=split(/:/);
        $type =~ s/^\s+//;
        $type =~ s/\s+$//;
        $name =~ s/^\s+//;
        $name =~ s/\s+$//;
        if ($type=~"Host"){
            $host=$name;
        }elsif ($type=~"VM"){
            $guest=$name;
            push @{$hash{$host}}, $guest;
        }
}

Where I'm running into a brick wall is:

how do I remove duplicates (Host and VM)?
how do I then count the # of values per key?

I've googled, RTFM'd, etc., but nothing I've tried has worked to my satisfaction.


Replies are listed 'Best First'.
Re: Parsing text file into a hash with multiple values per key by Kenosis (Priest) on Oct 03, 2012 at 22:41 UTC
If I've understood your issue, consider the following:

use strict;
use warnings;

my ( $host, %hash );

while (<DATA>) {
    chomp;
    next if /Folder/ or not $_;

    my ( $type, $name ) = /(\S+)\s:\s(\S+)/;

    if ( $type =~ "Host" ) {
        $host = $name;
    }
    elsif ( $type =~ "VM" ) {
        my $guest = $name;
        push @{ $hash{$host} }, $guest unless $guest ~~ @{ $hash{$host} };
    }
}

print "Key '$_' has " . (scalar @{ $hash{$_} }) . " values.\n" for keys %hash;

__DATA__
*****************Folder North***********
Folder : North
   Folder : Lab
      Host : host42016
         VM : host42229
         VM : host42235
         VM : host42236
         VM : host42237
         VM : host42238
         VM : host42239
         VM : host42240
         VM : host42241
         VM : host42242
         VM : host42252
      Host : host42049
         VM : host42361
         VM : host42362
         VM : host42363
         VM : host42364
         VM : host42365
         VM : host42366
         VM : host42367
         VM : host42368
         VM : host42369
         VM : host42370

***************Folder South***********
Folder : South
   Folder : Staging
      Host : host42124
         VM : host42248
         VM : host42249
         VM : host42250
         VM : host42230
         VM : host42251
         VM : host42243
         VM : host42244
         VM : host42245
         VM : host42246
         VM : host42247
      Host : host42125
         VM : host42299
         VM : host42300
         VM : host42301
         VM : host42302
         VM : host42303
         VM : host42304
         VM : host42305
         VM : host42306
         VM : host42307
         VM : host42308

Output:

Key 'host42049' has 10 values.
Key 'host42125' has 10 values.
Key 'host42016' has 10 values.
Key 'host42124' has 10 values.

Only a few modifications have been made to your code. Am not sure I understand removing duplicate hosts. If hosts are hash keys, there will be no duplicates. Duplicate VMs are avoided by first checking, via Perl's smart matching (v5.10+), whether the VM is already in the array; the captured VM is `push`ed onto the array only if it's not already there.

Hope this helps!

Update: Anonymous Monk's suggestion of using a hash of hashes (HoH) is an excellent one, as it would be faster, and would work on older Perl versions which don't support smart matching.

Here's a HoH version that would operate on the same data set as above, and would generate the same output:

use strict;
use warnings;

my ( $host, %hash );

while (<DATA>) {
    chomp;
    next if /Folder/ or not $_;

    my ( $type, $name ) = /(\S+)\s:\s*(\S+)/;

    if ( $type =~ "Host" ) {
        $host = $name;
    }
    elsif ( $type =~ "VM" ) {
        my $guest = $name;
        $hash{$host}{$guest}++;
    }
}

print "Key '$_' has " . ( values %{ $hash{$_} } ) . " values.\n" for keys %hash;
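If the order in which the VMs appear in the file matters, a third option is to keep the array of VMs and track what has been seen in a side hash. The following is only a sketch under assumptions: the names `%vms_for` and `%seen`, the stricter regex, and the shortened sample data (with one deliberate repeat) are illustrative and not from the thread. It avoids smart matching, which has been flagged experimental since Perl 5.18.

```perl
use strict;
use warnings;

# Abbreviated sample in the question's format; host42229 is repeated on purpose.
my $text = <<'END';
Folder : North
   Folder : Lab
      Host : host42016
         VM : host42229
         VM : host42235
         VM : host42229
      Host : host42049
         VM : host42361
END

my ( $host, %vms_for, %seen );

for my $line ( split /\n/, $text ) {
    # Only "Host" and "VM" lines match; banners, folders and blanks are skipped.
    my ( $type, $name ) = $line =~ /^\s*(Host|VM)\s*:\s*(\S+)/ or next;

    if ( $type eq 'Host' ) {
        $host = $name;
        $vms_for{$host} ||= [];    # keep hosts even if they have no VMs
    }
    elsif ( defined $host ) {
        # Push only the first time this host/VM pair is seen (keeps file order).
        push @{ $vms_for{$host} }, $name unless $seen{$host}{$name}++;
    }
}

for my $h ( sort keys %vms_for ) {
    my @vms = @{ $vms_for{$h} };
    printf "%s has %d VM(s): %s\n", $h, scalar @vms, join( ', ', @vms );
}
```

The trade-off against the HoH version is a second hash in exchange for ordered output and a plain `scalar @array` count.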
Re^2: Parsing text file into a hash with multiple values per key by dayton (Acolyte) on Oct 03, 2012 at 22:53 UTC
Ah... thanks, that did it...don't know why I couldn't see that. `$guest unless $guest` takes care of the duplicates I was seeing. I'd tried `(scalar @{ $hash{$_} })` but must have munged some of the syntax. Thanks again, problem solved.
Re^3: Parsing text file into a hash with multiple values per key by Kenosis (Priest) on Oct 03, 2012 at 22:58 UTC
You're most welcome! Am glad these worked for you.
Re^2: Parsing text file into a hash with multiple values per key by Anonymous Monk on Oct 03, 2012 at 22:43 UTC
FWIW, checking for duplicates once, after exiting the loop, should suffice :)
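One way to read that suggestion, sketched here under assumptions: let the loop push every VM, then filter each host's array a single time with the common `grep { !$seen{$_}++ }` idiom. The `%hash` contents below are made-up stand-ins for what the original loop would build.

```perl
use strict;
use warnings;

# Hypothetical hash of arrays as the original loop might leave it,
# with duplicates still present.
my %hash = (
    host42016 => [qw(host42229 host42235 host42229 host42236)],
    host42049 => [qw(host42361 host42361)],
);

# One pass per host after the loop: keep a VM only the first time it is seen.
for my $host ( keys %hash ) {
    my %seen;
    @{ $hash{$host} } = grep { !$seen{$_}++ } @{ $hash{$host} };
}

printf "%s: %d unique VM(s)\n", $_, scalar @{ $hash{$_} } for sort keys %hash;
```

This keeps the parsing loop free of any membership test and preserves the first-seen order of the VMs.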
Re^3: Parsing text file into a hash with multiple values per key by Kenosis (Priest) on Oct 03, 2012 at 22:59 UTC
Yes--good point.
Re: Parsing text file into a hash with multiple values per key by Anonymous Monk on Oct 03, 2012 at 22:40 UTC
how do I remove duplicates (Host and VM)?

There are no duplicate hosts (the hash takes care of that), so to get rid of duplicate VMs, use another hash, a hash of hashes, i.e. `$HostVM{$host}{$VM}++`

how do I then count the # of values per key?

For your existing code: `my $count = @{ $hash{ $host } };`

For my code: `my $count = keys %{ $HostVM{$host} };`

Tutorials ► Data Types and Variables
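A small sketch of the hash-of-hashes idea, assuming the file has already been parsed into (host, VM) pairs; the `@pairs` list is invented for illustration. Because each VM's value is incremented, the inner hash gives both the unique count and which VMs were repeated in the input.

```perl
use strict;
use warnings;

# Hypothetical (host, VM) pairs standing in for the parsed file.
my @pairs = (
    [ host42124 => 'host42248' ],
    [ host42124 => 'host42249' ],
    [ host42124 => 'host42248' ],    # deliberate duplicate
    [ host42125 => 'host42299' ],
);

my %HostVM;
$HostVM{ $_->[0] }{ $_->[1] }++ for @pairs;

for my $host ( sort keys %HostVM ) {
    # keys in scalar context yields the number of unique VMs for this host.
    my $count = keys %{ $HostVM{$host} };

    # A value above 1 means the VM was listed more than once for this host.
    my @dups = grep { $HostVM{$host}{$_} > 1 } sort keys %{ $HostVM{$host} };

    print "$host: $count unique VM(s)";
    print " (repeated: @dups)" if @dups;
    print "\n";
}
```

Note that a hash does not remember insertion order, so sort the keys (as above) if a stable listing is needed.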
Re: Parsing text file into a hash with multiple values per key by sundialsvc4 (Abbot) on Oct 04, 2012 at 16:13 UTC
That looks an awful lot like a YAML file to me ... not saying that it is, but it sure do look suspiciously like it. Can you get your hands on the program that produced that file, or talk to the programmer(s)?
Re^2: Parsing text file into a hash with multiple values per key by dayton (Acolyte) on Oct 04, 2012 at 17:27 UTC
Unfortunately no :) though I've tried.
