http://www.perlmonks.org?node_id=997136

dayton has asked for the wisdom of the Perl Monks concerning the following question:

I know this is somewhat simple, but I've been unable to wrap my head around it. I have a large txt file in the format:

*******************Folder North***************
Folder : North
Folder : Lab
Host : host42016
VM : host42229
VM : host42235
VM : host42236
VM : host42237
VM : host42238
VM : host42239
VM : host42240
VM : host42241
VM : host42242
VM : host42252
Host : host42049
VM : host42361
VM : host42362
VM : host42363
VM : host42364
VM : host42365
VM : host42366
VM : host42367
VM : host42368
VM : host42369
VM : host42370
*******************Folder South***************
Folder : South
Folder : Staging
Host : host42124
VM : host42248
VM : host42249
VM : host42250
VM : host42230
VM : host42251
VM : host42243
VM : host42244
VM : host42245
VM : host42246
VM : host42247
Host : host42125
VM : host42299
VM : host42300
VM : host42301
VM : host42302
VM : host42303
VM : host42304
VM : host42305
VM : host42306
VM : host42307
VM : host42308

I want to read the file and populate a hash with each "Host" as a key and its "VM" entries as the values. Seems easy enough, and I have that with:

open (FILE, 'data.txt') || die "Unable to open file";
while (<FILE>) {
    $line=$_;
    if ($line =~ /Folder/ || $line eq ""){
        next;
    }
    ($type,$name)=split(/:/);
    $type =~ s/^\s+//;
    $type =~ s/\s+$//;
    $name =~ s/^\s+//;
    $name =~ s/\s+$//;
    if ($type=~"Host"){
        $host=$name;
    }elsif ($type=~"VM"){
        $guest=$name;
        push @{$hash{$host}}, $guest;
    }
}

Where I'm running into a brick wall is:

  1. how do I remove duplicates (Host and VM)?
  2. how do I then count the # of values per key?
I've googled, RTFM, etc; but nothing I've tried has worked to my satisfaction.

Re: Parsing text file into a hash with multiple values per key
by Kenosis (Priest) on Oct 03, 2012 at 22:41 UTC

    If I've understood your issue, consider the following:

    use strict;
    use warnings;

    my ( $host, %hash );

    while (<DATA>) {
        chomp;
        next if /Folder/ or not $_;

        my ( $type, $name ) = /(\S+)\s*:\s*(\S+)/;

        if ( $type =~ "Host" ) {
            $host = $name;
        }
        elsif ( $type =~ "VM" ) {
            my $guest = $name;
            push @{ $hash{$host} }, $guest unless $guest ~~ @{ $hash{$host} };
        }
    }

    print "Key '$_' has " . (scalar @{ $hash{$_} }) . " values.\n" for keys %hash;

    __DATA__
    *******************Folder North***************
    Folder : North
    Folder : Lab
    Host : host42016
    VM : host42229
    VM : host42235
    VM : host42236
    VM : host42237
    VM : host42238
    VM : host42239
    VM : host42240
    VM : host42241
    VM : host42242
    VM : host42252
    Host : host42049
    VM : host42361
    VM : host42362
    VM : host42363
    VM : host42364
    VM : host42365
    VM : host42366
    VM : host42367
    VM : host42368
    VM : host42369
    VM : host42370
    *******************Folder South***************
    Folder : South
    Folder : Staging
    Host : host42124
    VM : host42248
    VM : host42249
    VM : host42250
    VM : host42230
    VM : host42251
    VM : host42243
    VM : host42244
    VM : host42245
    VM : host42246
    VM : host42247
    Host : host42125
    VM : host42299
    VM : host42300
    VM : host42301
    VM : host42302
    VM : host42303
    VM : host42304
    VM : host42305
    VM : host42306
    VM : host42307
    VM : host42308

    Output:

    Key 'host42049' has 10 values.
    Key 'host42125' has 10 values.
    Key 'host42016' has 10 values.
    Key 'host42124' has 10 values.

    Only a few modifications have been made to your code.

    Am not sure I understand removing duplicate hosts. If hosts are hash keys, there will be no duplicates. Duplicate VMs are avoided by first checking--via Perl's smart matching (v5.10+)--whether the VM is already in the array; the captured VM is pushed onto the array only if it's not already there.

    Hope this helps!
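
    For anyone who would rather not rely on smart matching (it has been flagged experimental since Perl 5.18 and warns there), the same "push only if absent" check can be sketched with a plain grep. The helper name push_unique and the sample values are made up for illustration; they are not from the code above.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper (not from the thread): append $guest to the
    # array referenced by $list unless an equal string is already there.
    sub push_unique {
        my ( $list, $guest ) = @_;
        push @$list, $guest unless grep { $_ eq $guest } @$list;
        return;
    }

    # Illustrative data only: one host, with one VM name repeated.
    my %hash = ( host42016 => [] );
    push_unique( $hash{host42016}, $_ ) for qw(host42229 host42235 host42229);

    print scalar( @{ $hash{host42016} } ), "\n";    # prints 2
    ```

    Like the smart-match version, this rescans the array on every push, which is fine for a handful of VMs per host but is the cost a hash-based lookup avoids.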

    Update: Anonymous Monk's suggestion of using a hash of hashes (HoH) is an excellent one, as it would be faster, and would work on older Perl versions which don't support smart matching. Here's a HoH version that would operate on the same data set as above, and would generate the same output:

    use strict;
    use warnings;

    my ( $host, %hash );

    while (<DATA>) {
        chomp;
        next if /Folder/ or not $_;

        my ( $type, $name ) = /(\S+)\s*:\s*(\S+)/;

        if ( $type =~ "Host" ) {
            $host = $name;
        }
        elsif ( $type =~ "VM" ) {
            my $guest = $name;
            $hash{$host}{$guest}++;
        }
    }

    # Concatenation puts values() in scalar context, which yields a count.
    print "Key '$_' has " . ( values %{ $hash{$_} } ) . " values.\n" for keys %hash;

      Ah... thanks, that did it...don't know why I couldn't see that.

      $guest unless $guest takes care of the duplicates I was seeing.

      I'd tried (scalar @{ $hash{$_} }) but must have munged some of the syntax. Thanks again, problem solved.

        You're most welcome! Am glad these worked for you.

      FWIW, checking for duplicates once after exiting the loop should suffice :)

        Yes--good point.
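
      A minimal sketch of that idea, assuming the hash of arrays was filled with a plain push and still contains repeats. The sample data is invented for illustration, and the %seen/grep idiom is one common way to do it, not necessarily what the poster had in mind.

      ```perl
      use strict;
      use warnings;

      # Illustrative data only: a hash of arrays as a plain push loop
      # would leave it, with repeated VM names still present.
      my %hash = (
          host42016 => [qw(host42229 host42235 host42229)],
          host42049 => [qw(host42361 host42361 host42362)],
      );

      # One pass per host after reading: %seen remembers names already
      # kept, so grep lets each name through only the first time.
      for my $host ( keys %hash ) {
          my %seen;
          @{ $hash{$host} } = grep { !$seen{$_}++ } @{ $hash{$host} };
      }

      printf "%s has %d unique VMs\n", $_, scalar @{ $hash{$_} }
          for sort keys %hash;
      ```

      This keeps the first occurrence of each VM and preserves the original order within each host.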

Re: Parsing text file into a hash with multiple values per key
by Anonymous Monk on Oct 03, 2012 at 22:40 UTC

    how do I remove duplicates (Host and VM)?

    There are no duplicate hosts (the hash takes care of that), so to get rid of duplicate VMs, use another hash, a hash of hashes, i.e.  $HostVM{$host}{$VM}++

    how do I then count the # of values per key?

    For your existing code  my $count = @{ $hash{ $host } };

    For my code  my $count = keys %{ $HostVM{$host} };

    See Tutorials: Data Types and Variables
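
    The two ways of counting can be compared side by side. This is only a sketch with made-up sample data; %HostVM follows the naming used in this reply, and the stored values are assumed to be the ++ tallies.

    ```perl
    use strict;
    use warnings;

    # Illustrative data only.
    my %hash   = ( host42016 => [qw(host42229 host42235)] );            # hash of arrays
    my %HostVM = ( host42016 => { host42229 => 1, host42235 => 2 } );   # hash of hashes

    my $host = 'host42016';

    # An array in scalar context yields its element count.
    my $array_count = @{ $hash{$host} };

    # keys in scalar context yields the number of keys, i.e. distinct VMs.
    my $hash_count = keys %{ $HostVM{$host} };

    print "$host: $array_count (array), $hash_count (hash)\n";

    # The stored values record how many times each VM line appeared.
    print "  $_ seen $HostVM{$host}{$_} time(s)\n"
        for sort keys %{ $HostVM{$host} };
    ```

    Note that the hash-of-hashes count is taken on the inner hash for one host, not on an individual VM entry, since each VM entry holds a plain number rather than a reference.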

Re: Parsing text file into a hash with multiple values per key
by sundialsvc4 (Abbot) on Oct 04, 2012 at 16:13 UTC

    That looks an awful lot like a YAML file to me ... not saying that it is, but it sure do look suspiciously like it.   Can you get your hands on the program that produced that file, or talk to the programmer(s)?

      Unfortunately no :) though I've tried.