http://www.perlmonks.org?node_id=1210297


in reply to multi dimensional hash

Make the "variants" an array and push items onto it. Then sift out unique values once all the data has been read.

use strict;
use warnings;
use Data::Dumper;

open my $inFH, q{<}, \ <<EOD or die $!;
The DT the
International NN International
for IN for
well NN well
preparation NN preparation
preparation NN preparation
in IN in
conference NN conference
conference NN conference
conferences NN conference
good VVG good
EOD

do { my $discard = <$inFH> };

my %hash;
while ( <$inFH> ) {
    my @tags = split;
    next unless $tags[ 1 ] eq q{NN};
    $hash{ $tags[ 2 ] }->{ frequency } ++;
    push @{ $hash{ $tags[ 2 ] }->{ variants } }, $tags[ 0 ];
}

@{ $hash{ $_ }->{ variants } } = do {
    my %seen;
    grep { not $seen{ $_ } ++ } @{ $hash{ $_ }->{ variants } };
} for keys %hash;

print Data::Dumper->Dumpxs( [ \ %hash ], [ qw{ *hash } ] );

The output.

%hash = (
    'preparation' => {
        'frequency' => 2,
        'variants' => [
            'preparation'
        ]
    },
    'conference' => {
        'frequency' => 3,
        'variants' => [
            'conference',
            'conferences'
        ]
    },
    'well' => {
        'frequency' => 1,
        'variants' => [
            'well'
        ]
    },
    'International' => {
        'variants' => [
            'International'
        ],
        'frequency' => 1
    }
);

I hope this is helpful.

Update: Perhaps simpler would be to keep a ->{ seen }->{ $tags[ 0 ] } sub-hash under each key to filter out duplicates as they arrive, then delete it at the end.

...
while ( <$inFH> ) {
    my @tags = split;
    next unless $tags[ 1 ] eq q{NN};
    $hash{ $tags[ 2 ] }->{ frequency } ++;
    push @{ $hash{ $tags[ 2 ] }->{ variants } }, $tags[ 0 ]
        unless $hash{ $tags[ 2 ] }->{ seen }->{ $tags[ 0 ] } ++;
}

delete $hash{ $_ }->{ seen } for keys %hash;
...

Cheers,

JohnGG

Re^2: multi dimensional hash
by AnomalousMonk (Archbishop) on Mar 04, 2018 at 01:33 UTC
    do { my $discard = <$inFH> };

    I don't think Anonymous Monk really wants to discard the first line/record of data (update: that was my first thought, too); it's just an illusion created by the peculiar way he or she declares and pre-initializes the $line variable prior to entering the while($line){ ... } loop in the OPed code. Note also the odd way the next $line of data is read at the end of the while-loop in that code.

    Update: I must also express my preference for the use of List::Util::uniq() (which used to be in List::MoreUtils — and still is!) rather than the explicit grep-ing to a hash that you're doing: it seems to express intent much more clearly for little or no cost.


    Give a man a fish:  <%-{-{-{-<
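
The two suggestions in the reply above (keep the first record, and let uniq do the de-duplication) might be combined along these lines. This is only a sketch: tally_nouns and $rhNouns are made-up names, the input is a cut-down copy of the sample data, and it assumes a List::Util recent enough (1.45 or later) to export uniq.

```perl
use strict;
use warnings;
use List::Util 1.45 qw{ uniq };   # uniq() needs List::Util 1.45 or later

# Hypothetical helper: build the lemma => { frequency, variants } hash
# from a reference to an array of "word tag lemma" lines. No line is
# discarded up front, so the first record is processed like any other.
sub tally_nouns {
    my( $raLines ) = @_;
    my %hash;
    for my $line ( @$raLines ) {
        my @tags = split q{ }, $line;
        next unless @tags == 3 and $tags[ 1 ] eq q{NN};
        $hash{ $tags[ 2 ] }->{ frequency } ++;
        push @{ $hash{ $tags[ 2 ] }->{ variants } }, $tags[ 0 ];
    }

    # uniq keeps the first occurrence of each value, preserving order.
    @{ $_->{ variants } } = uniq @{ $_->{ variants } } for values %hash;
    return \ %hash;
}

my $rhNouns = tally_nouns( [
    'International NN International',
    'conference NN conference',
    'conference NN conference',
    'conferences NN conference',
    'good VVG good',
] );

printf "%s: %d (%s)\n",
    $_,
    $rhNouns->{ $_ }->{ frequency },
    join q{ }, @{ $rhNouns->{ $_ }->{ variants } }
    for sort keys %$rhNouns;
```

The uniq call replaces the explicit %seen/grep step; the counting loop is otherwise the same as in the parent node.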

      I agree that List::Util::uniq would be nicer but it only made the transition to the core module in 5.26, which I don't have installed on this box (and I haven't installed List::MoreUtils yet after having to rebuild after Meltdown & Spectre patches killed it). You are right about the first line not being discarded; I missed that entirely :-/

      Cheers,

      JohnGG
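
For a box in the situation described above, where the installed List::Util may predate uniq, one possible workaround is to test for the function at run time and fall back to the grep idiom. A minimal sketch, assuming only that List::Util itself is present; $rcUniq is a hypothetical name, not something from the thread.

```perl
use strict;
use warnings;
use List::Util ();

# Use List::Util::uniq where it exists (List::Util 1.45 or later),
# otherwise fall back to the %seen/grep idiom from the parent node.
# $rcUniq is a hypothetical name chosen for this sketch.
my $rcUniq = List::Util->can( q{uniq} ) || sub {
    my %seen;
    return grep { not $seen{ $_ } ++ } @_;
};

my @variants = $rcUniq->( qw{ conference conference conferences } );
print join( q{ }, @variants ), "\n";
```

Either branch keeps the first occurrence of each value in its original order, so the rest of the script need not care which one was chosen.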