
Hash of Hashes from file

by cipher (Acolyte)
on Apr 03, 2012 at 12:55 UTC
cipher has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I am trying to generate a hash of hashes from a file, and all of my attempts have failed.

I have been learning Perl for quite some time now, but the concept of hashes is still not clear to me.

File format:

    user=john website="" type="Entertainment"
    user=david website="" type="Social Networking"
    user=john website="" type="Social Networking"
    user=mike website="" type="Search Engines"
I want to print all websites and categories visited by each unique user.

Something like this.

    john  => [Website: , Category: Entertainment], [Website: , Category: Social Networking]
    david => [Website: , Category: Social Networking]
    mike  => [Website: , Category: Search Engines]
My script
    open FILE, "new.log" or die;
    my %hoh;
    while (<FILE>) {
        if ($_ =~ /user="([^"]+)/)    { $user    = $1; }
        if ($_ =~ /website="([^"]+)/) { $website = $1; }
        if ($_ =~ /type="([^"]+)/)    { $type    = $1; }

        # shows only one website per user
        $hoh{$user}->{'Website'} = $website;
        $hoh{$user}->{'Type'}    = $type;
    }

    use Data::Dumper;
    print Dumper(%hoh);

Output:

    $VAR1 = 'john';
    $VAR2 = {
              'Type' => 'Social Networking',
              'Website' => ''
            };
    $VAR3 = 'mike';
    $VAR4 = {
              'Type' => 'Search Engines',
              'Website' => ''
            };
    $VAR5 = 'david';
    $VAR6 = {
              'Type' => 'Social Networking',
              'Website' => ''
            };
My script does not list all websites for user John. Any help is appreciated.

Replies are listed 'Best First'.
Re: Hash of Hashes from file
by scorpio17 (Abbot) on Apr 03, 2012 at 13:04 UTC

    You need a hash-of-arrays-of-hashes:

    push( @{ $hoh{$user} }, { Website => $site, Category => $cat, });
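    A self-contained sketch of that hash-of-arrays-of-hashes approach. The sample lines and URLs below are made up; the real script would read from the log file instead of an in-memory list:

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up sample data standing in for lines read from the log file
my @log = (
    'user="john" website="example.com" type="Entertainment"',
    'user="david" website="social.example" type="Social Networking"',
    'user="john" website="social.example" type="Social Networking"',
);

my %hoh;
for my $line (@log) {
    my ($user)    = $line =~ /user="([^"]+)"/ or next;
    my ($website) = $line =~ /website="([^"]*)"/;
    my ($type)    = $line =~ /type="([^"]*)"/;

    # one array per user; each visit is its own small hash,
    # so a repeat user appends instead of overwriting
    push @{ $hoh{$user} }, { Website => $website, Category => $type };
}

print Dumper(\%hoh);
```

    With this structure, `$hoh{john}` is an array reference holding one hash per visit, so both of john's visits survive.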
      Thanks for the quick reply. I was avoiding arrays because the file can be very large, sometimes 4-5 GB, and when I use arrays the script ends up running out of memory. I will try this on my file and post the results.
        Well, the problem is that with a hash, you can only have one value for each key. If you need to associate multiple values per key, then the solution is to store the values in an array, and save the array reference in the hash. If you get to the point where you have more data than you can fit into memory at one time, then you need to look at using a database, like mysql. Instead of pushing the data from each line of the file into your hash, you would insert it into the database, then once all the data is loaded you can query the database.
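        Along those lines, a minimal DBI sketch. It uses an in-memory SQLite database (requires DBD::SQLite) as a stand-in for MySQL, and the table name, columns, and sample rows are all made up for illustration:

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database for the sketch; point the DSN at a file
# (or a MySQL server) for real data.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });
$dbh->do("CREATE TABLE visits (user TEXT, website TEXT, type TEXT)");

# Insert one row per log line instead of building a giant in-memory hash
my $ins = $dbh->prepare(
    "INSERT INTO visits (user, website, type) VALUES (?, ?, ?)");
my @log = (
    [ 'john',  'example.com',    'Entertainment'     ],
    [ 'david', 'social.example', 'Social Networking' ],
    [ 'john',  'social.example', 'Social Networking' ],
);
$ins->execute(@$_) for @log;

# Once everything is loaded, query per user
my $rows = $dbh->selectall_arrayref(
    "SELECT user, website, type FROM visits WHERE user = ?",
    undef, 'john');
for my $r (@$rows) {
    print "Website: $r->[1], Category: $r->[2]\n";
}
```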

        Ultimately if you want to allow more than one website per user, you're going to want an array layer in there: a hash of arrays of hashes. (You could replace the array layer with something functionally equivalent, like another hash layer, or Set::Object, but I see little point in doing that.)

        If you can't hold it all in memory, you're going to have to rethink your technique. Might it be possible to sort (or split) the file per-user, and then process the data one user at a time?
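        One way to sketch the sort-first idea: pre-sort the log by user (e.g. with the system `sort` utility) so each user's lines are contiguous, then hold only one user's visits in memory at a time. The sample lines below are made up and already sorted:

```perl
use strict;
use warnings;

# Made-up data, pre-sorted by user so each user's lines are contiguous
my @sorted_log = (
    'user="david" website="social.example" type="Social Networking"',
    'user="john" website="example.com" type="Entertainment"',
    'user="john" website="social.example" type="Social Networking"',
);

my ($current, @visits);
my $report = '';

# Emit the current user's visits and empty the buffer
my $flush = sub {
    return unless defined $current;
    $report .= "$current:\n";
    $report .= "  Website: $_->{Website}, Category: $_->{Category}\n"
        for @visits;
    @visits = ();
};

for my $line (@sorted_log) {
    my ($user)    = $line =~ /user="([^"]+)"/ or next;
    my ($website) = $line =~ /website="([^"]*)"/;
    my ($type)    = $line =~ /type="([^"]*)"/;

    # user changed: the previous user's visits are complete, so flush them
    $flush->() if defined $current && $user ne $current;
    $current = $user;
    push @visits, { Website => $website, Category => $type };
}
$flush->();    # last user
print $report;
```

        Memory use is then bounded by the busiest single user rather than by the whole 4-5 GB file.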

        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

Node Type: perlquestion [id://963237]
Approved by Corion