in reply to Re^6: Hash of Hashes from file
in thread Hash of Hashes from file
From the data shown below, the program produces this output:

    C:\Old_Data\perlp>perl t33.pl
    david
        Website: www.facebook.com, Category: Social Networking
    john
        Website: www.yahoo.com, Category: Entertainment
        Website: www.yahoo.com, Category: Entertainment
        Website: www.yahoo.com, Category: Entertainment
        Website: www.facebook.com, Category: Social Networking
    mike
        Website: www.google.com, Category: Search Engines


    Name: john
        Website Count
        www.yahoo.com       3
        www.facebook.com    1

        Type Count
        Entertainment       3
        Social Networking   1


    Name: mike
        Website Count
        www.google.com      1

        Type Count
        Search Engines      1


    Name: david
        Website Count
        www.facebook.com    1

        Type Count
        Social Networking   1
Notice that there are quotes surrounding every field. The regular expression that captures these fields from the file would need to be changed if that's not the case.

    user="john" website="www.yahoo.com" type="Entertainment"
    user="john" website="www.yahoo.com" type="Entertainment"
    user="john" website="www.yahoo.com" type="Entertainment"
    user="david" website="www.facebook.com" type="Social Networking"
    user="john" website="www.facebook.com" type="Social Networking"
    user="mike" website="www.google.com" type="Search Engines"
In my program I use 2 hashes - one to count the number of sites visited by each user, %count, and one to count each address and category (by user), %data. It seems to work OK for this small data set.
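The line $data{$user}{ qq{$site$;$cat} }++; uses a 'compound' key ($site and $cat joined by $;).

    #!/usr/bin/perl
    use strict;
    use warnings;

    my (%data, %count);

    while (<DATA>) {
        # Capture the three quoted fields: user, website, type
        my ($user, $site, $cat) = /"([^"]+)"/g;
        $data{$user}{ qq{$site$;$cat} }++;   # compound key: site and category joined by $;
        $count{$user}++;                     # total visits per user
    }

    # Every visit, grouped by user
    for my $user (sort keys %data) {
        my $href = $data{$user};
        print $user, "\n";
        for my $key (keys %$href) {
            my $str = sprintf "\tWebsite: %s, Category: %s\n", split /$;/, $key;
            print $str x $href->{$key};
        }
    }

    # Users ordered by total visits, most active first
    my @ordered = sort {$count{$b} <=> $count{$a}} keys %count;
    print "\n\n";

    for my $user (@ordered) {
        my $href = $data{$user};
        print "Name: $user\n\tWebsite Count\n";
        for my $key (sort {$href->{$b} <=> $href->{$a}} keys %$href) {
            printf "\t%-20s%d\n", (split /$;/, $key)[0], $href->{$key};
        }
        print "\n";
        print "\tType Count\n";
        for my $key (sort {$href->{$b} <=> $href->{$a}} keys %$href) {
            printf "\t%-20s%d\n", (split /$;/, $key)[1], $href->{$key};
        }
        print "\n\n";
    }

    __DATA__
    user="john" website="www.yahoo.com" type="Entertainment"
    user="john" website="www.yahoo.com" type="Entertainment"
    user="john" website="www.yahoo.com" type="Entertainment"
    user="david" website="www.facebook.com" type="Social Networking"
    user="john" website="www.facebook.com" type="Social Networking"
    user="mike" website="www.google.com" type="Search Engines"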
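For anyone unfamiliar with $;, here is a minimal sketch of how the compound key behaves. It is not part of the program above; %seen and the sample values are made up for illustration. Perl joins a list used as a hash subscript with $; (the character \034 by default), so the two increments below hit the same key, and split recovers the parts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative values only; %seen is a hypothetical hash for this demo.
my ($site, $cat) = ('www.yahoo.com', 'Entertainment');
my %seen;

$seen{$site, $cat}++;        # a list subscript is joined with $; to form one key
$seen{"$site$;$cat"}++;      # the same key, written out by hand

my ($key) = keys %seen;      # only one key exists
my $sep   = $;;              # copy the separator so the regex below is easy to read
my ($s, $c) = split /\Q$sep\E/, $key;

print "$s | $c | $seen{$key}\n";   # prints "www.yahoo.com | Entertainment | 2"
```

The approach only works as long as the separator cannot appear in the data itself, which is a safe assumption for \034 in host names and category labels.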
Here is a dump of %data. The $; separator (\034 by default, a non-printing character) is shown as \034 in the keys.

    $VAR1 = {
        'john' => {
            "www.yahoo.com\034Entertainment" => 3,
            "www.facebook.com\034Social Networking" => 1
        },
        'mike' => {
            "www.google.com\034Search Engines" => 1
        },
        'david' => {
            "www.facebook.com\034Social Networking" => 1
        }
    };
Update: Whoops, that doesn't count the categories correctly :-(
If a user visited two different sites in the same category, each site/category pair would get its own line under "Type Count" instead of the category being totaled across sites.
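One possible way around this, sketched here and not tested beyond this small sample, is to drop the compound key and keep two separate tallies per user, one for sites and one for categories. The inner keys 'site' and 'type' are names I chose, and the last record (www.bing.com) is a made-up addition so that one user has two sites in the same category.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Records in the same format as the sample data. The final line is a
# hypothetical extra record added to exercise the same-category case.
my @lines = (
    'user="john" website="www.yahoo.com" type="Entertainment"',
    'user="john" website="www.yahoo.com" type="Entertainment"',
    'user="john" website="www.yahoo.com" type="Entertainment"',
    'user="david" website="www.facebook.com" type="Social Networking"',
    'user="john" website="www.facebook.com" type="Social Networking"',
    'user="mike" website="www.google.com" type="Search Engines"',
    'user="mike" website="www.bing.com" type="Search Engines"',
);

my (%data, %count);
for my $line (@lines) {
    my ($user, $site, $cat) = $line =~ /"([^"]+)"/g;
    next unless defined $cat;       # skip lines without three quoted fields
    $data{$user}{site}{$site}++;    # tally per website
    $data{$user}{type}{$cat}++;     # tally per category, independent of the site
    $count{$user}++;                # total visits per user
}

# Most active users first; ties are broken by name so the order is stable.
for my $user (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "Name: $user\n";
    for my $section (['Website', 'site'], ['Type', 'type']) {
        my ($label, $field) = @$section;
        my $tally = $data{$user}{$field};
        print "\t$label Count\n";
        for my $key (sort { $tally->{$b} <=> $tally->{$a} || $a cmp $b } keys %$tally) {
            printf "\t%-20s%d\n", $key, $tally->{$key};
        }
        print "\n";
    }
}
```

With the extra record, mike's two sites each show a count of 1 while "Search Engines" is totaled as 2, which is the case the compound key got wrong.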
Re^8: Hash of Hashes from file
by Cristoforo (Curate) on Apr 06, 2012 at 15:08 UTC
by Cristoforo (Curate) on Apr 14, 2012 at 14:54 UTC
by cipher (Acolyte) on Apr 09, 2012 at 12:04 UTC
In Section
Seekers of Perl Wisdom