<?xml version="1.0" encoding="windows-1252"?>
<node id="963767" title="Re^7: Hash of Hashes from file" created="2012-04-05 22:17:48" updated="2012-04-05 22:17:48">
<type id="11">
note</type>
<author id="421781">
Cristoforo</author>
<data>
<field name="doctext">
&lt;strike&gt;I got the following output:
&lt;c&gt;
C:\Old_Data\perlp&gt;perl t33.pl
david
        Website: www.facebook.com, Category: Social Networking
john
        Website: www.yahoo.com, Category: Entertainment
        Website: www.yahoo.com, Category: Entertainment
        Website: www.yahoo.com, Category: Entertainment
        Website: www.facebook.com, Category: Social Networking
mike
        Website: www.google.com, Category: Search Engines


Name: john
        Website Count
        www.yahoo.com       3
        www.facebook.com    1

        Type Count
        Entertainment       3
        Social Networking   1


Name: mike
        Website Count
        www.google.com      1

        Type Count
        Search Engines      1


Name: david
        Website Count
        www.facebook.com    1

        Type Count
        Social Networking   1
&lt;/c&gt;

From this data:
&lt;c&gt;user="john" website="www.yahoo.com" type="Entertainment"
user="john" website="www.yahoo.com" type="Entertainment"
user="john" website="www.yahoo.com" type="Entertainment"
user="david" website="www.facebook.com" type="Social Networking"
user="john" website="www.facebook.com" type="Social Networking"
user="mike" website="www.google.com" type="Search Engines"
&lt;/c&gt;
Notice that every field value is surrounded by double quotes. The regular expression that captures these fields from the file would need to be changed if that's not the case.
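&lt;p&gt;If some values turn out not to be quoted, a looser pattern might look something like the sketch below. This is only an assumption about what such data could look like: the helper name &lt;c&gt;parse_line&lt;/c&gt; and the sample line are made up for illustration, and a bare value cannot contain spaces here (so an unquoted "Social Networking" would still not parse).
&lt;c&gt;use strict;
use warnings;

# Hypothetical helper, not part of the program below: parse key=value pairs
# whose values are either double-quoted or bare (bare values end at whitespace).
sub parse_line {
    my ($line) = @_;
    my %field;
    while ($line =~ /(\w+)=(?:"([^"]*)"|(\S+))/g) {
        # $2 is set for a quoted value, $3 for a bare one
        $field{$1} = defined $2 ? $2 : $3;
    }
    return \%field;
}

# Made-up sample line mixing quoted and bare values
my $rec = parse_line('user=john website="www.yahoo.com" type=Entertainment');
print join(", ", map { "$_=$rec-&gt;{$_}" } sort keys %$rec), "\n";
&lt;/c&gt;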
&lt;p&gt;In my program I use two hashes: &lt;c&gt;%count&lt;/c&gt; counts the number of site visits by each user, and &lt;c&gt;%data&lt;/c&gt; counts each address and category pair (by user).
It seems to work OK for this small data set.

&lt;c&gt;#!/usr/bin/perl
use strict;
use warnings;

my (%data, %count);

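while (&lt;DATA&gt;) {
    # Grab the three double-quoted values: user, website, type
    my ($user, $site, $cat) = /"([^"]+)"/g;
    next unless defined $cat;           # skip blank or malformed lines
    $data{$user}{ qq{$site$;$cat} }++;  # count each site/category pair per user
    $count{$user}++;                    # total lines seen per user
}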

for my $user (sort keys %data) {
    my $href = $data{$user};
    print $user, "\n";
    for my $key (keys %$href) {
        my $str = sprintf "\tWebsite: %s, Category: %s\n", split /$;/, $key;
        print $str x $href-&gt;{$key};
    }
}

my @ordered = sort {$count{$b} &lt;=&gt; $count{$a}} keys %count;

print "\n\n";

for my $user (@ordered) {
    my $href = $data{$user};
    print "Name: $user\n\tWebsite Count\n";
    for my $key (sort {$href-&gt;{$b} &lt;=&gt; $href-&gt;{$a}} keys %$href) {
        printf "\t%-20s%d\n", (split /$;/, $key)[0], $href-&gt;{$key};
    }
    print "\n";
    print "\tType Count\n";
    for my $key (sort {$href-&gt;{$b} &lt;=&gt; $href-&gt;{$a}} keys %$href) {
        printf "\t%-20s%d\n", (split /$;/, $key)[1], $href-&gt;{$key};
    }
    print "\n\n";
}&lt;/c&gt;

The line &lt;c&gt;$data{$user}{ qq{$site$;$cat} }++;&lt;/c&gt; uses a 'compound' key: $site and $cat joined by &lt;c&gt;$;&lt;/c&gt;, Perl's subscript separator (&lt;c&gt;"\034"&lt;/c&gt; by default).
&lt;p&gt; Here is a dump of &lt;c&gt;%data&lt;/c&gt;.
&lt;p&gt;&lt;c&gt;$VAR1 = {
          'john' =&gt; {
                      'www.yahoo.com‡˜Entertainment' =&gt; 3,
                      'www.facebook.com‡˜Social Networking' =&gt; 1
                    },
          'mike' =&gt; {
                      'www.google.com‡˜Search Engines' =&gt; 1
                    },
          'david' =&gt; {
                       'www.facebook.com‡˜Social Networking' =&gt; 1
                     }
        };&lt;/c&gt;&lt;/strike&gt;
&lt;p&gt;
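As the update below explains, the compound key does not total a category across different sites. Here is a minimal sketch of one way around that: keep two separate per-user tallies, one keyed by website and one keyed by category. The hash names &lt;c&gt;%sites&lt;/c&gt; and &lt;c&gt;%cats&lt;/c&gt; and the www.youtube.com sample line are my own assumptions for illustration; they are not in the original program or data.
&lt;c&gt;#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: %sites and %cats are names introduced for this example.
my (%sites, %cats, %count);

while (&lt;DATA&gt;) {
    # Assumes every field value is double-quoted, as in the sample data
    my ($user, $site, $cat) = /"([^"]+)"/g;
    next unless defined $cat;   # skip blank or malformed lines
    $sites{$user}{$site}++;     # per-user tally by website
    $cats{$user}{$cat}++;       # per-user tally by category, independent of site
    $count{$user}++;            # total lines seen per user
}

# Most active users first; ties broken by name
for my $user (sort { $count{$b} &lt;=&gt; $count{$a} or $a cmp $b } keys %count) {
    print "Name: $user\n";
    for my $pair (["Website Count", $sites{$user}], ["Type Count", $cats{$user}]) {
        my ($title, $tally) = @$pair;
        print "\t$title\n";
        for my $key (sort { $tally-&gt;{$b} &lt;=&gt; $tally-&gt;{$a} or $a cmp $b } keys %$tally) {
            printf "\t%-20s%d\n", $key, $tally-&gt;{$key};
        }
        print "\n";
    }
}

__DATA__
user="john" website="www.yahoo.com" type="Entertainment"
user="john" website="www.youtube.com" type="Entertainment"
user="john" website="www.facebook.com" type="Social Networking"
user="mike" website="www.google.com" type="Search Engines"
&lt;/c&gt;
With this made-up data, john's Entertainment count comes out as 2 even though the two visits were to different sites.
&lt;p&gt;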
&lt;b&gt;Update:&lt;/b&gt; Whoops, that doesn't count the categories correctly :-(&lt;br&gt;
Because each count is keyed on the site and category together, two different sites in the same category are tallied separately instead of being added into one total for that category.</field>
<field name="root_node">
963237</field>
<field name="parent_node">
963561</field>
</data>
</node>
