note
Cristoforo
<strike>I got the following output:
<c>
C:\Old_Data\perlp>perl t33.pl
david
Website: www.facebook.com, Category: Social Networking
john
Website: www.yahoo.com, Category: Entertainment
Website: www.yahoo.com, Category: Entertainment
Website: www.yahoo.com, Category: Entertainment
Website: www.facebook.com, Category: Social Networking
mike
Website: www.google.com, Category: Search Engines
Name: john
Website Count
www.yahoo.com 3
www.facebook.com 1
Type Count
Entertainment 3
Social Networking 1
Name: mike
Website Count
www.google.com 1
Type Count
Search Engines 1
Name: david
Website Count
www.facebook.com 1
Type Count
Social Networking 1
</c>
From this data:
<c>user="john" website="www.yahoo.com" type="Entertainment"
user="john" website="www.yahoo.com" type="Entertainment"
user="john" website="www.yahoo.com" type="Entertainment"
user="david" website="www.facebook.com" type="Social Networking"
user="john" website="www.facebook.com" type="Social Networking"
user="mike" website="www.google.com" type="Search Engines"
</c>
Notice that there are quotes surrounding every field. The regular expression that captures these fields from the file would need to be changed if that's not the case.
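<p>As a rough sketch of what such a change might look like (this is not part of my program, and the sample line is made up), the pattern below captures <c>key=value</c> pairs whether or not the value is quoted. It assumes an unquoted value never contains whitespace.
<c>#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample line: one unquoted value, two quoted ones.
my $line = 'user=john website="www.yahoo.com" type="Social Networking"';

my %field;
# $1 is the key; the value lands in $2 if it was quoted, in $3 otherwise.
while ($line =~ /(\w+)=(?:"([^"]*)"|(\S+))/g) {
    $field{$1} = defined $2 ? $2 : $3;
}

print join(', ', map { "$_=$field{$_}" } sort keys %field), "\n";
</c>
Capturing by key name also means the order of the fields on the line no longer matters.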
<p>In my program I use 2 hashes - one to count the number of sites visited by each user, <c>%count</c>, and one to count each address and category (by user), <c>%data</c>.
It seems to work OK for this small data set.
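<c>#!/usr/bin/perl
use strict;
use warnings;

my (%data, %count);

while (<DATA>) {
    # Grab the three quoted values: user, website, category.
    my ($user, $site, $cat) = /"([^"]+)"/g;
    next unless defined $cat;    # skip blank or malformed lines

    # Compound key: site and category joined by $; (the subscript separator).
    $data{$user}{ qq{$site$;$cat} }++;
    $count{$user}++;             # total visits per user
}

# Listing: one line per visit, grouped by user.
for my $user (sort keys %data) {
    my $href = $data{$user};
    print $user, "\n";
    for my $key (keys %$href) {
        my $str = sprintf "\tWebsite: %s, Category: %s\n", split /$;/, $key;
        print $str x $href->{$key};
    }
}

# Report: users ordered by total visits, highest first.
my @ordered = sort { $count{$b} <=> $count{$a} } keys %count;
print "\n\n";

for my $user (@ordered) {
    my $href = $data{$user};
    print "Name: $user\n\tWebsite Count\n";
    for my $key (sort { $href->{$b} <=> $href->{$a} } keys %$href) {
        printf "\t%-20s%d\n", (split /$;/, $key)[0], $href->{$key};
    }
    print "\n";
    print "\tType Count\n";
    for my $key (sort { $href->{$b} <=> $href->{$a} } keys %$href) {
        printf "\t%-20s%d\n", (split /$;/, $key)[1], $href->{$key};
    }
    print "\n\n";
}

__DATA__
user="john" website="www.yahoo.com" type="Entertainment"
user="john" website="www.yahoo.com" type="Entertainment"
user="john" website="www.yahoo.com" type="Entertainment"
user="david" website="www.facebook.com" type="Social Networking"
user="john" website="www.facebook.com" type="Social Networking"
user="mike" website="www.google.com" type="Search Engines"</c>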
The line <c>$data{$user}{ qq{$site$;$cat} }++;</c> uses a 'compound' key (<c>$site</c> and <c>$cat</c> joined by <c>$;</c>).
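<p>For anyone unfamiliar with <c>$;</c>: it is Perl's subscript separator (default <c>"\034"</c>), and a comma-separated list inside a hash subscript is joined with it. A small standalone sketch, using illustrative values only, shows that both spellings produce the same key and that <c>split</c> recovers the parts:
<c>#!/usr/bin/perl
use strict;
use warnings;

my %seen;
my ($site, $cat) = ('www.yahoo.com', 'Entertainment');

# Both statements increment the same element: the key is "$site$;$cat".
$seen{ qq{$site$;$cat} }++;
$seen{ $site, $cat }++;

for my $key (keys %seen) {
    # Split the compound key back into its two parts.
    my ($s, $c) = split /$;/, $key;
    print "$s | $c | $seen{$key}\n";
}
</c>
This only works as long as neither value can contain the separator character itself.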
<p> Here is a dump of <c>%data</c>.
<p><c>$VAR1 = {
'john' => {
'www.yahoo.com‡˜Entertainment' => 3,
'www.facebook.com‡˜Social Networking' => 1
},
'mike' => {
'www.google.com‡˜Search Engines' => 1
},
'david' => {
'www.facebook.com‡˜Social Networking' => 1
}
};</c></strike>
<p>
<b>Update:</b> Whoops, that doesn't count the categories correctly :-(<br>
Because each count is keyed on the site and category together, a category shared by two different sites would get a separate count per site instead of one total for the category.
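<p>One possible way around this (an untested-on-real-data sketch, not a drop-in replacement) is to keep sites and categories in separate per-user hashes. The helper name <c>tally</c> is made up for illustration, and the last sample line (a second Entertainment site for john) is invented to exercise the shared-category case.
<c>#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count sites and categories per user in separate hashes.
sub tally {
    my @lines = @_;
    my (%sites, %types, %total);
    for my $line (@lines) {
        my ($user, $site, $cat) = $line =~ /"([^"]+)"/g;
        next unless defined $cat;    # skip lines without three quoted fields
        $sites{$user}{$site}++;
        $types{$user}{$cat}++;
        $total{$user}++;
    }
    return (\%sites, \%types, \%total);
}

my @log = (
    'user="john" website="www.yahoo.com" type="Entertainment"',
    'user="john" website="www.yahoo.com" type="Entertainment"',
    'user="john" website="www.yahoo.com" type="Entertainment"',
    'user="david" website="www.facebook.com" type="Social Networking"',
    'user="john" website="www.facebook.com" type="Social Networking"',
    'user="mike" website="www.google.com" type="Search Engines"',
    # Invented line: same category as yahoo, different site.
    'user="john" website="www.youtube.com" type="Entertainment"',
);

my ($sites, $types, $total) = tally(@log);

# Users by total visits (highest first), ties broken by name.
for my $user (sort { $total->{$b} <=> $total->{$a} || $a cmp $b } keys %$total) {
    print "Name: $user\n";
    for my $section ([ 'Website', $sites->{$user} ], [ 'Type', $types->{$user} ]) {
        my ($label, $href) = @$section;
        print "\t$label Count\n";
        for my $key (sort { $href->{$b} <=> $href->{$a} || $a cmp $b } keys %$href) {
            printf "\t%-20s%d\n", $key, $href->{$key};
        }
        print "\n";
    }
}
</c>
With the invented line included, john's Entertainment count comes out as 4 (three yahoo visits plus one youtube visit), which is the total the compound-key version would have split in two.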
963237
963561