Hi
I thought I'd post a database solution. Its not really necessary as the Perl code solution works. The advantage of loading into a database is if your file is too large to fit in memory. Also, if you wanted to see different views of the data, it would be probably easier to write an SQL query than to write another program, etc.
I can't vouch for the SQL here - I don't use it often, but it did produce the results similiar to the Perl program above.
You could run it if you had the DBI and DBD::SQLite modules on your system.
The first program creates the database and the second program runs the queries.
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect("dbi:SQLite:dbname=users.lite","","",
{PrintError => 1, AutoCommit => 0}) or die "Can't connect";
$dbh->do('DROP TABLE users');
$dbh->do(qq{ CREATE TABLE users
(user TEXT,
site TEXT,
type TEXT)
});
my $sql_fmt = "INSERT INTO users VALUES(?,?,?)";
while(<DATA>) {
$dbh->do($sql_fmt, {}, /"([^"]+)"/g);
$dbh->commit if $. % 1_000_000 == 0; # commit every 1,000,000
}
$dbh->commit;
$dbh->disconnect;
__DATA__
user="john" website="www.yahoo.com" type="Entertainment"
user="john" website="www.yahoo.com" type="Entertainment"
user="john" website="www.yahoo.com" type="Entertainment"
user="david" website="www.facebook.com" type="Social Networking"
user="john" website="www.facebook.com" type="Social Networking"
user="mike" website="www.google.com" type="Search Engines"
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect("dbi:SQLite:dbname=users.lite","","",
{PrintError => 1, AutoCommit => 0}) or die "Can't connect";
# Prepare and print list of all websites to every user
my $sth = $dbh->prepare(<<SQL);
SELECT *
FROM users
ORDER BY user, site
SQL
$sth->execute;
while(my @row = $sth->fetchrow_array) {
printf "%-15s%-20s%s\n", @row;
}
print "\n";
# Create list of users from most visits to least for @users array
$sth = $dbh->prepare(<<SQL);
SELECT user, COUNT(user) Count
FROM users
GROUP BY user
ORDER BY Count DESC, user
SQL
$sth->execute;
my @users;
while(my @row = $sth->fetchrow_array) {
push @users, $row[0];
}
# Counts for each website and counts of categories visited by user
for my $user (@users) {
$sth = $dbh->prepare(qq{SELECT site, COUNT(site) Count
FROM users
WHERE user = '$user'
GROUP BY site
ORDER BY Count DESC
});
$sth->execute;
printf "Name: %s\n\t%-20s%s\n", $user, qw/ Website Count /;
while(my @row = $sth->fetchrow_array) {
printf "\t%-20s%s\n", @row;
}
print "\n";
printf "\t%-20s%s\n", qw/ Category Count /;
$sth = $dbh->prepare(qq{SELECT type, COUNT(type) Count
FROM users
WHERE user = '$user'
GROUP BY type
ORDER BY Count DESC
});
$sth->execute;
while(my @row = $sth->fetchrow_array) {
printf "\t%-20s%s\n", @row;
}
print "\n";
}
$dbh->disconnect;
Chris
Update: Re-wrote the query in loop of '@users'.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.