
Statistician in my garbage...

by larsen (Parson)
on Mar 16, 2001 at 02:46 UTC (#64824=CUFP)

Mainly it's an HTML::Parser exercise, written during a busy research period :)

The idea is very simple: by randomly combining text and images from your browser's cache, you get a snapshot of your (or someone else's) browsing habits. Like a statistician in your garbage :)

I got the idea from an old issue of DOMUS, but I've lost the original URL.

Update: I'm thinking of using it as a permanent installation in an Internet café. Two monitors: one used by the surfer, and one connected to a box that automatically refreshes a page generated by this script.
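For the two-monitor setup, one hedged option is to let the display machine's browser reload the page by itself through an HTML `<meta http-equiv="refresh">` tag. Because the script prints the page to STDOUT, its output would first have to be captured (for example into a file the display box opens) and passed through something like the hypothetical `add_refresh` helper sketched here; the helper name and the 30-second interval are assumptions, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: add a meta-refresh tag so a kiosk browser
# reloads the page every $seconds seconds.
sub add_refresh {
    my ( $html, $seconds ) = @_;
    my $meta = qq{<meta http-equiv="refresh" content="$seconds">};
    # Insert right after the first <head ...> tag when there is one;
    # otherwise fall back to prepending the tag to the document.
    return $html =~ s/(<head\b[^>]*>)/$1$meta/i ? $html : $meta . $html;
}

my $with_head = add_refresh( '<html><head><title>x</title></head><body>hi</body></html>', 30 );
my $no_head   = add_refresh( '<p>hi</p>', 30 );
print "$with_head\n$no_head\n";
```

The `\b` after `head` keeps the pattern from matching a `<header>` element.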

#!/usr/bin/perl
use strict;
use warnings;

# Digs in your browser's cache,
# like a statistician in your trashcan...

package Lurker;
use File::Find;

my $cache = {
    IMAGES => [],
    DOCS   => [],
};

# Walk the cache directory and sort files into images and HTML documents.
sub lurk {
    my $dir = shift;
    print STDERR "Reading cache...";
    find( sub {
        for ( $File::Find::name ) {
            # A single alternation: chaining /a/ || /b/ && push would only
            # push on the last test, because && binds tighter than ||.
            push @{ $cache->{IMAGES} }, $_ if /\.(?:gif|png|jpg)$/;
            push @{ $cache->{DOCS} },   $_ if /\.html$/;
        }
    }, $dir );
    print STDERR "OK!\n";
}

# Return a random file of the given kind ('IMAGES' or 'DOCS'), or undef if none.
sub pick_random {
    my $what  = shift;
    my $files = $cache->{$what};
    return unless @$files;
    return $files->[ rand @$files ];
}

package My_HTML_Parser;
use base 'HTML::Parser';

# new() without arguments makes HTML::Parser call these
# version-2-style callback methods.
sub start {
    my $self = shift;
    my ( $tag, $attr, $attrseq, $origtext ) = @_;
    if ( $tag eq 'img' and defined $attr->{src} ) {
        my $orig_src = $attr->{src};
        my $new_src  = Lurker::pick_random('IMAGES');
        # \Q...\E: match the old URL literally, not as a regex.
        $origtext =~ s/\Q$orig_src\E/$new_src/ if defined $new_src;
    }
    print $origtext;
}

sub text {
    my ( $self, $text ) = @_;
    print $text;
}

sub end {
    my ( $self, $tag ) = @_;
    print "</$tag>";
}

package main;

my $cache_directory = '/home/stefano/.netscape/cache';
Lurker::lurk($cache_directory);

my $doc = Lurker::pick_random('DOCS');
die "No HTML documents found in $cache_directory\n" unless defined $doc;
print STDERR "Now parsing $doc...\n";

my $parser = My_HTML_Parser->new;
$parser->parse_file($doc);
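Two Perl details in `lurk` and `start` are easy to trip over. First, `&&` binds tighter than `||`, so a chain like `/\.gif$/ || /\.png$/ || /\.jpg$/ && push ...` only pushes when the last test is the one that matches. Second, a variable interpolated into the pattern side of `s///` is read as a regex, so URL characters such as `+`, `.` or `?` change its meaning unless it is wrapped in `\Q...\E`. This self-contained sketch uses core Perl only; the file names and URLs are made up for illustration.

```perl
use strict;
use warnings;

my @files = qw(a.gif b.png c.jpg d.html);

# && binds tighter than ||, so this parses as
#   /\.gif$/ || /\.png$/ || ( /\.jpg$/ && push @buggy, $_ )
# and .gif/.png names short-circuit before the push is reached.
my @buggy;
for (@files) {
    /\.gif$/ || /\.png$/ || /\.jpg$/ && push @buggy, $_;
}

# One alternation with push as a statement modifier handles every extension.
my @fixed;
for (@files) {
    push @fixed, $_ if /\.(?:gif|png|jpg)$/;
}

# A URL used as a pattern: '+' means "one or more" to the regex engine.
my $orig_src = 'img/a+b.gif';
my $new_src  = 'cache/X1.png';

my $unquoted = qq{<img src="$orig_src">};
my $quoted   = $unquoted;
$unquoted =~ s/$orig_src/$new_src/;       # no match: a+ wants "a" then "b"
$quoted   =~ s/\Q$orig_src\E/$new_src/;   # \Q...\E matches the text literally

print "buggy:    @buggy\n";      # buggy:    c.jpg
print "fixed:    @fixed\n";      # fixed:    a.gif b.png c.jpg
print "unquoted: $unquoted\n";   # unquoted: <img src="img/a+b.gif">
print "quoted:   $quoted\n";     # quoted:   <img src="cache/X1.png">
```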

Re: Statistician in my garbage...
by merlyn (Sage) on Mar 16, 2001 at 02:57 UTC
    You're working too hard on that HTML::Parser part:
    use HTML::Parser;
    use HTML::Entities;

    HTML::Parser->new(
        # Everything except start tags is echoed unchanged.
        default_h => [ sub { print shift; }, "text" ],
        start_h   => [ sub {
            my ($text, $tag, $attr) = @_;
            if ($tag eq "img") {
                $attr->{src} = Lurker::pick_random('IMAGES');
                $text = "<$tag";
                $text .= " $_=\"" . encode_entities($attr->{$_}) . "\"" for keys %$attr;
                $text .= ">";
            }
            print $text;
        }, "text,tag,attr" ],
    )->parse_file($doc);
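    Looping over `keys %$attr` rebuilds each tag in Perl's arbitrary hash order, so attributes can come out shuffled. HTML::Parser can also pass `attrseq` (add it to the argspec, e.g. `"text,tag,attr,attrseq"`), an array ref of attribute names in source order. The `rebuild_tag` helper here is a hypothetical sketch of using it; to stay dependency-free it inlines a minimal escape where HTML::Entities' `encode_entities` would normally be used.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild a start tag from the (tag, attr, attrseq)
# values HTML::Parser passes for the argspec "tag,attr,attrseq".
# Walking @$attrseq instead of keys %$attr keeps the source attribute order.
sub rebuild_tag {
    my ( $tag, $attr, $attrseq ) = @_;
    my $text = "<$tag";
    for my $name (@$attrseq) {
        my $value = $attr->{$name};
        # Minimal escaping (& first, so later entities are not re-escaped).
        $value =~ s/&/&amp;/g;
        $value =~ s/"/&quot;/g;
        $value =~ s/</&lt;/g;
        $value =~ s/>/&gt;/g;
        $text .= qq{ $name="$value"};
    }
    return $text . '>';
}

my %attr = ( src => 'new.gif', alt => 'Tom & Jerry', width => '10' );
my @seq  = qw(src alt width);
my $out  = rebuild_tag( 'img', \%attr, \@seq );
print "$out\n";   # <img src="new.gif" alt="Tom &amp; Jerry" width="10">
```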

    -- Randal L. Schwartz, Perl hacker
