PerlMonks  

Statistician in my garbage...

by larsen (Parson)
on Mar 16, 2001 at 02:46 UTC ( [id://64824]=CUFP )

Mainly it's an HTML::Parser exercise, done during a heavy research moment :)

The idea is very simple: by randomly putting together texts and images from your browser's cache, you can get a snapshot of your (or someone else's) behaviour. Like a statistician in your garbage :)

I grabbed the idea from an old DOMUS issue, but I've lost the original URL.

Update: I'm thinking of using it as a permanent installation in an Internet café. Two monitors: one used by the surfer, and one connected to a box that automatically refreshes a page generated by this script.

#!/usr/bin/perl
use strict;

# Digs in your browser's cache
# like a statistician in your trashcan...

package Lurker;
use File::Find;

my $cache = {
    IMAGES => [],
    DOCS   => [],
};

# Walk $dir and remember every image and HTML file found in it.
sub lurk {
    my $dir = shift;

    print STDERR "Reading cache...";
    find(
        sub {
            for ( $File::Find::name ) {
                # The parentheses matter: && binds tighter than ||,
                # so without them only .jpg files would be stored.
                ( /\.gif$/ || /\.png$/ || /\.jpg$/ )
                    && push @{ $cache->{IMAGES} }, $_;
                /\.html$/ && push @{ $cache->{DOCS} }, $_;
            }
        },
        $dir
    );
    print STDERR "OK!\n";
}

# Return a random entry from the IMAGES or DOCS list.
sub pick_random {
    my $what = shift;
    my $n = scalar( @{ $cache->{$what} } );
    return ${ $cache->{$what} }[ rand $n ];
}

package My_HTML_Parser;
use base 'HTML::Parser';

sub start {
    my $self = shift;
    my ($tag, $attr, $attrseq, $origtext) = @_;

    if ($tag eq 'img' && defined $attr->{src} && length $attr->{src}) {
        my $orig_src = $attr->{src};
        my $new_src  = Lurker::pick_random( 'IMAGES' );
        # \Q...\E: match the old URL as literal text, not as a regex
        $origtext =~ s/\Q$orig_src\E/$new_src/;
    }
    print $origtext;
}

sub text {
    my $self = shift;
    my ($text) = @_;
    print $text;
}

sub end {
    my $self = shift;
    my ($tag) = @_;
    print "</$tag>";
}

package main;

my $cache_directory = '/home/stefano/.netscape/cache';
Lurker::lurk( $cache_directory );

my $doc = Lurker::pick_random('DOCS');
print STDERR "Now parsing $doc...\n";

# $parser rather than $a, which is reserved for sort blocks
my $parser = My_HTML_Parser->new;
$parser->parse_file( $doc );
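For the installation idea in the update, the generated page would have to reload on its own. Here is a minimal sketch, assuming a plain meta refresh tag is good enough for the second monitor. The helper name `add_refresh` and the 30-second interval are hypothetical and not part of the script above; the script prints as it parses, so its output would first have to be collected into a string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): add a
# <meta http-equiv="refresh"> tag so the browser reloads the page itself.
sub add_refresh {
    my ($html, $seconds) = @_;
    my $meta = qq{<meta http-equiv="refresh" content="$seconds">};

    # Put the tag right after the opening <head> tag if there is one...
    return $html if $html =~ s/(<head\b[^>]*>)/$1$meta/i;

    # ...otherwise prepend it and rely on the browser being lenient.
    return $meta . $html;
}

# Example input; in practice this would be the collage the script produced.
my $page = add_refresh(
    "<html><head><title>x</title></head><body>hi</body></html>", 30 );
print $page, "\n";
```

Cached pages often have no proper head section, which is why the fallback branch simply prepends the tag instead of giving up.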

Re: Statistician in my garbage...
by merlyn (Sage) on Mar 16, 2001 at 02:57 UTC
    You're working too hard on that HTML::Parser part:
    use HTML::Parser;
    use HTML::Entities;

    HTML::Parser->new(
        # Everything that is not a start tag is printed unchanged.
        default_h => [ sub { print shift; }, "text" ],
        start_h   => [
            sub {
                my ($text, $tag, $attr) = @_;
                if ($tag eq "img") {
                    # Swap in a random cached image and rebuild the tag.
                    $attr->{src} = Lurker::pick_random( 'IMAGES' );
                    $text = "<$tag";
                    $text .= " $_=\"" . encode_entities($attr->{$_}) . "\""
                        for keys %$attr;
                    $text .= ">";
                }
                print $text;
            },
            "text,tag,attr"
        ],
    )->parse_file($doc);

    -- Randal L. Schwartz, Perl hacker
