
Statistician in my garbage...

by larsen (Parson)
on Mar 16, 2001 at 02:46 UTC ( [id://64824]=CUFP: print w/replies, xml ) Need Help??

Mainly it's an HTML::Parser exercise, done during a heavy research moment :)

The idea is very simple: randomly putting together texts and images from your browser's cache, you could get a snapshot of your (or someone else's) behaviours. Like a statistician in your garbage :)

I grabbed the idea from an old DOMUS issue, but I've lost the original URL.

Update: I'm thinking of using it as a permanent installation in an Internet Cafe'. Two monitors: one used by the surfer, and one connected to a box that automatically refreshes a page generated by this script.
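For the auto-refreshing second monitor, one simple approach (a sketch, not part of the script above; `build_kiosk_page` and the 30-second interval are made-up assumptions) is to wrap the generated markup with a meta refresh tag, so the kiosk browser reloads the page on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrap an already-generated body fragment in a
# page that asks the browser to reload itself every $seconds seconds.
sub build_kiosk_page {
    my ($body, $seconds) = @_;
    return "<html><head>"
         . "<meta http-equiv=\"refresh\" content=\"$seconds\">"
         . "</head><body>$body</body></html>";
}

# Example: a 30-second refresh around whatever the script produced.
print build_kiosk_page("<p>random snapshot</p>", 30);
```

The script's output would just need to be captured into the `$body` string (or written to a file that this wrapper regenerates) instead of going straight to STDOUT.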

#!/usr/bin/perl
use strict;

# Digs in your browser's cache
# like a statistician in your trashcan...

package Lurker;
use File::Find;

my $cache = {
    IMAGES => [],
    DOCS   => [],
};

sub lurk {
    my $dir = shift;
    print STDERR "Reading cache...";
    find( sub {
        for ( $File::Find::name ) {
            # Parentheses are needed here: && binds tighter than ||,
            # so without them only .jpg files would ever be pushed
            ( /\.gif$/ || /\.png$/ || /\.jpg$/ )
                && push @{ $cache->{ IMAGES }}, $_;
            /\.html$/ && push @{ $cache->{ DOCS }}, $_;
        }
    }, $dir );
    print STDERR "OK!\n";
}

sub pick_random {
    my $what = shift;
    my $n = scalar( @{ $cache->{ $what }} );
    return ${ $cache->{ $what }}[ rand $n ];
}

package My_HTML_Parser;
use base 'HTML::Parser';

sub start {
    my $self = shift;
    my ($tag, $attr, $attrseq, $origtext) = @_;
    my ($orig_src, $new_src);
    if ($tag eq 'img') {
        $orig_src = $attr->{'src'};
        $new_src  = Lurker::pick_random( 'IMAGES' );
        # \Q...\E quotes regex metacharacters that may occur in a URL
        $origtext =~ s/\Q$orig_src\E/$new_src/;
    }
    print $origtext;
}

sub text {
    my $self = shift;
    my ($text) = @_;
    print $text;
}

sub end {
    my $self = shift;
    my ($tag) = @_;
    print "</$tag>";
}

package main;

my $cache_directory = '/home/stefano/.netscape/cache';
Lurker::lurk( $cache_directory );
my $doc = Lurker::pick_random('DOCS');
print STDERR "Now parsing $doc...\n";
my $parser = My_HTML_Parser->new;
$parser->parse_file( $doc );

Replies are listed 'Best First'.
Re: Statistician in my garbage...
by merlyn (Sage) on Mar 16, 2001 at 02:57 UTC
    You're working too hard on that HTML::Parser part:
    use HTML::Parser;
    use HTML::Entities;

    HTML::Parser->new(
        default_h => [sub { print shift; }, "text"],
        start_h   => [sub {
            my ($text, $tag, $attr) = @_;
            if ($tag eq "img") {
                $attr->{src} = Lurker::pick_random( 'IMAGES' );
                $text = "<$tag";
                $text .= " $_=\"" . encode_entities($attr->{$_}) . "\""
                    for keys %$attr;
                $text .= ">";
            }
            print $text;
        }, "text,tag,attr"],
    )->parse_file($doc);

    -- Randal L. Schwartz, Perl hacker
