Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

cbhistory

by pfaut (Priest)
on Dec 15, 2002 at 18:08 UTC ( [id://220030]=sourcecode: print w/replies, xml ) Need Help??
Category: PerlMonks Related Scripts
Author/Contact Info Tom Pfau pfaut
Description:

Like cblast35, keeps a history of messages that go through the chatterbox. This one uses a database (postgresql) to store the messages instead of a DB file. It keeps the last hour's worth of messages.

Try it at http://nbpfaus.net/~pfau/cbhistory.cgi.

# table definition
CREATE TABLE "chatter" (
    "chatterbox" text NOT NULL,
    "messagetime" timestamp with time zone NOT NULL,
    "userid" integer,
    "username" text,
    "message" text,
    Constraint "chatter_pkey" Primary Key ("chatterbox", "messagetime"
+)
);
#cbhistory.pl
#!/usr/bin/perl -w

=head1 DESCRIPTION - what is this cbhistory thing?

Well it keeps a history of messages uttered in the cb, so if you
walk in on something I<interesting>, you can catch up without
saying: "what are you guys talking about?"

=head1 USAGE

make a crontab entry looking like

 */5 * * * * cd /path/to/script/ && ./cbhistory.pl>/dev/null

You can do it using crontab -e.
You really shouldn't run it more than every 6 minutes, but because
of how cron works and whatnot, 5 will do.
Don't run it more often than 4 minutes, cause the whole point
is to get a recent history of messages so if you walk in something
you know what is up, not use it as a replacement to framechat ;)
This is not a cb client

=head1 SETUP

You need to define your database connection.  See the C<constant>
definitions below.

You also need to setup your database.  Messages are stored in a
table called C<chatter> with the following layout (this for postgresql
+):

    Column        Type        Usage
    Chatterbox        text        Chatterbox name (see %boxes keys)
    MessageTime        timestamp    Time message was sent
    UserId        integer        ID of user that sent message
    UserName        text        Name of user that sent message
    Message        text        The message itself

This table should have a primary key consisting of the Chatterbox
and MessageTime fields.

=head1 HISTORY

this version written by pfaut of perlmonks.

cbhistory is derived from cblast35 which was written by
crazyinsomniac.

=cut

BEGIN # better then getting mail fron cron when the script fails
{     # even if it is a "performance penalty", but really, its not ;-)
    use CGI::Carp qw(carpout);

    open(LOG, ">>cbhistory.err.log") or
    die "can't append to cbhistory.pl.err.log: $!";
    carpout(\*LOG);
}

use strict;                     # Fo' health and pleasure
use XML::Parser;                # Fo' parsering'em XML
use DBI;            # Fo' da db
use LWP::UserAgent;             # Fo' fetching'em  tickers
require HTTP::Request;
require HTTP::Response;

use constant DB_CONNECT => 'dbi:Pg:dbname=chatter';
# this user should have all rights on the chatter table
use constant DB_USER => 'username';
use constant DB_PASS => 'password';

# known chatterboxes
my %boxes = ( PM => 'http://www.perlmonks.org/index.pl',
          JJ => 'http://www.javajunkies.org/index.pl' );

# it begins
my $dbh = DBI->connect(DB_CONNECT, DB_USER, DB_PASS,
               {RaiseError=>0,PrintError=>0})
   or die "Error connecting to database: $DBI::errstr";
my $insert_stmt = $dbh->prepare(<<SQL);
insert into chatter (chatterbox, messagetime, userid, username, messag
+e)
values (?, ?, ?, ?, ?)
SQL
    ;

foreach (sort keys %boxes) {
    &update_history($_,$boxes{$_});
};

$insert_stmt->finish;
$dbh->do(<<SQL);
delete from chatter where messagetime < current_timestamp - '1 hour'::
+interval
SQL
    ;
$dbh->disconnect;
exit;

sub update_history
{
    my ($boxname, $cbtickerurl) = @_;
    $cbtickerurl .= '?node=Chatterbox+xml+ticker';

    my $dangtimeout = 15;
    # apparently, this is not the timeout for the entire session
    # but for each packet ([id://79502])

    my $messages = &fetch_cb_xml($cbtickerurl,$dangtimeout);
    print "$boxname has no messages",return unless defined %{$messages
+};

    &tyebinds($boxname,$messages);
    undef $messages;
}# it ends
######################################################################
+########
###### - \ | / ~ \ | / - \ | / ~ \ | / - \ | / ~ \ | / - \ | / ~ S U B
+ L A N D

=head2 C<&tyebinds($box,$messages);>

Takes a reference to the freshly fetched messages.
asks L<&can_i_write> for I<permission>.
If denied, sleeps five seconds, and asks again.
die's if it doesn't get permission.
If it does get permission (a semaphore file is created),
it updates the DB_File database with the new messages,
and then removes all but the last 35 messages,
untie's the hash, and deletes the semaphore file

=cut

sub tyebinds
{
    my $box = shift;
    my $newmessages = shift;

# update the message hash
    for my $timestamp(sort keys %{$newmessages})
    {
    my ($tym) = $timestamp;
    $tym =~ s/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/$1-$2-$3 $4:$
+5:$6/;
        my $idn = $newmessages->{$timestamp}->{monkid};
        my $nym = $newmessages->{$timestamp}->{monk};
        my $msg = $newmessages->{$timestamp}->{message};
    $msg =~ s/[\n\r]+//g;

    # ignore errors, probably duplicate key because we picked
    # up the message on the last scan
    $insert_stmt->execute($box, $tym, $idn, $nym, $msg);
    }
    undef $newmessages;
}



=head2 C<fetch_cb_xml($cbtickerurl)>

uses LWP::UserAgent to fetch the xml from $cbtickerurl.
Dies if this fails.
If it does not, uses XML::Parser to build a hash of the current messag
+es,
which is never more than 20(IIRC, or the last 8 minutes if things are 
+slow),
and returns a reference to that hash (C<\%messages>).

=cut

sub fetch_cb_xml
{
    my ($cbtickerurl,$dangtimeout) = @_;

    die("&fetch_cb_xml takes two params")unless($cbtickerurl && $dangt
+imeout);
    # why redundancy, dudn't hurt much

    my $raw_xml = &requestitraw($cbtickerurl,$dangtimeout);

    die "LWP::Simple::get ate it on $cbtickerurl ($!)"
    unless(length $raw_xml > 4);
    # self documenting code is goood, but comments can't hurt

    my $messages = {};

    my $xml_parser = new XML::Parser(
                                 Handlers => {
                                              Start   => \&_xml_start,
                                              End     => \&_xml_end,
                                              Char    => \&_xml_char,
                                              Default => \&_xml_def,
                                             }
                                );

    $xml_parser->{crazy_hashref_b392} = $messages;
 # make sure you don't call "crazy_hashref_b392"
 # "Handler" or some other key the module uses ;-)

    $xml_parser->parse($raw_xml); # parse the xml, &fill {crazy_hashre
+f_b392}

    undef($raw_xml);              # kinda redundant, but i like redund
+ancy
    undef($xml_parser);           # paranoia

    return $messages;
}



=head2 C<&requestitraw($cbtickerurl,$dangtimeout);>

Uses HTTP::Request along with LWP::UserAgent to fetch
the latest messages.

=cut

sub requestitraw
{
# LWP simple would've been fine, but hey, I wanted to use UserAgent
# But, Dangit Jim, I wanted a timeout
    my ($toget, $dangtimeout) = @_;
    my $REQUS = new HTTP::Request(GET => $toget);
    my $USERA = new LWP::UserAgent();
    $USERA->agent("cb Last 35 - crazy is good 4.98");

       $USERA->timeout($dangtimeout||30 ); # in case you think you're 
+smart

    my $RESPO = $USERA->simple_request($REQUS);
    die "the $toget request failed" if(!$RESPO->is_success && $RESPO->
+is_error);
    return $RESPO->content;
}

########### YOU CAN'T HAVE ANY PUDDIN', UNTIL YOU EAT YOUR MEAT ######
+##########
######################################################################
+##########
## Thank you id://62782 ####,
                            # The XML::Parser Handlers
sub _xml_start              # beginning tag
{
    my ($expat,             # the object who invoked the sub
         $name,             # what to do
         %attributes) = @_; # wood for the chipper(what the fu'?

    my $msghash = $expat->{crazy_hashref_b392};# don't call it Handler
+s ;-)
    if($name eq 'message')
    {
        my $userid = $attributes{'user_id'};
        my $author = $attributes{'author'};
        my $timest = $attributes{'time'};

        $expat->{mark} =    # the trigger
        $msghash->{$timest} = {monkid => $userid,
                               monk => $author,
                               message => ''};
    }
}

sub _xml_char               # more like text (tag encapsulated stuff)
{
    my ($expat, $not_markup) = @_;

    if(exists $expat->{mark} and defined $expat->{mark})
    {
                            # this be the stuff in between message tag
+s
        $expat->{mark}->{message} .= $not_markup;
                            # i .= append because XML::Parser chuncks
    }
}

sub _xml_def{}
# mostly space, with some tabs and newlines sprinkled about the north 
+west area

sub _xml_end                # it's an *end* (closing) tag
{
    my ($expat, $name) = @_;
    undef($expat->{mark});  # after the tag close, we wait for the nex
+t one
}
#cbhistory.cgi
#!/usr/bin/perl -w

# cbhistory is derived from cblast35 which was written by
# crazyinsomniac.

use strict;

use CGI qw¡:standard url *table *Tr *td¡;

   $CGI::DISABLE_UPLOADS = 1;# Disable uploads
   $CGI::POST_MAX        =-1;# Maximum number of bytes per post

use CGI::Carp q!fatalsToBrowser!;
use DBI;
use HTML::TokeParser;

use constant DB_CONNECT => 'dbi:Pg:dbname=chatter';
# this user needs SELECT rights on the chatter table
use constant DB_USER => 'username';
use constant DB_PASS => 'password';

my %allowed = ( a=>1, b=>1, br=>1, em=>1, i=>1, kbd=>1, s=>1, samp=>1,
        strike=>1, strong=>1, sub=>1, sup=>1, tt=>1, u=>1 );

# known chatterboxes
my %boxes = ( PM => { URL  => 'http://www.perlmonks.org/index.pl',
              NAME => 'Perl Monks' },
          JJ => { URL  => 'http://www.javajunkies.org/index.pl',
              NAME => 'JavaJunkies' } );

my $html;
my $q = new CGI;

# D'loop Mayno
{
    $|=1;
    # ChatterboxXMLTicker
    my $box = param("site");
    my $plain = param("plain");
    my $url = $q->url(-relative=>1);
    if ( !$box || !exists($boxes{$box}) ) {
    $html = &show_available_boxes();
    $html .= &show_links();
    } else {
    my $dbh = DBI->connect(DB_CONNECT, DB_USER, DB_PASS,
                   {RaiseError=>1})
        or die "Error connecting to database: $DBI::errstr";
    my $select_stmt = $dbh->prepare(<<SQL);
select messagetime, userid, username, message
  from chatter where chatterbox=?
 order by messagetime
SQL
    ;
    my $pmurl = $boxes{$box}{URL} . '?';
    $url .= "?site=$box";
    $url .= '&plain=1' unless $plain;

    $html =
        start_html('-title' =>
                   "$boxes{$box}{NAME} Recent Chatterbox Messages",
               '-dtd'   => "-//W3C//DTD HTML 4.0 Transitional//EN")
        . basefont({face  => "Arial",
                size  => "2",
                color => "black"})
        . h3("gmtime is ", &_timestamp)
        . p(a({href=>$url},
              $plain?"Formatted":"Unformatted"))
        . start_table({cellspacing => 2, width => "100%",
                   cellpadding => 2, border => 1});
    $select_stmt->execute($box);
    while (my $msg = $select_stmt->fetchrow_hashref) {
        my $user_ref = a( { href=>$pmurl.'node_id='.$msg->{userid} },
                  $msg->{username});
        my $text = $msg->{message};
        my $output;
        my $me = 0;
        if ($plain) {
        $output .= escapeHTML($text);
        } else {
        if ($text =~ m|^/me\W|) {
            $output = $user_ref;
            $text = substr($text,3);
            $me++;
        }
        $output .= &format_message($text,$pmurl);
        }
        $output = i($output) if $me;
        $html .= start_Tr . start_td;
        $html .= font({'-size' => '2'},
              $user_ref, br, $msg->{messagetime}
              );
        $html .= end_td . start_td;
#        $html .= textarea(-default=>$msg->{message}, -rows=>3, -colum
+ns=>80);
        $html .= span($output);
        $html .= end_td . end_Tr;
    }
    $select_stmt->finish;
    $dbh->disconnect;

    $html .= end_table;

    $html .= p("No recent messages.") if $select_stmt->rows == 0;
    
    $html .= hr
        . a( {'href' => "http://validator.w3.org/check/referer"},
         img( {'src'=>"valid-html40.png",
               'alt'=>"Valid HTML 4.0!",
               'border'=>"0",
               'height'=>"31",
               'width'=>"88"}
              )
         )
        . end_html;
    }
}

    print  header(-type => 'text/html',
          -content_length => length($html),
                  -expires => '+5m'  ), # cache only for five minutes
           $html;

exit;
################################## SUBLAND ######<<<<<<<<<<<<<<<<<<<<<
+<<<|~
################################## SUBLAND       >>>>>>>>>>>>>>>>>>>>>
+>>>|~


=head2 C<&_timestamp>

returns current perlmonks compatible gmtime

=cut

sub _timestamp      # current gmtime
{
    @_ = (gmtime(time))[5,4,3,2,1,0];
                    # gimme a slice of that list
    $_[0]+=1900;    # hey hey, y 2 k
    $_[1]+=1;       # 0..11 ne 'true month'
    return sprintf("%04u-%02u-%02u %02u:%02u:%02u", @_);
}

=head2 C<&show_available_boxes>

Displays a list of chatterboxes for which this system keeps histories.

=cut

sub show_available_boxes {
    my $html =
    start_html('-title' => "Recent Chatterbox Messages",
           '-dtd'   => "-//W3C//DTD HTML 4.0 Transitional//EN")
    . basefont({face  => "Arial",
            size  => "2",
            color => "black"})
    . h1('Available Chatterbox Histories')
    . p(q(Chatterbox messages from within the past hour are available
          for the following sites))
    . ul( li( [ map { a({href=>url.'?site='.$_},
                $boxes{$_}{NAME}) } sort keys %boxes ] ) )
    . end_html;
    return $html;
}

=head2 C<&addurls($siteurl,$p1,$p2>

Called from within a regex to convert bracketed text to urls.

C<$siteurl> should be the base URL for the site with a C<?> appended.

C<$p1> is the first part of the bracketed text (before the pipe).

C<$p2> is the second part of the bracketed text (after the pipe) or
undefined.

=cut

sub addurls {
    my ($siteurl,$p1,$p2) = @_;
    my $url;
    my $text;
# is a protocol specified?
    if ($p1 =~ m|^(\w+)://(.*)|) {
# match known 'protocols'
    if ($1 eq 'id') {
        $url = $siteurl."node_id=$2";
        $p2 = "node $2" unless $p2;
    } elsif ($1 eq 'pad') {
        $url = $siteurl."node=Scratch Pad Viewer&user=$2";
        $p2 = "$2's scratchpad" unless $p2;
    } elsif ($1 eq 'google') {
        $url = "http://www.google.com/search?text=$2";
        $p2 = $2 unless $p2;
    } elsif ($1 eq 'http') {
        $url = $p1;
    } elsif ($1 eq 'cpan') {
        $url = "http://search.cpan.org/search?mode=module&query=$2";
        $p2 = $2 unless $p2;
    }
    } else {
# just text, it's a page on the site
    $url = $siteurl."node=$p1";
    $p2 = $p1 unless $p2;
    }
    if ($url) {
# generate the address tag
    $text = $p2 ? $p2 : $url;
    $text = a({href=>$url},$text);
    } else {
# can't figure out what they're trying to link to
    $text = "[$p1";
    $text .= "|$p2" if $p2;
    $text .= ']';
    }
    $text;
}

=head2 C<&format_message>

Scans the message text for HTML tokens and bracketed text and
generates new HTML.

=cut

sub format_message {
    my ($text,$siteurl) = @_;
    my $output = "";
    my $tp = HTML::TokeParser->new(\$text);
    my $code = 0;
  TOKEN:
    while (my $tok = $tp->get_token) {
    if ($tok->[0] eq 'T') {
        if ($code) {
        $output .= escapeHTML($tok->[1]);
        } else {
        my $t = $tok->[1];
        $t =~ s{\[([^|\]]+)(?:\|([^\]]+))?\]}
        {&addurls($siteurl,$1,$2)}ge;
        $output .= $t;
        }
        next TOKEN;
    }
    if ($tok->[0] eq 'S') {
        if ($code) {
        $output .= escapeHTML($tok->[4]);
        next TOKEN;
        }
        if ($tok->[1] eq 'code') {
        $output .= $tok->[4];
        $code++;
        next TOKEN;
        }
        if ($allowed{$tok->[1]}) {
        $output .= $tok->[4];
        } else {
        $output .= escapeHTML($tok->[4]);
        }
        next TOKEN;
    }
    if ($tok->[0] eq 'C') {
        $output .= escapeHTML($tok->[1]);
        next TOKEN;
    }
    if ($tok->[0] eq 'E') {
        if ($code) {
        if ($tok->[1] eq 'code') {
            $output .= $tok->[2];
            $code = 0;
        } else {
            $output .= escapeHTML($tok->[2]);
        }
        next TOKEN;
        }
        if ($allowed{$tok->[1]}) {
        $output .= $tok->[2];
        } else {
        $output .= escapeHTML($tok->[2]);
        }
        next TOKEN;
    }
    $output .= escapeHTML($tok->[-1]);
    }
    $output;
}

=head2 C<&show_links>

Display links to related sites on the index page.

=cut

sub show_links {
    dl(map { ($a,$b) = split /\t/; dt(a({href=>$a},$a)).dd($b) } <DATA
+> );
}

# related site list
# site\tdescription
__DATA__
http://sourceforge.net/projects/jchatter/    Java Chatterbox project p
+age on SourceForge
jchatter.diff    Patch to Java Chatterbox to allow selection of a chat
+terbox on the command line
Replies are listed 'Best First'.
Re: cbhistory
by pfaut (Priest) on Aug 14, 2003 at 23:31 UTC

    The server that cbhistory runs on went down this afternoon due to a power failure. It had been up for over 205 days.

    Power was restored within a minute and the machine rebooted without incident - except that it's NTP server was down due to the massive power failure now affecting NYC and much of the northeast US. I guess the hardware clock drifted quite a bit over the past 7 months. With no NTP server to reset the system time, it came up with the time set about 15 hours off. As a result, as soon as cbhistory would write new records to the database, it purged them back out again thinking they were over an hour old.

    I just pointed xntpd at a server in Pennsylvania and the time is now correctly set. Cbhistory should be running properly again. ++bart for pointing out that it was down.

    90% of every Perl application is already written.
    dragonchild
Re: cbhistory
by sintadil (Pilgrim) on Sep 05, 2004 at 15:19 UTC

    I'm curious about what sort of license you've released this under, as I'm considering making some changes and sending them back here and I'd rather not just make an assumption that could cause much upsetment.

    Upon being informed of a certain node, I've learned that what I wanted to do with cbhistory is considered unethical. I hereby retract my request.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: sourcecode [id://220030]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (5)
As of 2024-03-19 06:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found