Perl, SQLite3, and Parsing the Chatterbox Feed.

DigitalKitty has asked for the wisdom of the Perl Monks concerning the following question:

Hi all.

With help from: parv, dhoss, Fairy_Nuff, and planetscape, I started writing a chatterbox history tool for educational reasons.

use warnings;
use strict;
use LWP::Simple;
use DBI;

my $data    = '';
my $dbh     = '';
my $url     = 'http://www.perlmonks.org/?node_id=207304';
my $pat     = qr{ .*<author>(.*)<\/author>.*<text>(.*)<\/text }xs;

$data       = get( $url );
$dbh        = DBI->connect( "dbi:SQLite:dbname=C:\\testdb", "", "" );

 while ( ( my($auth, $text) = ( $data =~ m/$pat/gc ) ) ) {
   for( $text ) {
       s/[ ]+/ /g;
       s/^\s+//;
       s/\s+$//;
   }
  printf "%s: %s\n\n" , $auth , $text;

  $dbh->do('insert into monks values(?,?)', undef, $auth, $text );

}

I was hoping some of you could offer suggestions regarding how I might improve the design/functionality of the (currently beta quality) program. At the present time, it only displays the most recent author/comment as opposed to several speakers and their respective comments.

I took the liberty of including my (simple) table design as well:
SQLite 3.5.6

CREATE TABLE monks( 
monk varchar(25),
comment varchar(255)
);

Thanks,
~Katie


Replies are listed 'Best First'.
Re: Perl, SQLite3, and Parsing the Chatterbox Feed. by McDarren (Abbot) on Feb 14, 2008 at 06:27 UTC
Um, two comments:

You're parsing XML with a regex. Tsk! Tsk! You should know better than that :p Use a proper XML parser such as XML::Twig or XML::Simple.

Given that you're creating a CB history, wouldn't you think it a good idea to include a date/time field in your database? ;)

Cheers,
Darren :)
Re^2: Perl, SQLite3, and Parsing the Chatterbox Feed. by jrsimmon (Hermit) on Feb 14, 2008 at 15:44 UTC
I'd like to second both of these suggestions, as well as add a couple of my own... Have you considered parsing the posts for links in the CB? I imagine it would lend itself to some very interesting correlations down the road: "What percentage of posts link to CPAN? Which Monk links to his/her scratchpad most often?" etc. To make this really work well, you would definitely need at least a time value as suggested already (and if you plan to keep more than 24 hours' worth of data, a date value will be necessary as well).
Re: Perl, SQLite3, and Parsing the Chatterbox Feed. by holli (Abbot) on Feb 14, 2008 at 10:27 UTC
Working in "production":

#!/usr/bin/perl
use lib qw( /mnt/web4/10/47/51683347/htdocs/lib/site_perl/5.8.5 );

use warnings;
use strict;
use DBI;
use WWW::Mechanize;
use XML::Simple;

my ($sth, $dbh, $xml);
my $messages = [];
my $mech = WWW::Mechanize->new();

while (1) {
    my $resp = $mech->get( 'http://www.perlmonks.org/index.pl?node_id=207304' );
    if ( $resp->is_success ) {
        my $xml = $resp->content;
        my $jatter = XMLin( $xml, ForceArray => ['message'] );
        if ( $jatter->{info}->{count} > 0 ) {
            print STDERR "adding ", scalar @{$jatter->{message}}, "\n";
            unless ( $dbh ) {
                $dbh = DBI->connect("DBI:mysql:database=DB354211;host=rdbms.strato.de", 'U354211', 'pw354211');
                $sth = $dbh->prepare('INSERT INTO pmf_jatterboxx (user_id, author, epoch, message_id, message) VALUES (?, ?, ?, ?, ?)');
            }
            for ( @{$jatter->{message}} ) {
                $sth->execute( $_->{user_id}, $_->{author}, $_->{epoch}, $_->{message_id}, $_->{text} );
            }
        }
        else {
            print STDERR "snooze\n";
        }
    }
    sleep(5);
}

Note: It is not obvious, but the chatterbox feed somehow notices the caller and returns only the chat lines that are new, even without passing a date flag or anything similar. I am curious how that works.

holli, /regexed monk/
Re: Perl, SQLite3, and Parsing the Chatterbox Feed. by hipowls (Curate) on Feb 14, 2008 at 06:36 UTC
To get the data into a usable form this works. It needs error checking but the idea is sound.

use XML::Simple;
use LWP::Simple;
use Data::Dumper;

my $url = 'http://www.perlmonks.org/?node_id=207304';

my $text = get($url);
my $ref  = XMLin(
    $text,
    ForceArray => ['message'],
);
print Dumper $ref;

__END__
$VAR1 = {
    'info' => {
        'sitename'   => 'PerlMonks',
        'count'      => '2',
        'gentimeGMT' => '2008-02-14 06:32:17',
        'lastid'     => '703987',
        'content'    => 'Rendered by the New Chatterbox XML Ticker',
        'xmlmaker'   => 'XML::Fling 1.001',
        'site'       => 'http://perlmonks.org/',
        'xmlstyle'   => 'clean,new',
        'fromid'     => '00703985',
        'ticker_id'  => '207304'
    },
    'message' => [
        {
            'message_id' => '703986',
            'epoch'      => '1202970679',
            'text'       => 'testing',
            'time'       => '01:31:19',
            'date'       => '2008-02-14',
            'user_id'    => '660179',
            'author'     => 'hipowls'
        },
        {
            'message_id' => '703987',
            'epoch'      => '1202970708',
            'text'       => 'just ignore it',
            'time'       => '01:31:48',
            'date'       => '2008-02-14',
            'user_id'    => '660179',
            'author'     => 'hipowls'
        }
    ]
};

Update: Added `ForceArray => ['message']` so that messages are always in a list even when there is only one.
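A minimal sketch of how a structure shaped like the dump above could be walked. It assumes the same keys as that dump; the hash is hard-coded (copied from the output, trimmed to the fields used) so it runs without network access or XML::Simple. `rows_from_feed` is a hypothetical helper name for illustration, not part of any module.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Sample data copied from the Data::Dumper output above (only the fields used here).
my $ref = {
    info    => { count => '2' },
    message => [
        { message_id => '703986', epoch => '1202970679', text => 'testing',        author => 'hipowls' },
        { message_id => '703987', epoch => '1202970708', text => 'just ignore it', author => 'hipowls' },
    ],
};

# Hypothetical helper: turn the parsed feed into one array ref per message,
# in the column order (message_id, epoch, author, text).
sub rows_from_feed {
    my ($feed) = @_;
    return map { [ @{$_}{qw(message_id epoch author text)} ] }
           @{ $feed->{message} || [] };
}

for my $row ( rows_from_feed($ref) ) {
    my ( $id, $epoch, $author, $text ) = @$row;
    # epoch is seconds since 1970-01-01 UTC, so gmtime yields a UTC timestamp.
    my $stamp = strftime( '%Y-%m-%d %H:%M:%S', gmtime $epoch );
    printf "%s [%s] %s: %s\n", $id, $stamp, $author, $text;
}
```

Each row could then be handed to a prepared DBI statement with matching placeholders, which would also give the history the date/time column suggested elsewhere in this thread.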
Re: Perl, SQLite3, and Parsing the Chatterbox Feed. by pc88mxer (Vicar) on Feb 14, 2008 at 06:15 UTC
You definitely need to use less greedy regexes. Instead of:

my $pat = qr{ .*<author>(.*)<\/author>.*<text>(.*)<\/text }xs;

use:

my $pat = qr{ .*?<author>(.*?)<\/author>.*?<text>(.*?)<\/text }xs;

Also, I'm not sure you are using the `/g` option correctly. I've had better luck with:

while ($data =~ m/$pat/gc) {
    my ($auth, $text) = ($1, $2);
    for( $text ) {
        s/[ ]+/ /g;
        s/^\s+//;
        s/\s+$//;
    }
    printf "%s: %s\n\n" , $auth , $text;
}
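To see the difference between the greedy and non-greedy patterns, here is a small sketch that runs both against a made-up two-message sample. The sample XML and the helper name `extract_pairs` are assumptions for illustration; the real feed carries more fields per message.

```perl
use strict;
use warnings;

# Made-up sample input with two messages.
my $data = <<'XML';
<message><author>alice</author><text> first   line </text></message>
<message><author>bob</author><text>second line</text></message>
XML

# Hypothetical helper: collect [author, text] pairs matched by $pat.
sub extract_pairs {
    my ( $input, $pat ) = @_;
    my @pairs;
    # Scalar-context /g: each iteration resumes where the previous match ended.
    while ( $input =~ m/$pat/g ) {
        my ( $auth, $text ) = ( $1, $2 );
        for ($text) {
            s/[ ]+/ /g;    # collapse runs of spaces
            s/^\s+//;      # trim leading whitespace
            s/\s+$//;      # trim trailing whitespace
        }
        push @pairs, [ $auth, $text ];
    }
    return @pairs;
}

my $greedy = qr{ .*<author>(.*)<\/author>.*<text>(.*)<\/text }xs;
my $lazy   = qr{ .*?<author>(.*?)<\/author>.*?<text>(.*?)<\/text }xs;

my @greedy_pairs = extract_pairs( $data, $greedy );
my @lazy_pairs   = extract_pairs( $data, $lazy );

printf "greedy: %d match(es)\n", scalar @greedy_pairs;
printf "lazy:   %d match(es)\n", scalar @lazy_pairs;
print "$_->[0]: $_->[1]\n" for @lazy_pairs;
```

With the greedy pattern the leading `.*` swallows everything up to the last `<author>`, so only the final message is found, which matches the symptom described in the original question.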
Re^2: Perl, SQLite3, and Parsing the Chatterbox Feed. by ikegami (Patriarch) on Feb 14, 2008 at 18:35 UTC
I don't see why either of you are using `/c`. It's definitely not useful, and I suspect it's harmful.
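For reference, a small sketch of what `/c` actually changes: after a failed `/g` match, Perl normally resets `pos()` to undef, while `/c` leaves it where the last successful match ended. The string and the helper name `pos_after_loop` are made up for illustration, and the sketch only shows the mechanism; it does not by itself establish whether `/c` is harmful in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: run a /g loop until the match fails, with or without /c,
# and report where pos() is left afterwards.
sub pos_after_loop {
    my ($keep_pos) = @_;
    my $str = 'aXbXc';
    if ($keep_pos) { 1 while $str =~ /X/gc }
    else           { 1 while $str =~ /X/g }
    return pos($str);
}

my $without_c = pos_after_loop(0);
my $with_c    = pos_after_loop(1);

printf "without /c: pos is %s\n", defined $without_c ? $without_c : 'undef';
printf "with /c:    pos is %s\n", defined $with_c    ? $with_c    : 'undef';
```

A later `/g` match on the same string therefore starts from the beginning in the first case and from the old position in the second.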
Re^3: Perl, SQLite3, and Parsing the Chatterbox Feed. by parv (Parson) on Feb 15, 2008 at 03:00 UTC
I am responsible for the `/c` in the match condition and the simultaneous assignment in the while loop (I replied in a hurry and misread the /c description). Here is what works ...

# Without /g, it would be an endless loop, for the match will
# always start at the start of $data.
while ( $data =~ m/$parse/g ) {
    my ( $auth , $text ) = ( "$1" , "$2" );
    ...
}

(Circa 2001-2005, there are some examples of XML::(Twig|Simple) use to parse the chatterbox XML around here somewhere.)
