PerlMonks  

RE: Get chatbox lines

by swiftone (Curate)
on May 26, 2000 at 20:35 UTC


in reply to Get chatbox lines

You shame me. I'll have to pore over it before I make a more detailed comment, but here's my version of a similar thing. I didn't make it into a package, and it still has a few problems.

I just have it run in a window in the background so that I can look up anything I miss when I don't reload fast enough. Comments appreciated!

#!/usr/bin/perl -w
use strict;
use LWP::Simple;

my ($newmessages, $oldmessages);
while (1) {
    # Keep everything after the "nodelets start here" marker.
    get("http://www.perlmonks.org/") =~ /<!-- nodelets start here -->(.*)/s;
    $_ = $1;
    my (@nodelet) = split(/<!--Nodelet Break -->/);
    foreach (@nodelet) {
        if (/Chatterbox/) {
            s/\n//g;
            s/\r//g;
            # $2 is the poster, $4 is the message text.
            while (m%(<b>&lt;</b>|<i>)<a href=[^>]*>([^<]*)</a>(<b>&gt;</b>)?(.*?)(</i>)?<br>%ig) {
                if (!defined($oldmessages->{"$2:$4"})) {
                    $newmessages->{"$2:$4"} = 1;
                    print "$2: $4\n";
                }
            }
            last;
        }
    }
    undef $oldmessages;
    $oldmessages = $newmessages;
    undef $newmessages;
    sleep(15);
}

Replies are listed 'Best First'.
RE: RE: Get chatbox lines
by swiftone (Curate) on May 26, 2000 at 20:46 UTC
    I've noticed it gets funkier and funkier as it runs. I suspect there is something wrong with the section where I check to make sure it isn't a repeat message.

    Concept: I snag the poster and comment from the line, then check to see if that key exists in the hash referenced by $oldmessages (basically using the hash as an easy lookup). If it isn't there, I drop it into $newmessages and print it out. Once I've gone through the page, I undef $oldmessages, and recreate it to point at $newmessages. Are there problems with this?

    And yes, I realize that any nodelet that mentions "Chatterbox" will screw this program. Not sure how to fix that without breaking theme independence.

RE: RE: Get chatbox lines
by ZZamboni (Curate) on May 26, 2000 at 21:30 UTC
    I like the way you get the chatterbox, by breaking into the nodelet units. I may do that instead of the match on the whole page I do right now.

    I think the problem with your cache is that you are resetting old messages every time. Here's what happens. Assume the first time through the chatbox has the following lines:

    user1: blah
    user2: bleh
    user3: blih

    So you add those to $newmessages, which then becomes $oldmessages. The next time, the box contains:

    user1: blah
    user2: bleh
    user3: blih
    user1: howdy

    As you go through the lines, the only one that gets added to $newmessages is the last one, because the others are already in $oldmessages.

    Then, and here is the problem, you remove $oldmessages! So the only message you have a record of is the last one. So the next time through, the first three messages are printed again, because they are no longer in the cache.
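
The walk-through above can be replayed offline. What follows is a minimal sketch, assuming the page is fetched three times and the last two fetches are identical; the @fetches data and the @printed bookkeeping are made up for illustration and stand in for the real page parsing.

```perl
use strict;
use warnings;

# Three simulated fetches of the chatterbox (hypothetical data).
my @fetches = (
    [ 'user1:blah', 'user2:bleh', 'user3:blih' ],
    [ 'user1:blah', 'user2:bleh', 'user3:blih', 'user1:howdy' ],
    [ 'user1:blah', 'user2:bleh', 'user3:blih', 'user1:howdy' ],
);

my $oldmessages = {};
my $newmessages = {};
my @printed;    # one array ref of "printed" lines per pass

for my $lines (@fetches) {
    my @out;
    for my $key (@$lines) {
        if (!defined $oldmessages->{$key}) {
            $newmessages->{$key} = 1;   # only unseen lines get recorded
            push @out, $key;
        }
    }
    push @printed, \@out;
    $oldmessages = $newmessages;        # cache now holds only this pass's new lines
    $newmessages = {};
}

print "pass $_: @{ $printed[$_ - 1] }\n" for 1 .. @printed;
```

On the third pass the first three lines come back, because the cache left over from the second pass holds only user1:howdy.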

    I don't think you need the juggling of $old and $newmessages. You can just keep one hash where you cache all the messages. The problem with this (and the reason why I didn't do it that way in my code) is that you have no way of knowing which messages are older or newer, so unless you attach a timestamp to each entry, your cache will grow indefinitely. Furthermore, if the same user says the same thing on two different occasions, the second time through it will not be printed because your program will think it's a repeated message.
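
One way the single-hash idea with timestamps might look is sketched below. This is only an illustration, not code from either script: the names %seen, $max_age and new_lines are hypothetical, and the 300-second window is an arbitrary assumption. The caller passes in the current time so the behaviour can be checked without waiting.

```perl
use strict;
use warnings;

my %seen;             # "user:text" => time the line was last seen on the page
my $max_age = 300;    # assumed retention window, in seconds

# Returns the lines that are not in the cache, then refreshes the cache.
sub new_lines {
    my ($now, @lines) = @_;

    # Expire entries that have not appeared for longer than $max_age.
    # keys() builds its list up front, so deleting here is safe.
    for my $key (keys %seen) {
        delete $seen{$key} if $now - $seen{$key} > $max_age;
    }

    my @fresh;
    for my $line (@lines) {
        push @fresh, $line unless exists $seen{$line};
        $seen{$line} = $now;    # refresh the timestamp on every sighting
    }
    return @fresh;
}

# Usage: in a polling loop this would be new_lines(time, @parsed_lines).
print "$_\n" for new_lines(0,  'user1:blah', 'user2:bleh');
print "$_\n" for new_lines(15, 'user1:blah', 'user2:bleh', 'user1:howdy');
```

Because entries expire, the cache stays bounded, and a line repeated long after it scrolled off the page is treated as new again. A line repeated within the window is still suppressed.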

    Hope this helps,

    --ZZamboni

      As you go through the lines, the only one that gets added to $newmessages is the last one, because the others are already in $oldmessages.

      Then, and here is the problem, you remove $oldmessages! So the only message you have a record of is the last one. So the next time through, the first three messages are printed again, because they are no longer in the cache.

      Good catch on the lost messages. Rather than your suggestion, however, I simply moved the line that places the message into the hash outside the if(!defined()) block.

      This means that only the messages that showed up on the latest fetch are cached. The page should never return old messages, so my cache remains a usable size (actually, much smaller than yours! :) )
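
A rough sketch of that one-line change, reduced to the cache logic, might read as follows. The helper name poll_once and the sample lines are hypothetical, and the real script would feed it the poster and message pairs parsed from the page.

```perl
use strict;
use warnings;

my $oldmessages = {};

# Returns the lines that were not on the previous fetch.
sub poll_once {
    my @lines = @_;
    my (%newmessages, @out);
    for my $key (@lines) {
        push @out, $key unless defined $oldmessages->{$key};
        $newmessages{$key} = 1;    # record every line on the page, new or not
    }
    $oldmessages = \%newmessages;  # cache is exactly what the page showed
    return @out;
}

my @pass1 = poll_once('user1:blah', 'user2:bleh', 'user3:blih');
my @pass2 = poll_once('user1:blah', 'user2:bleh', 'user3:blih', 'user1:howdy');
my @pass3 = poll_once('user1:blah', 'user2:bleh', 'user3:blih', 'user1:howdy');

print "pass 1: @pass1\n";
print "pass 2: @pass2\n";
print "pass 3: @pass3\n";
```

With the assignment outside the test, the third pass prints nothing, and the cache never holds more than one page's worth of lines.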

      As for the repeat messages thing, that's a feature. Quite often repeated messages ARE repeats. With a limited cache, non-accidental repeat collisions should be rare.

      Thanks for the help, it's much cleaner now with that one little change. Still not nicely packaged for an interface layer like yours is, but I admit to a small attachment to code I have written.
