News alerts using the BBC's news ticker data file

by SuperCruncher (Pilgrim)
on Mar 22, 2003 at 22:58 UTC ( #245224=CUFP )

In light of current world events, I think it's particularly important to keep up to date with the latest news. One of my favourite sources of news is the excellent BBC News web site. A few months ago, I noticed that they offered a desktop news ticker application. I didn't particularly like the program, but I thought it would be interesting to access its data stream, as the headlines were short and concise, and a wide range of information was available (world news, UK news, sports, weather, business, sci-tech and so on). My initial strategy was to use strings on the ticker executable, but unfortunately that didn't get me any good results. But then I had a flash of inspiration: enter HTTP::Proxy by PerlMonks' own BooK. With just one line of code I was able to set up an HTTP proxy server on my system. I set my proxy in the ticker application to localhost, and bingo, the URL of the ticker data was revealed: http://tickers.bbc.co.uk/tickerdata/story2.dat.
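For anyone who wants to repeat the discovery step, a logging proxy can be sketched roughly as follows. This uses the filter interface of current HTTP::Proxy releases, which may differ from the one-liner used at the time of the post; the port number is an arbitrary assumption, and the program whose traffic you want to inspect must be pointed at localhost on that port.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;

# Port 3128 is an arbitrary choice for this sketch.
my $proxy = HTTP::Proxy->new( port => 3128 );

# Print the URL of every request that passes through the proxy.
$proxy->push_filter(
    request => HTTP::Proxy::HeaderFilter::simple->new(
        sub {
            my ( $self, $headers, $message ) = @_;
            print STDERR $message->uri, "\n";
        }
    ),
);

$proxy->start;    # blocks and serves requests until interrupted
```

Each requested URL is written to STDERR, which is enough to spot a data file such as the ticker feed.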

I downloaded this file and was pleasantly surprised by its simple, text-based format. No XML, nothing like that. Check out the file yourself to see its simple format. So, I set about creating a Perl script to parse it. It was a bit more difficult than I had expected, but I got it done.

After getting the parsing working, my main aim was to create a program that could download the file, parse it, extract the main world news headlines and send them to my mobile phone as a text message using SMS. This program would then be run via cron. It worked great - and now I can get the latest news when out and about, even when I'm in the cinema. Some might ask why I didn't use WAP, but well, my phone (Nokia 3210) doesn't support it, and I'm not ready to upgrade. Part of the reason for posting this CUFP is that you get the code and can use it for whatever you want (SMS alerts, WAP, Palm, whatever).
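The post does not show how the headlines were packed into a text message. One plausible step, sketched here, is to join headlines until the 160-character limit of a single SMS would be exceeded. The function name sms_text and the sample headlines are hypothetical, and the actual sending (SMS gateway, cron schedule) is left out because the post gives no details.

```perl
use strict;
use warnings;

# Join headlines with ' - ' and stop before the text would exceed $limit
# characters. Returns an empty string if even the first headline is too long.
sub sms_text {
    my ( $limit, @headlines ) = @_;
    my $text = '';
    for my $h (@headlines) {
        my $candidate = length($text) ? "$text - $h" : $h;
        last if length($candidate) > $limit;
        $text = $candidate;
    }
    return $text;
}

# Hypothetical headlines; in the real script these would come from the parser.
my @headlines = ( 'Example headline one', 'Example headline two' );
print sms_text( 160, @headlines ), "\n";
```

Stopping at a headline boundary means a message never ends in the middle of a headline, at the cost of sometimes sending fewer headlines than would strictly fit.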

So, if this interests you, feel free to check out the parsing code (along with a simple example).

Another reason why I'm posting this "cool use for Perl" is that I want to get some feedback on whether or not it's worth putting this on CPAN as a module. Any comments would be appreciated.

Re: News alerts using the BBC's news ticker data file
by zentara (Archbishop) on Mar 23, 2003 at 17:02 UTC
    Well here is a simple way to put it into a Tk marquee.

    UPDATE: Set timer to update every 5 minutes.

    #!/usr/bin/perl
    # click on marquee to remove
    use strict;
    use LWP::Simple;
    use Tk;

    my $text;
    my $mw = tkinit;
    $mw->geometry('+20+20');
    $mw->overrideredirect(1);
    my $label = $mw->Label(
        -textvariable => \$text,
        -font         => 'courier',
        -bg           => 'green',
        -bd           => 4,
        -relief       => 'ridge',
    )->pack(-fill => 'both');

    getnews();
    $mw->repeat(300000, \&getnews);    # refresh the headlines every 5 minutes
    $label->bind('<ButtonRelease>', sub { $mw->destroy });
    # scroll: move the first character to the end every 160 ms
    $mw->repeat(160, [sub { $text =~ s/(.)(.*)/$2$1/; }]);
    MainLoop;

    #####################################################
    sub getnews {
        my %ticker = parse_ticker_data(
            'http://tickers.bbc.co.uk/tickerdata/story2.dat'
        );
        my @stories;
        foreach my $story (@{ $ticker{'WORLD'} }) {
            push @stories, $story->{headline}
        }
        $text = join ' - ', @stories;
        $label->update;
        return;
    }

    #########################################################################
    sub parse_ticker_data {
        my $ticker_data_url = $_[0];
        my (%ticker, $current_category, @stories_in_this_cat, $last_category,
            $last_token, $headline, $url);

        die "No ticker URL supplied to parse_ticker_data()"
            if (!$ticker_data_url);

        # Download the ticker data file from the BBC
        my $ticker_data = get $ticker_data_url;
        die "Couldn't retrieve ticker data" if (!$ticker_data);

        #for offline testing
        #open (FILE,"<story2.dat") or die "Couldn't open: $!";
        #my $ticker_data = do {local $/; <FILE>};
        #close FILE;

        # Examine each line in the ticker data file
        foreach (split /\n/, $ticker_data) {
            if (/Last update at (\d\d:\d\d)/) {
                # Extract last updated time
                $ticker{update_time} = $1 . ' GMT';
                $headline = undef;
                next;
            }

            # Set the current category
            if (/HEADLINE\s
                 (SPORTS|BUSINESS|WORLD|UK|SCI-TECH|TRAVEL|WEATHER|FINANCE)/x) {
                $current_category = $1;
                $last_token = 'new';
                $headline = undef;
                next;
            }

            my ($token, @data) = split /\s/, $_;
            my $data = join ' ', @data;

            # Have we changed categories? If so, then we need to store all the
            # stories in the old category in the data structure
            if ($current_category ne $last_category) {
                if (@stories_in_this_cat) {
                    my @narrow_scoped = @stories_in_this_cat;
                    $ticker{$last_category} = \@narrow_scoped;
                    @stories_in_this_cat = ();
                }
            }

            if ( ($token eq 'HEADLINE') and ($last_token eq 'STORY') ) {
                # Starting to parse a new headline. Need to put the old one onto
                # the array though.
                if ($headline) { # don't want headlines like "UK News", see above
                    push @stories_in_this_cat, { headline => $headline,
                                                 url      => $url };
                }
                $headline = $data;
            }

            # Set the URL
            if ($token eq 'URL') {
                $url = $data;
            }

            # Update for next iteration
            $last_token = $token;
            $last_category = $current_category;
        }

        # The last category won't be added in the loop, so need to do it
        # manually. This is nasty, so any improvements would be welcome :-)
        $ticker{$current_category} = \@stories_in_this_cat;

        return %ticker;
    }
    ##################################################################
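The closing comment in parse_ticker_data asks for a tidier way to handle the last category. One option, sketched below, is to push each story into its category as soon as its URL line is seen, so nothing needs flushing after the loop. This is not the original author's code: the function name parse_ticker_text is hypothetical, it parses a string rather than downloading, and the sample data and its record layout (HEADLINE, then URL, then STORY) are only inferred from the tokens the parser above checks for, so the real story2.dat may differ.

```perl
use strict;
use warnings;

# Hypothetical sample: layout inferred from the parser above,
# not copied from the real story2.dat.
my $sample = <<'END';
Last update at 22:40
HEADLINE WORLD NEWS
URL http://example.org/world
STORY Category banner
HEADLINE First example headline
URL http://example.org/1
STORY First example story
HEADLINE Second example headline
URL http://example.org/2
STORY Second example story
HEADLINE UK NEWS
URL http://example.org/uk
STORY Category banner
HEADLINE Third example headline
URL http://example.org/3
STORY Third example story
END

# Returns ( update_time => '...', CATEGORY => [ { headline, url }, ... ], ... )
sub parse_ticker_text {
    my ($text) = @_;
    my ( %ticker, $category, $headline );
    my $cat_re =
      qr/^HEADLINE\s+(SPORTS|BUSINESS|WORLD|UK|SCI-TECH|TRAVEL|WEATHER|FINANCE)\b/;

    for my $line ( split /\n/, $text ) {
        if ( $line =~ /Last update at (\d\d:\d\d)/ ) {
            $ticker{update_time} = "$1 GMT";
        }
        elsif ( $line =~ $cat_re ) {
            $category = $1;       # a category banner, not a real headline
            $headline = undef;
        }
        elsif ( $line =~ /^HEADLINE\s+(.+)/ ) {
            $headline = $1;
        }
        elsif ( $line =~ /^URL\s+(\S+)/
            && defined $category
            && defined $headline )
        {
            # Store the story immediately; no end-of-loop flush is needed.
            push @{ $ticker{$category} }, { headline => $headline, url => $1 };
            $headline = undef;
        }
    }
    return %ticker;
}

my %ticker = parse_ticker_text($sample);
for my $cat ( grep { $_ ne 'update_time' } sort keys %ticker ) {
    print "$cat: $_->{headline} <$_->{url}>\n" for @{ $ticker{$cat} };
}
```

Because each story is stored the moment it is complete, the final category is handled the same way as every other one.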
Re: News alerts using the BBC's news ticker data file
by vek (Prior) on Mar 23, 2003 at 04:12 UTC
    I dunno SuperCruncher, personally I like what you've done here. Go ahead and upload it to the CPAN.

    -- vek --
      ...but make it run spotlessly under warnings.
      I think all you need to do is initialize these two variables:
      my $current_category = ''; my $last_category = '';
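To see why the initialization matters, here is a small self-contained sketch. It captures warnings in an array (an assumption made purely so they can be counted) and runs the same kind of `ne` comparison the parser performs, first with undefined variables and then with empty strings.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# As declared in the parser: both variables start out undefined.
my ( $current_category, $last_category );
my $changed = ( $current_category ne $last_category );   # warns under 'use warnings'
my $n_before = @warnings;

# With the suggested initialization the same comparison is silent.
( $current_category, $last_category ) = ( '', '' );
$changed = ( $current_category ne $last_category );
my $n_after = @warnings - $n_before;

print "warnings before fix: $n_before, after fix: $n_after\n";
```

The comparison result is the same in both cases (undef stringifies to the empty string); only the "uninitialized value" warnings go away.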

      jdporter
      The 6th Rule of Perl Club is -- There is no Rule #6.

Re: News alerts using the BBC's news ticker data file
by Anonymous Monk on Mar 23, 2003 at 22:24 UTC
    is that I want to get some feedback on whether or not it's worth putting this on CPAN as a module

    I assure you there is code of far lower quality on CPAN. Not everything you place there has to be perfect. In fact worrying about it is detrimental to Perl development. Instead, post it and take care of your code - accept patches, actively maintain it (if you can't, pass it off to someone). Code you initially post does not have to be perfect, that's not how open source development works. I assure you the linux kernel wasn't perfect the first day code was made perfect, and it isn't perfect now either.

    So yeah, post it, don't worry about the neurotic perfectionists, they just haven't taken their Prozac yet.

      Oops:

      I assure you the linux kernel wasn't perfect the first day code was made perfect (public not perfect, freudian slip, heh).

Re: News alerts using the BBC's news ticker data file
by SuperCruncher (Pilgrim) on Mar 23, 2003 at 23:05 UTC
    Thanks for all your replies. It's great to see people already using the code for their own applications. Hopefully we'll see some follow-up CUFPs now :-)

    I'm encouraged by the response, and I think I will put the code on CPAN. "Publish and be damned" as the saying goes :-) A scrolling marquee is interesting - I might have to try something similar in wxPerl.

    As a little aside, I was working on similar (but more complicated) code for the SkySports scorelines soccer results applet data file. If you're into your soccer, then this is for you: it's very comprehensive. It includes each team's starting line-ups, substitutes, name of referee, match attendance, goal scorers, missed penalties, bookings and all the rest of it. It's far more complicated (I had to use Mocha to decompile the Java code), and eventually I lost interest. I did document what I've discovered about the file format though (quite a bit too), so if anyone wants to work together or work separately on it, feel free to /msg me.

Re: News alerts using the BBC's news ticker data file
by dbush (Deacon) on Mar 24, 2003 at 15:22 UTC

    Hi,

    I thought I would brush up my rusty Parse::RecDescent skills (to be honest they were pretty non-existent to begin with) to see if I could parse the file in another way. Also, with the help of Mr. Muskrat's node Read "We're Going on a Bear Hunt" Out Loud, I thought I would create a virtual Brian Perkins. It is almost as good as the real one but with a slight American accent.

    Regards,
    Dom.

    Updates:

    • Forgot to mention that the speech is Windows only.
    • Corrected typo.
    • Also forgot to mention that the data structure is different from the original one used by SuperCruncher. The parser returns a reference to a hash (with the section name as the key) of hashes (key is the headline with the URL as the value). This assumes that the headlines themselves are unique within a section. If they aren't, the URL will be overwritten, but I assumed this would be unlikely.
    • Corrected date parsing as per bfdi533's suggestion.
    • Changed {unless $item[1] ne ''} to {if $item[1] eq ''}. Reads better.
    • Changed time parsing. Instead of +, used {2}.
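The hash-of-hashes shape described in the updates, and the overwrite that happens when a headline repeats within a section, can be illustrated with a short sketch. The records below are hypothetical and stand in for what the parser would produce.

```perl
use strict;
use warnings;

# Shape described above: section name => { headline => URL }
my %sections;

# Hypothetical records: [ section, headline, url ]
my @records = (
    [ 'WORLD', 'Example headline', 'http://example.org/a' ],
    [ 'WORLD', 'Example headline', 'http://example.org/b' ],  # same headline again
    [ 'UK',    'Another headline', 'http://example.org/c' ],
);

for my $r (@records) {
    my ( $section, $headline, $url ) = @$r;
    $sections{$section}{$headline} = $url;   # a repeated headline replaces the URL
}

my $ticker = \%sections;    # the parser is described as returning a reference
for my $section ( sort keys %$ticker ) {
    for my $headline ( sort keys %{ $ticker->{$section} } ) {
        print "$section: $headline -> $ticker->{$section}{$headline}\n";
    }
}
```

Note that a plain hash also does not preserve the order in which headlines appeared in the file, which may or may not matter for a given use.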

      Tried this out yesterday and thought it was the bomb! Great job.
      But I tried it out today and found that it returned nothing but the categories. Checking the grammar, I found that the DATE specification fails on dates like "1 April" because the day has only one digit, whereas the grammar expects "01 April".
      If the DATE is changed to "DATE: /0-9+ A-Za-z+ 0-9{4}/" then it works again.

      Ed
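The effect of the suggested change can be checked outside the grammar with plain regular expressions. In this sketch the square brackets that the site's markup swallowed are restored; the "strict" pattern is only an assumed form of the original rule (which is not shown in the thread), and the `^`/`$` anchors are added for standalone testing.

```perl
use strict;
use warnings;

# Assumed form of the original rule: exactly two digits for the day.
my $strict_date = qr/^[0-9]{2} [A-Za-z]+ [0-9]{4}$/;

# The suggested fix, with brackets restored: one or more digits for the day.
my $relaxed_date = qr/^[0-9]+ [A-Za-z]+ [0-9]{4}$/;

for my $date ( '01 April 2003', '1 April 2003' ) {
    printf "%-14s strict=%s relaxed=%s\n", $date,
        ( $date =~ $strict_date  ? 'yes' : 'no' ),
        ( $date =~ $relaxed_date ? 'yes' : 'no' );
}
```

Only the relaxed pattern accepts a single-digit day, which matches the failure described for dates at the start of a month.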

        Many thanks bfdi533. Just goes to show how rusty my grammar writing skills were/are. I've made the correction to the original node.

        Regards,
        Dom.

        PS: You may be wondering where your [ and ] characters have gone and why have strange hyperlinks appeared? The answer is that those characters are used to create links within the site. This link, although outdated, has more information.

Re: News alerts using the BBC's news ticker data file
by zakzebrowski (Curate) on Mar 24, 2003 at 16:03 UTC
    FYI, check out the PDA version of BBC News, which is text only.

    ----
    Zak
    Pluralitas non est ponenda sine necessitate - MySQL's philosophy

Re: News alerts using the BBC's news ticker data file
by BooK (Curate) on Mar 24, 2003 at 10:44 UTC

    Wow, glad to see that my code can actually be useful... :-)

    Note that one of my goals with HTTP::Proxy is to write a set of filters that can record a web session and automatically create a web robot that (with minimal modification) can replay any web session.

Re: News alerts using the BBC's news ticker data file
by chanio (Priest) on Mar 24, 2003 at 03:44 UTC
    Hi, I am new to all this! Let me introduce myself. I have been programming for years in other languages, but I have now discovered that Perl really seduces me, and I learned a lot from your script. What I am not sure I understand is this: are you able to modify the .dat file as it passes through your local proxy? If so, you could change the contents and get news from anywhere on the WWW inside your BBC news ticker! Not that it is so difficult to build a news ticker, but you could offer your idea to some sites to do the same in a very cheap way. It is no big deal to read RSS news and put it in your BBC ticker via your local proxy. But it happens that I know nothing about proxies, and I am not sure whether the format is easy enough to understand in order to modify its content. Thank you for your article! Alberto
