RSS of Twitter search results after 11 June 2013

by ciderpunx (Vicar)
on Jun 17, 2013 at 14:41 UTC (#1039382=CUFP)

Twitter recently removed the ability to get search results as an RSS feed, as part of its API update of 11 June 2013.

I found those feeds rather useful, so I wrote a little screen scraper that reimplements the functionality without needing to authenticate against their API: it just pulls the results out of the web search page. I expect it will keep working for a while longer, hopefully long enough to switch to StatusNet, identi.ca, or whatever.

It might be of use to others in the Monastery, and it illustrates the power of HTML::TreeBuilder::XPath.
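Most of the XPath work in the script goes through its class_contains helper (defined at the end of the script), which builds the predicate `[contains(concat(' ',normalize-space(@class),' '),' NAME ')]`. Elements in the search page carry several space-separated classes in one class attribute, so a plain `contains(@class, 'NAME')` would also match longer class names that merely contain NAME as a substring. The following is a minimal core-Perl sketch, with no XPath engine involved, that mimics what the predicate computes; class_matches and naive_contains are hypothetical helpers written only for this illustration, and the class strings are made-up examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain-Perl emulation of the XPath 1.0 predicate that class_contains() builds:
#   contains(concat(' ', normalize-space(@class), ' '), ' NAME ')
# class_matches() is a hypothetical helper for illustration only.
sub class_matches {
    my ($class_attr, $name) = @_;

    # normalize-space(): collapse runs of XML whitespace to a single space,
    # then drop the leading and trailing space
    (my $normalized = $class_attr) =~ s/[ \t\r\n]+/ /g;
    $normalized =~ s/^ //;
    $normalized =~ s/ \z//;

    # concat(' ', ..., ' ') followed by contains(..., ' NAME ')
    return index(" $normalized ", " $name ") >= 0;
}

# What a plain contains(@class, 'NAME') test would do, for comparison
sub naive_contains {
    my ($class_attr, $name) = @_;
    return index($class_attr, $name) >= 0;
}

for my $case (
    [ 'js-stream-item stream-item', 'stream-item' ],
    [ 'js-stream-item',             'stream-item' ],
    [ 'original-tweet',             'tweet'       ],
) {
    my ($class, $name) = @$case;
    printf "%-28s %-12s token=%-3s naive=%s\n",
        $class, $name,
        class_matches($class, $name)  ? 'yes' : 'no',
        naive_contains($class, $name) ? 'yes' : 'no';
}
```

Only the first case is a whole-token match, while the naive substring test reports a match in all three; padding both the attribute and the class name with spaces is what rules out the false positives.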

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use 5.10.0;

use Readonly;
use HTML::TreeBuilder::XPath;
use HTML::Entities qw(encode_entities);
use LWP::Simple;
use POSIX qw(strftime);
use URI::Escape qw(uri_escape_utf8);
use Encode qw(decode);

binmode STDOUT, ':encoding(UTF-8)';

# Base URL of the Twitter web site
Readonly my $BASEURL => '';
Readonly my $USAGE   => "$0 <search_term>: make an RSS feed of a Twitter search\n";

die $USAGE unless @ARGV == 1;

# Command-line arguments arrive as bytes; assume a UTF-8 terminal
my $term = decode( 'UTF-8', $ARGV[0] );

# Fetch the HTML search results page; no API authentication is needed.
# Percent-encode the term so spaces, '#' and '&' survive in the query string.
my $content = get( "$BASEURL/search?q=" . uri_escape_utf8($term) . '&src=typd' );
die "Couldn't get search results\n" unless defined $content;

my @items;
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($content);    # parse() followed by eof()

# Each result in the stream is an <li> with the class js-stream-item
my @tweets = $tree->findnodes( '//li' . class_contains('js-stream-item') );

for my $li (@tweets) {
    my ($tweet) = $li->findnodes(
        './div' . class_contains('tweet') . '/div' . class_contains('content') );
    next unless $tweet;    # skip stream items that are not tweets

    my ($header) = $tweet->findnodes( './div' . class_contains('stream-item-header') );
    next unless $header;

    # Tweet text goes in CDATA so &, < and > need no escaping;
    # a literal "]]>" in the text is split across two CDATA sections
    my $body = $tweet->findvalue( './p' . class_contains('tweet-text') );
    $body =~ s/\]\]>/]]]]><![CDATA[>/g;
    $body = "<![CDATA[$body]]>";

    my $avatar   = $header->findvalue( './a/img' . class_contains('avatar') . '/@src' );
    my $fullname = $header->findvalue( './a/strong' . class_contains('fullname') );
    my $username = '@' . $header->findvalue( './a/span' . class_contains('username') . '/b' );

    # Permalink and posting time both come from the tweet's timestamp link
    my $time_link = './small' . class_contains('time') . '/a' . class_contains('tweet-timestamp');
    my $uri       = $BASEURL . $header->findvalue( $time_link . '/@href' );
    my $timestamp = $header->findvalue( $time_link . '/span/@data-time' );
    my $pub_date  = strftime( '%a, %d %b %Y %H:%M:%S %z', localtime($timestamp) );

    push @items, {
        username    => $username,
        fullname    => $fullname,
        link        => $uri,
        guid        => $uri,
        title       => $body,
        description => $body,
        timestamp   => $timestamp,
        pubDate     => $pub_date,
    };
}
$tree->delete;

# Escape the search term before interpolating it into the XML
my $term_xml = encode_entities( $term, '<>&"' );

# Now print as an RSS 2.0 feed
print <<"ENDHEAD";
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="" xmlns:georss="http://www" xmlns:twitter="" version="2.0">
<channel>
<title>Twitter Search / $term_xml</title>
<link>$term_xml</link>
<description>Twitter search for: $term_xml.</description>
<language>en-us</language>
<ttl>40</ttl>
ENDHEAD

for (@items) {
    print <<"ENDITEM";
<item>
<title>$_->{username}: $_->{title}</title>
<description>$_->{description}</description>
<pubDate>$_->{pubDate}</pubDate>
<guid>$_->{guid}</guid>
<link>$_->{link}</link>
<twitter:source/>
<twitter:place/>
</item>
ENDITEM
}

print <<"ENDRSS";
</channel>
</rss>
ENDRSS

# Build an XPath predicate that is true when the element's class attribute
# contains $classname as a whole, space-separated token
sub class_contains {
    my $classname = shift;
    return "[contains(concat(' ',normalize-space(\@class),' '),' $classname ')]";
}
```
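The script formats each <pubDate> with POSIX strftime, and the %a and %b conversions can depend on the process locale, whereas RSS 2.0 expects RFC 822 dates with English day and month abbreviations. If that matters for your feed readers, one locale-independent option is to build the string yourself from gmtime. This is a sketch under the assumption that the data-time value is Unix epoch seconds (the script already treats it that way when it passes it to localtime); rfc822_date is a hypothetical helper name, and emitting GMT instead of a local offset is a design choice, not something the original does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format Unix epoch seconds as an RFC 822 date for an RSS 2.0 <pubDate>,
# always in GMT and always with English day and month names, regardless of
# the process locale. rfc822_date() is a hypothetical helper, not part of
# the original script.
sub rfc822_date {
    my ($epoch) = @_;
    my @days   = qw(Sun Mon Tue Wed Thu Fri Sat);
    my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

    # gmtime returns (sec, min, hour, mday, mon 0-11, year-1900, wday 0-6, ...)
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);

    # RSS 2.0 allows (and prefers) a four-digit year
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $days[$wday], $mday, $months[$mon], $year + 1900, $hour, $min, $sec;
}

# In the scraper this could replace the strftime() call, e.g.
#   my $pub_date = rfc822_date($timestamp);
print rfc822_date(time), "\n";
```

For example, rfc822_date(0) returns "Thu, 01 Jan 1970 00:00:00 GMT".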

Re: RSS of Twitter search results after 11 June 2013
by ww (Bishop) on Jun 17, 2013 at 15:36 UTC
    I'm sure the crowd at PRISM will appreciate this code.


    Abandon all privacy, ye who enter here.

      nah .. they probably already have a direct firehose feed :-)

Node Type: CUFP [id://1039382]
Front-paged by Arunbear