RSS of Twitter search results after 11 June 2013

by ciderpunx (Vicar)
on Jun 17, 2013 at 14:41 UTC ( #1039382=CUFP )

Twitter recently removed the ability to get search results as an RSS feed, as part of their API update of 11 June 2013.

I found those feeds rather useful, so I made a little screen scraper that reimplements the functionality without needing to authenticate against their API (it just pulls the results out of the web search page). I guess this will keep working for a while longer, long enough to switch to statusnet, identica, or whatever.

It might be of use to others in the monastery, and it illustrates the power of HTML::TreeBuilder::XPath.

#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use 5.10.0;
use Data::Dumper;
use Readonly;
use HTML::TreeBuilder::XPath;
use LWP::Simple;
use POSIX qw(strftime);

binmode STDOUT, ':utf8';

Readonly my $BASEURL => '';
Readonly my $USAGE   => "$0 <search_term>: make an rss of a twitter search";

die $USAGE unless $#ARGV==0;
my $term = $ARGV[0];

# fetch the web search page (no API auth needed)
my $content = get("$BASEURL/search?q=$term&src=typd");
die "Couldn't get search results" unless defined $content;

my @items;
my $tree= HTML::TreeBuilder::XPath->new;
$tree->parse($content);
$tree->eof;   # tell the parser there is no more input

# each tweet lives in an <li> carrying the js-stream-item class
my $tweets = $tree->findnodes( '//li' . class_contains('js-stream-item') );

for my $li (@$tweets) {
  my $tweet = $li->findnodes('./div'
                             . class_contains("tweet")
                             . '/div'
                             . class_contains("content")
                            )->[0];
  my $header = $tweet->findnodes('./div' . class_contains("stream-item-header"))->[0];
  my $body   = $tweet->findvalue('./p' . class_contains("tweet-text"));
  $body = "<![CDATA[$body]]>";
  my $avatar   = $header->findvalue('./a/img' . class_contains("avatar") . "/\@src");
  my $fullname = $header->findvalue('./a/strong' . class_contains("fullname"));
  my $username = '@' . $header->findvalue('./a/span' . class_contains("username") . '/b');
  my $uri = $BASEURL . $header->findvalue('./small'
                                          . class_contains("time")
                                          . '/a'
                                          . class_contains("tweet-timestamp")
                                          . '/@href'
                                         );
  my $timestamp = $header->findvalue('./small'
                                     . class_contains("time")
                                     . '/a'
                                     . class_contains("tweet-timestamp")
                                     . '/span/@data-time'
                                    );
  my $pub_date = strftime("%a, %d %b %Y %H:%M:%S %z", localtime($timestamp));

  push @items, {
    username    => $username,
    fullname    => $fullname,
    link        => $uri,
    guid        => $uri,
    title       => $body,
    description => $body,
    timestamp   => $timestamp,
    pubDate     => $pub_date
  };
}
$tree->delete;

# now print as an rss feed
print<<ENDHEAD
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="" xmlns:georss="http://www" xmlns:twitter="" version="2.0">
  <channel>
    <title>Twitter Search / $term </title>
    <link>$term</link>
    <description>Twitter search for: $term.</description>
    <language>en-us</language>
    <ttl>40</ttl>
ENDHEAD
;

for (@items) {
  print<<ENDITEM
    <item>
      <title>$_->{username}: $_->{title}</title>
      <description>$_->{description}</description>
      <pubDate>$_->{pubDate}</pubDate>
      <guid>$_->{guid}</guid>
      <link>$_->{link}</link>
      <twitter:source/>
      <twitter:place/>
    </item>
ENDITEM
;
}

print<<ENDRSS
  </channel>
</rss>
ENDRSS
;

# XPath predicate matching a whole class token inside a
# space-separated class attribute
sub class_contains {
  my $classname = shift;
  "[contains(concat(' ',normalize-space(\@class),' '),' $classname ')]";
}
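The class_contains helper is the bit most likely to be reusable elsewhere, so here is a small standalone sketch of what it does. The markup below is made up for illustration (it is not real Twitter HTML); the point is only that the predicate matches a complete class token, so 'js-stream-item' does not accidentally match an element whose class is 'stream-item-footer'.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Same helper as in the script above: pad the class attribute with
# spaces and look for the class name as a whole, space-delimited token.
sub class_contains {
  my $classname = shift;
  return "[contains(concat(' ',normalize-space(\@class),' '),' $classname ')]";
}

# Hypothetical markup, invented for this example only.
my $html = <<'HTML';
<ul>
  <li class="js-stream-item  stream-item"><p class="tweet-text">first</p></li>
  <li class="stream-item-footer"><p class="tweet-text">not a tweet</p></li>
  <li class="other js-stream-item"><p class="tweet-text">second</p></li>
</ul>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# In list context findnodes returns the matching elements; findvalue
# then pulls the text of the tweet-text paragraph out of each one.
my @texts = map { $_->findvalue('./p' . class_contains('tweet-text')) }
            $tree->findnodes('//li' . class_contains('js-stream-item'));
$tree->delete;

print "$_\n" for @texts;
```

Only the first and third list items should be picked up here; the middle one is skipped because none of its class tokens is exactly 'js-stream-item'.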

Re: RSS of Twitter search results after 11 June 2013
by ww (Bishop) on Jun 17, 2013 at 15:36 UTC
    I'm sure the crowd at PRISM will appreciate this code.


    Abandon all privacy, ye who enter here.

      nah .. they probably already have a direct firehose feed :-)

