PerlMonks  

RSS of Twitter search results after 11 June 2013

by ciderpunx (Vicar)
on Jun 17, 2013 at 14:41 UTC (#1039382=CUFP)

Twitter recently removed the ability to get search results as an RSS feed, as part of their API update of 11 June 2013.

I found those feeds rather useful, so I made a little screen scraper that reimplements the functionality without needing to auth against their API (it just pulls the results out of the web search page). I guess this will be good for a while longer, like enough time to switch to statusnet, identica, or whatever.

It might be of use to some others in the monastery, and it illustrates the power of HTML::TreeBuilder::XPath.

#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use 5.10.0;
use Data::Dumper;
use Readonly;
use HTML::TreeBuilder::XPath;
use LWP::Simple;
use POSIX qw(strftime);

binmode STDOUT, 'utf8';

Readonly my $BASEURL => 'https://twitter.com';
Readonly my $USAGE   => "$0 <search_term>: make an rss of a twitter search";

die $USAGE unless $#ARGV==0;
my $term = $ARGV[0];
my $content = get("$BASEURL/search?q=$term&src=typd");
die "Couldn't get search results" unless defined $content;

my @items;
my $tree= HTML::TreeBuilder::XPath->new;
$tree->parse($content);
my $tweets = $tree->findnodes( '//li' . class_contains('js-stream-item') );

for my $li (@$tweets) {
  my $tweet = $li->findnodes('./div' . class_contains("tweet")
                           . '/div' . class_contains("content") )->[0];
  my $header = $tweet->findnodes('./div' . class_contains("stream-item-header"))->[0];
  my $body   = $tweet->findvalue('./p' . class_contains("tweet-text"));
  $body = "<![CDATA[$body]]>";
  my $avatar   = $header->findvalue('./a/img' . class_contains("avatar") . "/\@src");
  my $fullname = $header->findvalue('./a/strong' . class_contains("fullname"));
  my $username = '@' . $header->findvalue('./a/span' . class_contains("username") . '/b');
  my $uri = $BASEURL . $header->findvalue('./small' . class_contains("time")
                                        . '/a' . class_contains("tweet-timestamp")
                                        . '/@href' );
  my $timestamp = $header->findvalue('./small' . class_contains("time")
                                   . '/a' . class_contains("tweet-timestamp")
                                   . '/span/@data-time' );
  my $pub_date = strftime("%a, %d %b %Y %H:%M:%S %z", localtime($timestamp));

  push @items, {
    username    => $username,
    fullname    => $fullname,
    link        => $uri,
    guid        => $uri,
    title       => $body,
    description => $body,
    timestamp   => $timestamp,
    pubDate     => $pub_date
  }
}
$tree->delete;

# now print as an rss feed
print<<ENDHEAD
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss" xmlns:twitter="http://api.twitter.com" version="2.0">
  <channel>
    <title>Twitter Search / $term </title>
    <link>http://twitter.com/search/q=$term</link>
    <description>Twitter search for: $term.</description>
    <language>en-us</language>
    <ttl>40</ttl>
ENDHEAD
;

for (@items) {
  print<<ENDITEM
    <item>
      <title>$_->{username}: $_->{title}</title>
      <description>$_->{description}</description>
      <pubDate>$_->{pubDate}</pubDate>
      <guid>$_->{guid}</guid>
      <link>$_->{link}</link>
      <twitter:source/>
      <twitter:place/>
    </item>
ENDITEM
;
}

print<<ENDRSS
  </channel>
</rss>
ENDRSS
;

sub class_contains {
  my $classname = shift;
  "[contains(concat(' ',normalize-space(\@class),' '),' $classname ')]";
}
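The class_contains helper at the end of the script carries most of the XPath work, so here is a small sketch of what it produces and why it pads the class attribute with spaces. The helper is copied from the script; has_class is a hypothetical plain-Perl stand-in I added only to illustrate the same whole-token test without needing HTML::TreeBuilder::XPath installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same helper as in the script: builds an XPath predicate that matches a
# whole class token rather than a substring of the class attribute.
sub class_contains {
    my $classname = shift;
    return "[contains(concat(' ',normalize-space(\@class),' '),' $classname ')]";
}

# Hypothetical illustration of the same test in plain Perl: collapse the
# whitespace in the class attribute, pad it with spaces, look for " name ".
sub has_class {
    my ($class_attr, $classname) = @_;
    (my $normalised = $class_attr) =~ s/\s+/ /g;   # collapse whitespace runs
    $normalised =~ s/^ | $//g;                     # trim leading/trailing space
    return index(" $normalised ", " $classname ") >= 0 ? 1 : 0;
}

# The XPath expression the script uses to find each search result
print '//li' . class_contains('js-stream-item'), "\n";

# A whole-token match succeeds; a substring such as "original-tweet" does not
print has_class('stream-item  js-stream-item', 'js-stream-item'), "\n";
print has_class('original-tweet', 'tweet'), "\n";
```

A bare contains(@class, 'tweet') would also match a class like original-tweet, which is why the predicate compares against the space-padded attribute instead.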



Re: RSS of Twitter search results after 11 June 2013
by ww (Bishop) on Jun 17, 2013 at 15:36 UTC
    I'm sure the crowd at PRISM will appreciate this code.

    :-)


    Abandon all privacy, ye who enter here.

      nah .. they probably already have a direct firehose feed :-)
Re: RSS of Twitter search results after 11 June 2013
by wumpus (Sexton) on May 30, 2014 at 04:22 UTC
    Twitter just changed their feed format... $tweet is getting undef. I looked and I don't see how to fix it.
      ... and it's working again. Never mind.
