Beefy Boxes and Bandwidth Generously Provided by pair Networks chromatic writing perl on a camel
Problems? Is your data what you think it is?
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

Twitter recently got rid of the ability to get search results as an RSS as part of their API update of 11 June 2013.

I found those feeds rather useful, so I made a little screen scraper that reimplements the functionality without needing to auth against their API (it just pulls the results out of the web search page). I guess this will be good for a while longer, like enough time to switch to statusnet, identica, or whatever.

It might be of use to some others in the monastry and illustrates the power of HTML::TreeBuilder::XPath.

#!/usr/bin/perl use strict; use warnings; use utf8; use 5.10.0; use Data::Dumper; use Readonly; use HTML::TreeBuilder::XPath; use LWP::Simple; use POSIX qw(strftime); binmode STDOUT, 'utf8'; Readonly my $BASEURL => 'https://twitter.com'; Readonly my $USAGE => "$0 <search_term>: make an rss of a twitter se +arch"; die $USAGE unless $#ARGV==0; my $term = $ARGV[0]; my $content = get("$BASEURL/search?q=$term&src=typd"); die "Couldn't get search results" unless defined $content; my @items; my $tree= HTML::TreeBuilder::XPath->new; $tree->parse($content); my $tweets = $tree->findnodes( '//li' . class_contains('js-stream-item +') ); for my $li (@$tweets) { my $tweet = $li->findnodes('./div' . class_contains("tweet") . '/div' . class_contains("content") )->[0] ; my $header = $tweet->findnodes('./div' . class_contains("stream-item +-header"))->[0]; my $body = $tweet->findvalue('./p' . class_contains("tweet-text")) +; $body = "<![CDATA[$body]]>"; my $avatar = $header->findvalue('./a/img' . class_contains("avatar") + . "/\@src"); my $fullname = $header->findvalue('./a/strong' . class_contains("ful +lname")); my $username = '@' . $header->findvalue('./a/span' . class_contains( +"username") . '/b'); my $uri = $BASEURL . $header->findvalue('./small' . class_contains("time") . '/a' . class_contains("tweet-timestamp") . '/@href' ); my $timestamp = $header->findvalue('./small' . class_contains("time") . '/a' . class_contains("tweet-timestamp") . '/span/@data-time' ); my $pub_date = strftime("%a, %d %b %Y %H:%M:%S %z", localtime($times +tamp)); push @items, { username => $username, fullname => $fullname, link => $uri, guid => $uri, title => $body, description => $body, timestamp => $timestamp, pubDate => $pub_date } } $tree->delete; # now print as an rss feed print<<ENDHEAD <?xml version="1.0" encoding="UTF-8"?> <rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:georss="http://www +.georss.org/georss" xmlns:twitter="http://api.twitter.com" version="2 +.0"> <channel> <title>Twitter Search / $term </title> <link>http://twitter.com/search/q=$term</link> <description>Twitter search for: $term.</description> <language>en-us</language> <ttl>40</ttl> ENDHEAD ; for (@items) { print<<ENDITEM <item> <title>$_->{username}: $_->{title}</title> <description>$_->{description}</description> <pubDate>$_->{pubDate}</pubDate> <guid>$_->{guid}</guid> <link>$_->{link}</link> <twitter:source/> <twitter:place/> </item> ENDITEM ; } print<<ENDRSS </channel> </rss> ENDRSS ; sub class_contains { my $classname = shift; "[contains(concat(' ',normalize-space(\@class),' '),' $classname ')] +"; }




In reply to RSS of Twitter search results after 11 June 2013 by ciderpunx

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others scrutinizing the Monastery: (3)
    As of 2014-04-20 11:32 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      April first is:







      Results (485 votes), past polls