get data from xpath

by Anonymous Monk
on Jan 21, 2013 at 03:53 UTC ( #1014379=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to get the href of the first YouTube video in the search results. My code so far is:

    use LWP::UserAgent;
    use HTML::TreeBuilder::XPath;
    use HTML::Selector::XPath;

    my $ua = LWP::UserAgent->new;
    my $html = "http://www.youtube.com/results?search_query=run+flo+rida";
    my $tree = HTML::TreeBuilder::XPath->new;
    my $xpath = HTML::Selector::XPath::selector_to_xpath("(//*[@id = 'search-results']/li)[1]/div[2]/h3/a/@href/");
    my @nodes = $tree->findnodes($xpath);

The goal for this URL is to get "/watch?v=JP68g3SYObU", but I'm having trouble figuring out how to extract that data with the module and the XPath.

Re: get data from xpath
by harimetkari (Initiate) on Jan 21, 2013 at 06:46 UTC
    Please check the Perl function man pages. It will work.
      What function? What will work?
Re: get data from xpath
by Anonymous Monk on Jan 21, 2013 at 07:03 UTC
Re: get data from xpath
by Kenosis (Priest) on Jan 21, 2013 at 07:43 UTC

    Perhaps using WWW::Mechanize would be helpful in this case:

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    my $url  = 'http://www.youtube.com/results?search_query=run+flo+rida';

    $mech->get($url);

    my $firstVideo = $mech->find_link( url_regex => qr!/watch\?v=! );

    if ( defined $firstVideo ) {
        print 'http://www.youtube.com' . $firstVideo->url;
    }
    else {
        print "Unable to find link.\n";
    }

    Output:

    http://www.youtube.com/watch?v=JP68g3SYObU
Re: get data from xpath
by tmharish (Friar) on Jan 21, 2013 at 08:01 UTC

    HTML::Miner will work too:

    use strict ;
    use warnings ;
    use HTML::Miner ;
    use Data::Dump qw(dump) ;

    my $mine = new HTML::Miner(
        CURRENT_URL => 'http://www.youtube.com/results?search_query=run+flo+rida',
    );

    my @links = @{ $mine->get_links() } ;

    my @result_links;
    foreach my $link ( @links ) {
        if( $link->{ DOMAIN } =~ 'www.youtube.com' ) {
            if( $link->{ ANCHOR } =~ /video-time/ ) {
                if( $link->{ ABS_URL } =~ /http:\/\/www\.youtube\.com(\/watch\?v=.*?)$/ ) {
                    push @result_links, $1 ;
                }
            }
        }
    }

    dump( \@result_links ) ;

    Disclaimer: I maintain that module.

Re: get data from xpath
by tobyink (Abbot) on Jan 21, 2013 at 09:19 UTC

    A few problems:

    • You mention a URL, and you create a UA object, but you never actually use the UA to retrieve any HTML from the URL.

      Anyway, you don't need to use LWP::UserAgent directly because HTML::TreeBuilder::XPath supplies a new_from_url method.

    • You use HTML::Selector::XPath - this module is aimed at turning CSS-style selectors into XPath expressions, but you already have an XPath, so you don't need this module!

    • You're using @id and @href within double quotes - this will be interpreted as two Perl arrays! You want to single quote the XPath.

    • There shouldn't be a slash at the end of the XPath.
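
    To see the quoting problem in isolation, here is a minimal sketch that uses only core Perl. It assumes no @id or @href arrays exist, so both interpolate as empty lists. strict and warnings are switched off inside the do block purely so the double-quoted version compiles; under "use strict" the same string would be a compile-time error instead.

```perl
use strict;
use warnings;

# Single quotes: Perl leaves @id and @href untouched.
my $single = '(//*[@id="search-results"]/li)[1]/div[2]/h3/a/@href';

# Double quotes: @id and @href are treated as Perl arrays.
# strict/warnings are disabled only inside this block so it compiles.
my $double = do {
    no strict 'vars';
    no warnings;
    "(//*[@id = 'search-results']/li)[1]/div[2]/h3/a/@href";
};

print "single: $single\n";
print "double: $double\n";
# The second line prints:
# double: (//*[ = 'search-results']/li)[1]/div[2]/h3/a/
```

    The double-quoted string has silently lost both attribute tests, so it is no longer the XPath that was intended.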

    The following works:

    use 5.010; use strict; use warnings; use HTML::TreeBuilder::XPath; my $url = "http://www.youtube.com/results?search_query=run+flo+rida"; my $tree = HTML::TreeBuilder::XPath->new_from_url($url); my $xpath = '(//*[@id="search-results"]/li)[1]/div[2]/h3/a/@href'; my @nodes = $tree->findnodes($xpath); say $_->getValue for @nodes;
    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
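
    For anyone who wants to try the XPath without hitting the network, here is a rough sketch that parses a small HTML string instead. The markup is made up for illustration and only imitates the structure the expression expects (an element with id "search-results" containing li items, each with two divs); it is not YouTube's actual page. It uses new_from_content in place of new_from_url, and the same findnodes and getValue calls as above.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup, NOT the real YouTube results page.
my $html = <<'HTML';
<html><body>
<ol id="search-results">
  <li>
    <div>thumbnail</div>
    <div><h3><a href="/watch?v=JP68g3SYObU">Run</a></h3></div>
  </li>
  <li>
    <div>thumbnail</div>
    <div><h3><a href="/watch?v=other">Other</a></h3></div>
  </li>
</ol>
</body></html>
HTML

# Parse the string rather than fetching a URL.
my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Single-quoted so that @id and @href reach the XPath engine intact.
my $xpath = '(//*[@id="search-results"]/li)[1]/div[2]/h3/a/@href';

# findnodes returns attribute nodes; getValue yields the attribute text.
my ($attr) = $tree->findnodes($xpath);
my $href = defined $attr ? $attr->getValue : undef;

print defined $href ? "$href\n" : "No match.\n";

# Free the tree explicitly, as HTML::TreeBuilder recommends.
$tree->delete;
```

    With this sample input it should print the href of the first result only. If the live page's markup differs from the sample, the XPath will need adjusting to match.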
