PerlMonks  

get data from xpath

by Anonymous Monk
on Jan 21, 2013 at 03:53 UTC (#1014379)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to get the href of the first YouTube video in the search results. My code so far is:

    use LWP::UserAgent;
    use HTML::TreeBuilder::XPath;
    use HTML::Selector::XPath;

    my $ua = LWP::UserAgent->new;
    my $html = "http://www.youtube.com/results?search_query=run+flo+rida";
    my $tree = HTML::TreeBuilder::XPath->new;
    my $xpath = HTML::Selector::XPath::selector_to_xpath("(//*[@id = 'search-results']/li)[1]/div[2]/h3/a/@href/");
    my @nodes = $tree->findnodes($xpath);

The goal for this URL is to get "/watch?v=JP68g3SYObU", but I'm having trouble figuring out how to extract that data with the module and the XPath.

Re: get data from xpath
by harimetkari (Initiate) on Jan 21, 2013 at 06:46 UTC
    Please check the Perl function man pages. It will work.
      What function? What will work?
Re: get data from xpath
by Anonymous Monk on Jan 21, 2013 at 07:03 UTC
Re: get data from xpath
by Kenosis (Priest) on Jan 21, 2013 at 07:43 UTC

    Perhaps using WWW::Mechanize would be helpful in this case:

        use strict;
        use warnings;
        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new();
        my $url  = 'http://www.youtube.com/results?search_query=run+flo+rida';
        $mech->get($url);

        my $firstVideo = $mech->find_link( url_regex => qr!/watch\?v=! );
        if ( defined $firstVideo ) {
            print 'http://www.youtube.com' . $firstVideo->url;
        }
        else {
            print "Unable to find link.\n";
        }

    Output:

    http://www.youtube.com/watch?v=JP68g3SYObU
Re: get data from xpath
by tmharish (Friar) on Jan 21, 2013 at 08:01 UTC

    HTML::Miner will work too:

        use strict;
        use warnings;
        use HTML::Miner;
        use Data::Dump qw(dump);

        my $mine = new HTML::Miner(
            CURRENT_URL => 'http://www.youtube.com/results?search_query=run+flo+rida',
        );
        my @links = @{ $mine->get_links() };

        my @result_links;
        foreach my $link ( @links ) {
            if ( $link->{ DOMAIN } =~ 'www.youtube.com' ) {
                if ( $link->{ ANCHOR } =~ /video-time/ ) {
                    if ( $link->{ ABS_URL } =~ /http:\/\/www\.youtube\.com(\/watch\?v=.*?)$/ ) {
                        push @result_links, $1;
                    }
                }
            }
        }
        dump( \@result_links );

    Disclaimer: I maintain that module.

Re: get data from xpath
by tobyink (Abbot) on Jan 21, 2013 at 09:19 UTC

    A few problems:

    • You mention a URL, and you create a UA object, but you never actually use the UA to retrieve any HTML from the URL.

      Anyway, you don't need to use LWP::UserAgent directly because HTML::TreeBuilder::XPath supplies a new_from_url method.

    • You use HTML::Selector::XPath - this module is aimed at turning CSS-style selectors into XPath expressions, but you already have an XPath, so you don't need this module!

    • You're using @id and @href within double quotes - this will be interpreted as two Perl arrays! You want to single quote the XPath.

    • There shouldn't be a slash at the end of the XPath.
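
The quoting point in the list above can be seen with core Perl alone. This is a minimal sketch, not part of the original reply: it builds the same XPath with single quotes, with q{}, with escaped double quotes, and with unescaped double quotes. The last case turns off strict and warnings inside a small block on purpose, which is an assumption made to mirror the question's code (it has no `use strict`).

```perl
use strict;
use warnings;

# Single quotes: Perl does not interpolate, so @id and @href survive.
my $single = '(//*[@id="search-results"]/li)[1]/div[2]/h3/a/@href';

# q{} behaves like single quotes and leaves both quote styles free for the XPath.
my $q = q{(//*[@id='search-results']/li)[1]/div[2]/h3/a/@href};

# Double quotes are fine only if every @ is escaped.
my $escaped = "(//*[\@id='search-results']/li)[1]/div[2]/h3/a/\@href";

# Unescaped @ in double quotes, with strict and warnings disabled as in the
# question: @id and @href are interpolated as (empty) arrays and vanish.
my $broken = do {
    no strict 'vars';
    no warnings;
    "(//*[@id='search-results']/li)[1]/div[2]/h3/a/@href";
};

print "$_\n" for $single, $q, $escaped, $broken;
```

With `use strict` in force, the unescaped form fails at compile time instead of silently producing a damaged expression, which is one more reason to enable it.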

    The following works:

        use 5.010;
        use strict;
        use warnings;
        use HTML::TreeBuilder::XPath;

        my $url   = "http://www.youtube.com/results?search_query=run+flo+rida";
        my $tree  = HTML::TreeBuilder::XPath->new_from_url($url);
        my $xpath = '(//*[@id="search-results"]/li)[1]/div[2]/h3/a/@href';
        my @nodes = $tree->findnodes($xpath);
        say $_->getValue for @nodes;
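
To try the XPath without depending on the live page, a sketch along these lines parses a small inline document instead of fetching the URL. The markup is a hypothetical stand-in shaped to match the expression, not YouTube's real HTML, and the second video ID is made up. It uses findvalue, which HTML::TreeBuilder::XPath provides alongside findnodes, to get the attribute as a plain string.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical markup: only the structure the XPath relies on is reproduced.
my $html = <<'HTML';
<html><body>
<ol id="search-results">
  <li>
    <div>thumbnail</div>
    <div><h3><a href="/watch?v=JP68g3SYObU">Run</a></h3></div>
  </li>
  <li>
    <div>thumbnail</div>
    <div><h3><a href="/watch?v=SECOND">Other</a></h3></div>
  </li>
</ol>
</body></html>
HTML

# Build the tree from the string rather than from a URL.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Same expression as in the working example: single-quoted, no trailing slash.
my $xpath = '(//*[@id="search-results"]/li)[1]/div[2]/h3/a/@href';

# findvalue returns the string value of the matched node(s).
my $href = $tree->findvalue($xpath);
print "$href\n";

# Free the tree explicitly, as recommended for HTML::TreeBuilder objects.
$tree->delete;
```

If the live page's structure changes, the expression will simply match nothing, so checking that the result is non-empty before using it is a sensible precaution.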
