get data from xpath

by Anonymous Monk
on Jan 21, 2013 at 03:53 UTC (#1014379=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to get the href of the first YouTube video in the search results. My code so far is:

    use LWP::UserAgent;
    use HTML::TreeBuilder::XPath;
    use HTML::Selector::XPath;

    my $ua = LWP::UserAgent->new;
    my $html = "";
    my $tree = HTML::TreeBuilder::XPath->new;
    my $xpath = HTML::Selector::XPath::selector_to_xpath("(//*[@id = 'search-results']/li)[1]/div[2]/h3/a/@href/");
    my @nodes = $tree->findnodes($xpath);

The goal for this URL is to get "/watch?v=JP68g3SYObU", but I'm having trouble figuring out how to extract that data using the module and XPath.

Replies are listed 'Best First'.
Re: get data from xpath
by tobyink (Abbot) on Jan 21, 2013 at 09:19 UTC

    A few problems:

    • You mention a URL, and you create a UA object, but you never actually use the UA to retrieve any HTML from the URL.

      Anyway, you don't need to use LWP::UserAgent directly because HTML::TreeBuilder::XPath supplies a new_from_url method.

    • You use HTML::Selector::XPath - this module is aimed at turning CSS-style selectors into XPath expressions, but you already have an XPath, so you don't need this module!

    • You're using @id and @href within double quotes - this will be interpreted as two Perl arrays! You want to single quote the XPath.

    • There shouldn't be a slash at the end of the XPath.

    The following works:

    use 5.010;
    use strict;
    use warnings;
    use HTML::TreeBuilder::XPath;

    my $url = "";
    my $tree = HTML::TreeBuilder::XPath->new_from_url($url);
    my $xpath = '(//*[@id="search-results"]/li)[1]/div[2]/h3/a/@href';
    my @nodes = $tree->findnodes($xpath);
    say $_->getValue for @nodes;
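To illustrate the quoting point from the list above, here is a minimal sketch using only core Perl (no HTML modules, no network). The variable names are made up for the illustration. Under `use strict` the unescaped double-quoted form would not compile at all, so strict and warnings are relaxed in one block purely to show what would otherwise happen silently.

```perl
use strict;
use warnings;

# Single quotes: no interpolation, so @id and @href reach the XPath engine intact.
my $single = '(//*[@id="search-results"]/li)[1]/div[2]/h3/a/@href';

# Double quotes are fine too, as long as each @ is escaped.
my $escaped = "(//*[\@id='search-results']/li)[1]/div[2]/h3/a/\@href";

# Unescaped double quotes: Perl treats @id and @href as (empty) arrays.
# strict and warnings are switched off here only for the demonstration.
my $mangled;
{
    no strict 'vars';
    no warnings;
    $mangled = "(//*[@id='search-results']/li)[1]/div[2]/h3/a/@href";
}

print "single:  $single\n";
print "escaped: $escaped\n";
print "mangled: $mangled\n";
```

In the mangled string both attribute tests have vanished, which is why the expression can never match the intended nodes.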
Re: get data from xpath
by Kenosis (Priest) on Jan 21, 2013 at 07:43 UTC

    Perhaps using WWW::Mechanize would be helpful in this case:

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    my $url = '';
    $mech->get($url);

    my $firstVideo = $mech->find_link( url_regex => qr!/watch\?v=! );
    if ( defined $firstVideo ) {
        print '' . $firstVideo->url;
    }
    else {
        print "Unable to find link.\n";
    }
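As a rough offline sketch of what the `url_regex` filter is doing, the same pattern can be applied to a plain list of hrefs. The list below is hypothetical (only "/watch?v=JP68g3SYObU" comes from the question), and `grep` here merely stands in for the "first matching link" behaviour of `find_link`; it is not how WWW::Mechanize is implemented.

```perl
use strict;
use warnings;

# Hypothetical hrefs standing in for the links found on a results page.
my @hrefs = (
    '/results?search_query=example',
    '/watch?v=JP68g3SYObU',
    '/watch?v=SECONDVIDEO',
);

# Same pattern as in the snippet above; "!" as delimiter avoids escaping "/".
my $video_re = qr!/watch\?v=!;

# Take the first href that matches, in page order.
my ($first) = grep { $_ =~ $video_re } @hrefs;

print defined $first ? "$first\n" : "Unable to find link.\n";
```

Note that the `?` must be escaped in the pattern; unescaped, it would make the preceding `h` optional instead of matching a literal question mark.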

Re: get data from xpath
by tmharish (Friar) on Jan 21, 2013 at 08:01 UTC

    HTML::Miner will work too:

    use strict;
    use warnings;
    use HTML::Miner;
    use Data::Dump qw(dump);

    my $mine = new HTML::Miner(
        CURRENT_URL => ' +o+rida',
    );

    my @links = @{ $mine->get_links() };
    my @result_links;

    foreach my $link ( @links ) {
        if ( $link->{ DOMAIN } =~ '' ) {
            if ( $link->{ ANCHOR } =~ /video-time/ ) {
                if ( $link->{ ABS_URL } =~ /http:\/\/www\.youtube\.com(\/watch\?v=.*?)$/ ) {
                    push @result_links, $1;
                }
            }
        }
    }

    dump( \@result_links );

    Disclaimer: I maintain that module.

Re: get data from xpath
by Anonymous Monk on Jan 21, 2013 at 07:03 UTC
Re: get data from xpath
by harimetkari (Initiate) on Jan 21, 2013 at 06:46 UTC
    Please check the Perl function man pages. It will work.
      What function? What will work?

Node Type: perlquestion [id://1014379]
Approved by bitingduck
Front-paged by Lotus1