PerlMonks  

Help with xpath and TreeBuilder

by danny0085 (Sexton)
on Jun 29, 2012 at 06:01 UTC ( #979047=perlquestion )
danny0085 has asked for the wisdom of the Perl Monks concerning the following question:

I need to extract some (not all) of the links from an HTML file. Here are the XPaths:

    /html/body/p/a
    /html/body/p[2]/a
    /html/body/p[3]/a
    ....
    /html/body/p[66]/a

How can I extract the links with TreeBuilder?

Here is what I have so far; I do not know how to use the findvalues function:

    $mech->get($url);
    $content = $mech->response()->decoded_content();
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($content);
    $tree->eof;
    @res = $tree->findvalues(???);

Re: Help with xpath and TreeBuilder
by Anonymous Monk on Jun 29, 2012 at 06:19 UTC

    Here is what I have so far; I do not know how to use the findvalues function

    Really? findnodes takes xpaths, so just give it some xpaths

      Giving 64 XPaths is not a good solution. I need a regular expression or something similar.

        What makes 64 XPath expressions not a good solution? Maybe you want a loop?
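The loop suggested above could look something like the sketch below. The sample HTML and the upper bound of 3 are made up for illustration; for the page in the question the range would be 1 .. 66.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample document; the real page has up to 66 <p> elements.
my $html = <<'HTML';
<html><body>
<p><a href="1.html">Link Text 1</a></p>
<p><a href="2.html">Link Text 2</a></p>
<p><a href="3.html">Link Text 3</a></p>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Build one XPath per paragraph index and collect the text of each <a>.
my @values;
for my $i (1 .. 3) {    # assumption: use 1 .. 66 for the page in the question
    push @values, $tree->findvalues("/html/body/p[$i]/a");
}

print "$_\n" for @values;
$tree->delete;
```

Each call returns the text of the matching a elements, so @values ends up in document order.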

Re: Help with xpath and TreeBuilder
by Gangabass (Priest) on Jun 29, 2012 at 11:04 UTC

    It's really easy:

    #!/usr/bin/perl
    use Modern::Perl;
    use HTML::TreeBuilder::XPath;
    use Data::Dumper;

    local $/;
    my $html = <DATA>;

    my $tree = HTML::TreeBuilder::XPath->new_from_content($html);
    my @values = $tree->findvalues('/html/body/p/a');

    say Dumper(@values);

    __DATA__
    <html>
    <body>
    <p><a href="1.html">Link Text 1</a></p>
    <p><a href="2.html">Link Text 2</a></p>
    <p><a href="3.html">Link Text 3</a></p>
    <p><a href="4.html">Link Text 4</a></p>
    </body>
    </html>
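Note that findvalues on an a element returns its text ("Link Text 1"), not the URL. If the goal is the link targets, one option is to select the href attribute instead. A minimal sketch, assuming the same sample markup as above (the variable names are only illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Same sample markup as in the reply above, inlined for a self-contained script.
my $html = <<'HTML';
<html><body>
<p><a href="1.html">Link Text 1</a></p>
<p><a href="2.html">Link Text 2</a></p>
<p><a href="3.html">Link Text 3</a></p>
<p><a href="4.html">Link Text 4</a></p>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# /@href selects the attribute node, so findvalues yields the URLs.
my @hrefs = $tree->findvalues('/html/body/p/a/@href');

# Selecting the element itself yields the link text.
my @texts = $tree->findvalues('/html/body/p/a');

print "$_\n" for @hrefs;
$tree->delete;
```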

      It's even easier :)

      my @xpaths = (
          '/html/body/p/a',
          '/html/body/p[2]/a',
          '/html/body/p[3]/a',
          # ....
          '/html/body/p[66]/a',
      );
      # Join the paths into a single XPath union expression
      my $allXpaths = join ' | ', @xpaths;
      my @values = $tree->findvalues($allXpaths);

      I don't know about other XPath interpreters, but HTML::TreeBuilder::XPath (and xsh) allow this

      This query is probably faster

      /html/body/p[ position()=1 or position()=2 or position()=3 or position()=66 ]/a

      Or this one

      /html/body/p[ ( ( position() > 0 and position() < 4 ) or ( position()=66 ) ) ]/a
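When the wanted paragraphs form one contiguous run, as p[1] through p[66] do in the question, a single range predicate may be enough. A hedged sketch on made-up sample HTML, using an upper bound of 4 where the real page would use 67:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample: four paragraphs, of which only the first three are wanted.
my $html = <<'HTML';
<html><body>
<p><a href="1.html">Link Text 1</a></p>
<p><a href="2.html">Link Text 2</a></p>
<p><a href="3.html">Link Text 3</a></p>
<p><a href="4.html">Link Text 4</a></p>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# position() counts <p> elements among the children of <body>;
# for the page in the question the bound would be "position() < 67".
my @first_three = $tree->findvalues('/html/body/p[ position() < 4 ]/a');

print "$_\n" for @first_three;
$tree->delete;
```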

      Though this one won't work with your HTML: //a[ position()=4 ]. Every a matches //a[ position()=1 ], because each a is the only (first) child of its parent ( p ). I guess now I know how position() works.
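A small script can confirm this behaviour. This is only a sketch on the sample markup from earlier in the thread: the predicate on //a is evaluated per parent, so it finds nothing, while putting the predicate on p selects the fourth paragraph's link.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Sample markup from the thread: each <a> is the only child of its <p>.
my $html = <<'HTML';
<html><body>
<p><a href="1.html">Link Text 1</a></p>
<p><a href="2.html">Link Text 2</a></p>
<p><a href="3.html">Link Text 3</a></p>
<p><a href="4.html">Link Text 4</a></p>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# position() is relative to the siblings under the same parent,
# and no <p> here has a fourth <a> child.
my @none = $tree->findnodes('//a[ position()=4 ]');

# Applying the predicate to <p> picks the fourth paragraph, then its <a>.
my @fourth = $tree->findvalues('/html/body/p[ position()=4 ]/a');

printf "//a[position()=4] matched %d node(s)\n", scalar @none;
print "fourth paragraph link: @fourth\n";
$tree->delete;
```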

Node Type: perlquestion [id://979047]
Approved by davido