
Help with xpath and TreeBuilder

by danny0085 (Sexton)
on Jun 29, 2012 at 06:01 UTC (#979047)
danny0085 has asked for the wisdom of the Perl Monks concerning the following question:

I need to extract some (not all) of the links from an HTML file. Here are the XPaths:

    /html/body/p/a
    /html/body/p[2]/a
    /html/body/p[3]/a
    ....
    /html/body/p[66]/a

How can I extract the links with TreeBuilder?

Here is what I have. I do not know how to use the findvalues function:

    $mech->get($url);
    $content = $mech->response()->decoded_content();
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($content);
    $tree->eof;
    @res = $tree->findvalues(???);

Re: Help with xpath and TreeBuilder
by Anonymous Monk on Jun 29, 2012 at 06:19 UTC

    Here is what I have. I do not know how to use the findvalues function

    Really? findnodes takes xpaths, so just give it some xpaths

      Giving 64 XPaths is not a good solution. I need a regular expression or something similar.

        What makes 64 XPath expressions not a good solution? Maybe you want a loop?
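One way to read the loop suggestion, as a minimal sketch: build each XPath from a list of paragraph positions and collect the href of every matching link with findnodes. The sample HTML and the @wanted list are made up for illustration; the positions in the question run up to 66.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Made-up sample page: five paragraphs, one link each.
my $body = join '', map { qq{<p><a href="$_.html">Link Text $_</a></p>} } 1 .. 5;
my $tree = HTML::TreeBuilder::XPath->new_from_content("<html><body>$body</body></html>");

# Hypothetical selection of paragraph positions to keep.
my @wanted = (1 .. 3, 5);

my @links;
for my $n (@wanted) {
    # One XPath per wanted paragraph position.
    my $xpath = sprintf '/html/body/p[%d]/a', $n;
    for my $anchor ($tree->findnodes($xpath)) {
        push @links, $anchor->attr('href');
    }
}

print "$_\n" for @links;
$tree->delete;
```

findnodes returns the matching elements, so attr('href') can be called on each one; findvalues would return their text instead.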

Re: Help with xpath and TreeBuilder
by Gangabass (Priest) on Jun 29, 2012 at 11:04 UTC

    It's really easy:

    #!/usr/bin/perl
    use Modern::Perl;
    use HTML::TreeBuilder::XPath;
    use Data::Dumper;

    local $/;
    my $html = <DATA>;
    my $tree = HTML::TreeBuilder::XPath->new_from_content($html);
    my @values = $tree->findvalues('/html/body/p/a');
    say Dumper(@values);

    __DATA__
    <html>
    <body>
    <p><a href="1.html">Link Text 1</a></p>
    <p><a href="2.html">Link Text 2</a></p>
    <p><a href="3.html">Link Text 3</a></p>
    <p><a href="4.html">Link Text 4</a></p>
    </body>
    </html>
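Note that findvalues on an a element yields its text ("Link Text 1"), not the URL. If the link targets are what is wanted, a hedged variant is to select the href attribute in the XPath itself. The sample HTML below is a shortened, made-up copy of the data above.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Shortened sample data, for illustration only.
my $html = <<'HTML';
<html> <body>
<p><a href="1.html">Link Text 1</a></p>
<p><a href="2.html">Link Text 2</a></p>
</body> </html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# String value of each <a> element: the link text.
my @texts = $tree->findvalues('/html/body/p/a');

# String value of each href attribute: the link target.
my @hrefs = $tree->findvalues('/html/body/p/a/@href');

print "text: $_\n" for @texts;
print "href: $_\n" for @hrefs;
$tree->delete;
```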

      It's even easier :)

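      # "...." stands for the paths left out here; replace it before running.
      my @xpaths = qw{
          /html/body/p/a
          /html/body/p[2]/a
          /html/body/p[3]/a
          ....
          /html/body/p[66]/a
      };
      # Join the paths into one XPath union expression and pass it to findvalues.
      my $allXpaths = join ' | ', @xpaths;
      my @values = $tree->findvalues($allXpaths);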

      I don't know about other XPath interpreters, but HTML::TreeBuilder::XPath (and xsh) allows this.
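A small self-contained sketch of the union idea, assuming made-up sample HTML and only two paths so that it can be run as-is. The result is sorted before printing because this sketch does not rely on the order in which the union returns its matches.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Made-up sample page: four paragraphs, one link each.
my $body = join '', map { qq{<p><a href="$_.html">Link Text $_</a></p>} } 1 .. 4;
my $tree = HTML::TreeBuilder::XPath->new_from_content("<html><body>$body</body></html>");

# Two illustrative paths, joined with the XPath union operator "|".
my @xpaths     = ('/html/body/p[1]/a', '/html/body/p[3]/a');
my $all_xpaths = join ' | ', @xpaths;

my @values = sort $tree->findvalues($all_xpaths);

print "$_\n" for @values;
$tree->delete;
```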

      This query is probably faster

      /html/body/p[ position()=1 or position()=2 or position()=3 or position()=66 ]/a

      Or this one

      /html/body/p[ ( ( position() > 0 and position() < 4 ) or ( position()=66 ) ) ]/a

      Though this one won't work with your HTML: //a[ position()=4 ]. Every a matches //a[ position()=1 ] instead, because each a is the only (first) child of its parent ( p ). I guess now I know how position() works.
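The position() behaviour described above can be checked with a short sketch. The sample HTML is made up (four paragraphs, one link each), and the range predicate at the end is just one possible way to write "the first N paragraphs".

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Made-up sample page: four paragraphs, one link each.
my $body = join '', map { qq{<p><a href="$_.html">Link Text $_</a></p>} } 1 .. 4;
my $tree = HTML::TreeBuilder::XPath->new_from_content("<html><body>$body</body></html>");

# position() counts within each parent, so no <a> is ever the 4th child of its <p>.
my @fourth_in_parent = $tree->findvalues('//a[position()=4]');

# Every <a> is the first <a> child of its own <p>.
my @first_in_parent = $tree->findvalues('//a[position()=1]');

# Putting the predicate on <p> selects paragraphs by their position in <body>.
my @first_three = $tree->findvalues('/html/body/p[position() <= 3]/a');

printf "%d %d %d\n",
    scalar @fourth_in_parent, scalar @first_in_parent, scalar @first_three;
$tree->delete;
```

The predicate applies to the step it is attached to, which is why it belongs on p rather than on a when the goal is "links in the first N paragraphs".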

Node Type: perlquestion [id://979047]
Approved by davido