
Help with xpath and TreeBuilder

by danny0085 (Sexton)
on Jun 29, 2012 at 06:01 UTC (#979047, perlquestion)
danny0085 has asked for the wisdom of the Perl Monks concerning the following question:

I need to extract some (not all) of the links from an HTML file. Here are the XPaths:

/html/body/p/a
/html/body/p[2]/a
/html/body/p[3]/a
....
/html/body/p[66]/a

How can I extract the links with TreeBuilder?

Here is what I have so far. I do not know how to use the findvalues function:

$mech->get($url);
$content = $mech->response()->decoded_content();
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($content);
$tree->eof;
@res = $tree->findvalues(???);

Replies are listed 'Best First'.
Re: Help with xpath and TreeBuilder
by Gangabass (Vicar) on Jun 29, 2012 at 11:04 UTC

    It's really easy:

    #!/usr/bin/perl
    use Modern::Perl;
    use HTML::TreeBuilder::XPath;
    use Data::Dumper;

    # Slurp the whole DATA section in one read
    local $/;
    my $html = <DATA>;

    my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

    # findvalues returns the text value of every node the XPath matches
    my @values = $tree->findvalues('/html/body/p/a');
    say Dumper(@values);

    __DATA__
    <html>
    <body>
    <p><a href="1.html">Link Text 1</a></p>
    <p><a href="2.html">Link Text 2</a></p>
    <p><a href="3.html">Link Text 3</a></p>
    <p><a href="4.html">Link Text 4</a></p>
    </body>
    </html>
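Since the question asks for the links themselves, note that the query above returns the link text. One option is to select the href attribute instead. This is only a minimal sketch, assuming the document has the same shape as the sample above; the inline $html and the name @hrefs are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Illustrative sample only; same shape as the __DATA__ block above.
my $html = <<'HTML';
<html><body>
<p><a href="1.html">Link Text 1</a></p>
<p><a href="2.html">Link Text 2</a></p>
<p><a href="3.html">Link Text 3</a></p>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Selecting the @href attribute yields the URL rather than the link text.
my @hrefs = $tree->findvalues('/html/body/p/a/@href');

print "$_\n" for @hrefs;
```

The XPath sits in single quotes so that Perl does not try to interpolate @href as an array.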

      It's even easier :)

      my @xpaths = qw{
          /html/body/p/a
          /html/body/p[2]/a
          /html/body/p[3]/a
          ....
          /html/body/p[66]/a
      };
      # Join the individual paths into one XPath union expression
      my $allXpaths = join ' | ', @xpaths;
      my @values = $tree->findvalues($allXpaths);

      I don't know about other XPath interpreters, but HTML::TreeBuilder::XPath (and xsh) allow this.

      This query is probably faster

      /html/body/p[ position()=1 or position()=2 or position()=3 or position()=66 ]/a

      Or this one

      /html/body/p[ ( ( position() > 0 and position() < 4 ) or ( position()=66 ) ) ]/a
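A rough sketch of the position() range idea, for the case where the wanted paragraphs form a contiguous run. The five-paragraph document and the names $first, $last and @range_values are hypothetical, chosen only to keep the example small.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample: five paragraphs, each holding one link.
my $html = join '',
    '<html><body>',
    ( map { qq{<p><a href="$_.html">Link Text $_</a></p>} } 1 .. 5 ),
    '</body></html>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Keep paragraphs $first through $last (inclusive) with one predicate.
my ( $first, $last ) = ( 2, 4 );
my $xpath = sprintf
    '/html/body/p[ position() > %d and position() < %d ]/a',
    $first - 1, $last + 1;

my @range_values = $tree->findvalues($xpath);

print "$_\n" for @range_values;
```

Building the predicate with sprintf avoids writing out one path per paragraph.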

      This one, though, won't work with your HTML: //a[ position()=4 ]. Every //a is at //a[ position()=1 ], because each a is the only (first) child of its parent ( p ). I guess now I know how position() works.
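To see the per-parent behaviour of position() in practice, here is a small sketch. The four-paragraph sample and the variable names are assumptions for illustration. Taking the fourth link overall is done here with ordinary Perl list indexing rather than with any particular XPath construct.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample: four paragraphs, one link in each.
my $html = join '',
    '<html><body>',
    ( map { qq{<p><a href="$_.html">Link Text $_</a></p>} } 1 .. 4 ),
    '</body></html>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# position() is counted among siblings, so no <a> is ever 4th in its <p> ...
my @fourth_in_parent = $tree->findnodes('//a[ position()=4 ]');

# ... while every <a> is the 1st <a> inside its own <p>.
my @first_in_parent = $tree->findnodes('//a[ position()=1 ]');

# To get the 4th link in the whole document, index the full result list.
my @all_links   = $tree->findnodes('//a');
my $fourth_text = $all_links[3]->as_text;

printf "4th-in-parent matches: %d\n", scalar @fourth_in_parent;
printf "1st-in-parent matches: %d\n", scalar @first_in_parent;
print  "4th link overall: $fourth_text\n";
```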

Re: Help with xpath and TreeBuilder
by Anonymous Monk on Jun 29, 2012 at 06:19 UTC

    Here is what I have so far. I do not know how to use the findvalues function

    Really? findnodes takes XPaths, so just give it some XPaths.

      Giving it 64 XPaths is not a good solution. I need a regular expression or something similar.

        What makes 64 XPath expressions not a good solution? Maybe you want a loop?
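The loop suggestion could look roughly like this. It is only a sketch: the sample document and the @wanted list of paragraph numbers are hypothetical, standing in for whichever paragraphs are actually needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample: five paragraphs, one link in each.
my $html = join '',
    '<html><body>',
    ( map { qq{<p><a href="$_.html">Link Text $_</a></p>} } 1 .. 5 ),
    '</body></html>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Hypothetical list of the paragraph numbers whose links are wanted.
my @wanted = ( 1, 2, 4 );

my @links;
for my $n (@wanted) {
    # Build one XPath per paragraph instead of typing each out by hand.
    my $xpath = sprintf '/html/body/p[%d]/a/@href', $n;
    push @links, $tree->findvalues($xpath);
}

print "$_\n" for @links;
```

The results come back in the order of @wanted, since each paragraph is queried separately.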
