
Help with xpath and TreeBuilder

by danny0085 (Sexton)
on Jun 29, 2012 at 06:01 UTC (#979047=perlquestion)
danny0085 has asked for the wisdom of the Perl Monks concerning the following question:

I need to extract some (not all) of the links from an HTML file. Here are the XPaths:

/html/body/p/a
/html/body/p[2]/a
/html/body/p[3]/a
....
/html/body/p[66]/a

How can I extract the links with TreeBuilder?

Here is what I have. I do not know how to use the findvalues function:

$mech->get($url);
$content = $mech->response()->decoded_content();
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($content);
$tree->eof;
@res = $tree->findvalues(???);

Re: Help with xpath and TreeBuilder
by Gangabass (Vicar) on Jun 29, 2012 at 11:04 UTC

    It's really easy:

    #!/usr/bin/perl
    use Modern::Perl;
    use HTML::TreeBuilder::XPath;
    use Data::Dumper;

    local $/;
    my $html = <DATA>;
    my $tree = HTML::TreeBuilder::XPath->new_from_content($html);
    my @values = $tree->findvalues('/html/body/p/a');
    say Dumper(@values);

    __DATA__
    <html>
    <body>
    <p><a href="1.html">Link Text 1</a></p>
    <p><a href="2.html">Link Text 2</a></p>
    <p><a href="3.html">Link Text 3</a></p>
    <p><a href="4.html">Link Text 4</a></p>
    </body>
    </html>
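The query above selects the a elements, so findvalues returns their text ("Link Text 1" and so on). If the goal is the link targets, one option is to select the href attribute instead. A minimal sketch, reusing the sample markup from the reply above (the variable names @texts and @hrefs are only illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Sample markup copied from the reply above.
my $html = <<'HTML';
<html>
<body>
<p><a href="1.html">Link Text 1</a></p>
<p><a href="2.html">Link Text 2</a></p>
<p><a href="3.html">Link Text 3</a></p>
<p><a href="4.html">Link Text 4</a></p>
</body>
</html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Selecting the element gives its text content.
my @texts = $tree->findvalues('/html/body/p/a');

# Selecting the attribute node gives the link target.
my @hrefs = $tree->findvalues('/html/body/p/a/@href');

print "$_\n" for @hrefs;

# Free the tree once the plain strings have been collected.
$tree->delete;
```

Both calls use the same path up to the a element; only the final /@href step differs.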

      It's even easier :)

      my @xpaths = qw{
          /html/body/p/a
          /html/body/p[2]/a
          /html/body/p[3]/a
          ....
          /html/body/p[66]/a
      };
      my $allXpaths = join ' | ', @xpaths;
      my @values = $tree->findvalues($allXpaths);
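Rather than typing every path, the list could be generated. This is only a sketch under two assumptions that the thread does not confirm: that the elided "...." stands for every index in between, and that the first path is meant as p[1] (an unindexed /html/body/p/a would match the a in every paragraph, not just the first).

```perl
use strict;
use warnings;

# ASSUMPTION: the wanted paragraphs are the contiguous range 1..66.
my @xpaths = map { sprintf '/html/body/p[%d]/a', $_ } 1 .. 66;

# XPath's union operator "|" combines the paths into a single query.
my $allXpaths = join ' | ', @xpaths;

print scalar(@xpaths), " paths joined into one expression\n";
```

The resulting string can then be handed to findvalues in a single call, as in the snippet above.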

      I don't know about other XPath interpreters, but HTML::TreeBuilder::XPath (and xsh) allows this.

      This query is probably faster

      /html/body/p[ position()=1 or position()=2 or position()=3 or position()=66 ]/a

      Or this one

      /html/body/p[ ( ( position() > 0 and position() < 4 ) or ( position()=66 ) ) ]/a

      Though this one won't work with your HTML: //a[ position()=4 ]. Each a is the only (first) child of its parent ( p ), so every one of them matches //a[ position()=1 ] and none of them matches position()=4. I guess now I know how position() works.
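The behaviour of position() described above can be checked directly. A small sketch, assuming the same four-paragraph sample as earlier in the thread (rebuilt here with map so the snippet stands alone; the variable names are illustrative):

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Four paragraphs, each holding exactly one link, as in the sample above.
my $html = join '',
    '<html><body>',
    ( map { qq{<p><a href="$_.html">Link Text $_</a></p>} } 1 .. 4 ),
    '</body></html>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# position() counts among siblings under the same parent,
# so every a is number 1 inside its own p.
my @first_in_parent = $tree->findvalues('//a[ position()=1 ]/@href');

# No p contains a fourth a, so this matches nothing.
my @fourth_a = $tree->findvalues('//a[ position()=4 ]/@href');

# Putting the predicate on p picks the fourth paragraph instead.
my @fourth_p = $tree->findvalues('/html/body/p[ position()=4 ]/a/@href');

printf "first-in-parent: %d, fourth a: %d, fourth p: %s\n",
    scalar @first_in_parent, scalar @fourth_a, "@fourth_p";

$tree->delete;
```

The predicate has to sit on the step whose siblings are being counted, which here is p rather than a.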

Re: Help with xpath and TreeBuilder
by Anonymous Monk on Jun 29, 2012 at 06:19 UTC

    Here is what I have. I do not know how to use the findvalues function

    Really? findnodes takes xpaths, so just give it some xpaths

      Giving 64 XPaths is not a good solution. I need a regular expression or something similar.

        What makes 64 XPath expressions not a good solution? Maybe you want a loop?
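The loop suggested above might look roughly like this. It is a sketch only: the index list in @wanted is hypothetical, and the sample markup is a four-paragraph stand-in for the real page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Stand-in markup: four paragraphs with one link each.
my $html = join '',
    '<html><body>',
    ( map { qq{<p><a href="$_.html">Link Text $_</a></p>} } 1 .. 4 ),
    '</body></html>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# HYPOTHETICAL selection: the paragraph numbers whose links are wanted.
my @wanted = ( 1, 2, 4 );

my @links;
for my $i (@wanted) {
    # One XPath per paragraph index, built inside the loop.
    my $xpath = sprintf '/html/body/p[%d]/a', $i;

    # findnodes returns the matching elements; attr() reads the href.
    for my $anchor ( $tree->findnodes($xpath) ) {
        push @links, $anchor->attr('href');
    }
}

print "$_\n" for @links;

$tree->delete;
```

A loop like this also makes it easy to record which paragraph each link came from, which a single combined query does not.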
