PerlMonks  

Re: How to extract xpath from the webpage

by Corion (Patriarch)
on Nov 03, 2009 at 10:14 UTC


in reply to How to extract xpath from the webpage

Do you really mean that you have an HTML structure and want one XPath expression for each element?

This smells of homework to me because constructing an XPath expression if you have the path to an element is trivial:

    <myml>
      <foo>
        <bar id="1" />
        <bar id="2" />
      </foo>
    </myml>

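To get the XPath expression for each element, you concatenate the tags of all its ancestors with /, and add the index of each element among its same-named siblings as a positional predicate such as [2]. For the second bar element above, that gives /myml/foo/bar[2].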
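The concatenation described above can be sketched in a few lines. This is only an illustration in Python using the standard library's xml.etree.ElementTree, not one of the Perl modules discussed in this thread; the function name element_xpaths is made up for the example, and it assumes well-formed XML input rather than real-world HTML.

```python
import xml.etree.ElementTree as ET

def element_xpaths(root):
    """Return one absolute XPath per element, in document order.

    Hypothetical helper for illustration: each step is the tag name plus
    the 1-based index of the element among its same-named siblings.
    """
    paths = []

    def walk(elem, prefix):
        paths.append(prefix)
        seen = {}  # tag name -> how many siblings with that tag so far
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{prefix}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}[1]")
    return paths

doc = ET.fromstring('<myml><foo><bar id="1"/><bar id="2"/></foo></myml>')
xpaths = element_xpaths(doc)
for path in xpaths:
    print(path)
```

The index is written on every step, even where it is redundant, so that each path selects exactly one element.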

Generating such an XPath expression does not help you much, which is why I think this is homework. But if this is not homework, maybe you can explain what actual problem you're trying to solve.

Replies are listed 'Best First'.
Re^2: How to extract xpath from the webpage
by perladdict (Chaplain) on Nov 04, 2009 at 05:50 UTC
    Hi Corion, I am doing web page automation to find the links, text and image links using Selenium, which uses XPath expressions like "//td2/div/a/img" to locate the links in the web page source.
    I am trying HTML::TreeBuilder::XPath, but I don't know which other modules I could import in my script.

      If Selenium supports XPath queries, you don't need any Perl XPath modules. If you want to access Selenium and its results, see WWW::Selenium. If you want to use HTML::TreeBuilder::XPath, I'm not sure where your actual problem in your code is. The "synopsis" section shows how to extract HTML fragments from a given HTML string. Maybe you want to fetch the images using LWP::UserAgent then?

      Personally, I automate websites with WWW::Mechanize::Firefox, which supports JavaScript (and XPath).
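For readers who only want to see what evaluating such a path looks like, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree instead of the Perl modules named above, the page markup and file names are invented for the example, and the path is merely similar in shape to the one in the question. ElementTree supports only a subset of XPath and needs well-formed markup, so treat this as a demonstration of the idea rather than a tool for real pages.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample page; real HTML usually needs a tolerant parser.
page = """
<html><body><table><tr>
  <td><div><a href="/a"><img src="first.png"/></a></div></td>
  <td><div><a href="/b"><img src="second.png"/></a></div></td>
</tr></table></body></html>
"""

root = ET.fromstring(page)

# Images inside the second td of any row: td[2] is a positional predicate.
image_sources = [img.get("src") for img in root.findall(".//td[2]/div/a/img")]

# All link targets anywhere in the page.
links = [a.get("href") for a in root.findall(".//a")]

print(image_sources)
print(links)
```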
