How to extract xpath from the webpage

perladdict has asked for the wisdom of the Perl Monks concerning the following question:

Re: How to extract xpath from the webpage by moritz (Cardinal) on Nov 03, 2009 at 10:13 UTC
Your question is a bit puzzling - do you really want to obtain an XPath expression for each HTML tag? Usually it's the other way round: You need to extract some specific HTML tags, and use XPath for that. So the best way to start would be to learn XPath a bit, then look at the HTML page you want to extract stuff from, and write an XPath expression to extract what you need. Install HTML::TreeBuilder::XPath, experiment with it, and refine your XPath expression until it does what you want.
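As a rough illustration of that workflow, here is a minimal sketch using HTML::TreeBuilder::XPath. The sample markup and the helper name `attr_for_xpath` are hypothetical, made up for this example; only `new`, `parse_content`, `findnodes`, `attr` and `delete` are taken from the module and its HTML::TreeBuilder / HTML::Element base classes.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup, standing in for the page you want to query.
my $html = <<'HTML';
<html><body>
<table><tr>
  <td><a href="/home">Home</a></td>
  <td><div><a href="/pics"><img src="/img/logo.png" alt="Logo"></a></div></td>
</tr></table>
</body></html>
HTML

# Return the requested attribute of every element matched by $xpath.
# (Helper name and argument order are assumptions for this sketch.)
sub attr_for_xpath {
    my ($content, $xpath, $attr) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content($content);
    my @values = map { $_->attr($attr) } $tree->findnodes($xpath);
    $tree->delete;    # release the parse tree once the values are copied out
    return @values;
}

print "image: $_\n" for attr_for_xpath($html, '//td[2]/div/a/img', 'src');
print "link:  $_\n" for attr_for_xpath($html, '//a', 'href');
```

Start with a broad expression such as `//a`, check what comes back, and then narrow it step by step until it matches only the elements you need.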
Re: How to extract xpath from the webpage by Corion (Patriarch) on Nov 03, 2009 at 10:14 UTC
Do you really mean that you have an HTML structure and want one XPath expression for each element? This smells of homework to me because constructing an XPath expression if you have the path to an element is trivial: `<myml> <foo> <bar id="1" /> <bar id="2" /> </foo> </myml>` To get the XPath expression for each element, you concatenate all parent tags of each element with `/`, and add the index of each element as a positional predicate such as `[2]` (the XPath counterpart of the CSS `:nth-child` selector). Generating such an XPath expression does not help you much, which is why I think this is homework. But if this is not homework, maybe you can explain what actual problem you're trying to solve.
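The concatenation described above could be sketched roughly as follows. This is only an illustration under assumptions: it uses XML::LibXML (not mentioned in the thread) to parse the sample document, and the helper names `xpath_for` and `all_xpaths` are invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# The sample document from the reply above.
my $xml = '<myml> <foo> <bar id="1" /> <bar id="2" /> </foo> </myml>';

# Build an absolute path for one element by walking up to the root and
# recording, at each level, the tag name plus its 1-based position among
# same-named siblings. (Hypothetical helper, not part of XML::LibXML.)
sub xpath_for {
    my ($elem) = @_;
    my @steps;
    for (my $node = $elem;
         $node && $node->nodeType == XML_ELEMENT_NODE;
         $node = $node->parentNode) {
        my $name  = $node->nodeName;
        my $index = 1;
        for (my $sib = $node->previousSibling; $sib; $sib = $sib->previousSibling) {
            $index++ if $sib->nodeType == XML_ELEMENT_NODE
                     && $sib->nodeName eq $name;
        }
        unshift @steps, sprintf('%s[%d]', $name, $index);
    }
    return '/' . join('/', @steps);
}

# Return one path per element, in document order.
sub all_xpaths {
    my ($source) = @_;
    my $doc = XML::LibXML->load_xml(string => $source);
    return map { xpath_for($_) } $doc->findnodes('//*');
}

print "$_\n" for all_xpaths($xml);
```

XML::LibXML nodes also provide a `nodePath` method that returns a comparable path, so the hand-written walk is mainly useful for seeing how the expression is assembled.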
Re^2: How to extract xpath from the webpage by perladdict (Chaplain) on Nov 04, 2009 at 05:50 UTC
Hi Corion, I am automating a web page to find the links, text and image links using Selenium, which uses XPath expressions like "//td2/div/a/img" to locate elements in the page source. I am trying HTML::TreeBuilder::XPath, but I don't know which other modules I can import in my script.
Re^3: How to extract xpath from the webpage by Corion (Patriarch) on Nov 04, 2009 at 08:10 UTC
If Selenium supports XPath queries, you don't need any Perl XPath modules. If you want to access Selenium and its results, see WWW::Selenium. If you want to use HTML::TreeBuilder::XPath, I'm not sure where the actual problem in your code is. The "synopsis" section shows how to extract HTML fragments from a given HTML string. Maybe you want to fetch the images using LWP::UserAgent then? Personally, I automate websites with WWW::Mechanize::Firefox, which supports JavaScript (and XPath).
Re: How to extract xpath from the webpage by spx2 (Deacon) on Nov 03, 2009 at 14:16 UTC
Here you go, this should get you started with HTML::TreeBuilder::XPath. It's code that parses Google search results. You will get an IP ban if you use it too much, so this is just for educational purposes. Also consider reading the actual documentation. Good luck and most importantly, have fun!
Re^2: How to extract xpath from the webpage by Anonymous Monk on Sep 14, 2010 at 06:41 UTC
Can you please add a script that uses the Google Perl script you have shared? I am looking to capture all the XPaths for the search term "blue suede shoes" on the Google results page. Thanks, M

