in reply to
Re^2: Trivial HTML extractor utility
in thread Trivial HTML extractor utility
If you used HTML::TreeBuilder::XPath it would be even more powerful.
Not for me; I don't know how to write an xpath expression.
You should really give it a try; it's one of the few fine things to come out of the XML world. I once wrote a utility called xmlgrep, which uses XPath expressions to extract things from HTML or XML files. To extract links, one would write:
GET http://www.perlmonks.org | xmlgrep -parse-html '//a/@href'
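For anyone who wants to see what an expression like '//a/@href' does without installing anything, here is a rough sketch using only Python's standard library. It is not how xmlgrep works internally. The sample page and the function names extract_links and absolute_only are made up for illustration. ElementTree implements only a subset of XPath and cannot select attributes directly, so the sketch selects the a elements that have an href and reads the attribute afterwards. It also needs well-formed markup, which real-world HTML often is not. Treating "absolute" as "has a URL scheme" is my own assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical, well-formed sample page (not taken from the post).
SAMPLE = """<html><body>
<a href="http://www.perlmonks.org/">home</a>
<a href="/index.pl?node=Tutorials">tutorials</a>
<a name="top">anchor without href</a>
<a href="https://example.org/xpath">external</a>
</body></html>"""


def extract_links(markup):
    """Return the href of every <a> element, roughly like '//a/@href'."""
    root = ET.fromstring(markup)
    # ElementTree cannot select attributes directly, so select the
    # elements that carry an href and read the attribute afterwards.
    return [a.get("href") for a in root.findall(".//a[@href]")]


def absolute_only(links):
    """Keep links that have a URL scheme (one reading of 'absolute')."""
    return [link for link in links if urlparse(link).scheme]


links = extract_links(SAMPLE)
print(links)
print(absolute_only(links))
```

The anchor without an href is skipped by the [@href] predicate, and the second function drops the site-relative link.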
You can also add further conditions, for example to extract only absolute links:
GET http://www.perlmonks.org | xmlgrep -parse-html '//a/@href[contains