Probably the simplest way is to use the -asxml flag of tidy, which isn't written in Perl :) (there is, however, a Perl wrapper for TidyLib, called HTML::Tidy):
$ tidy -asxml foo.html > foo.xml
While we're suggesting modules, I'd also point to libxml2 and its associated utilities. They are probably already installed if you have a recent-ish Linux installation, and are otherwise available for download. There is also an associated Perl module, XML::LibXML. The bonus is that once you install that stuff, you can process the resulting XML with Perl. The drawback, compared to the tidy-based approach, is that the libxml2 code is more generic, so you'd have to work to get DOCTYPE lines to come out correctly. On the other hand, libxml2 has a wider range of application.
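A minimal sketch of the XML::LibXML route, assuming a reasonably recent XML::LibXML (one that provides `load_html`). The helper name `html_to_xml` is hypothetical. The HTML parser is run in recovery mode so sloppy markup doesn't abort the parse, and `toString` then serializes the resulting document as XML. As noted above, the DOCTYPE line in the output may still need attention.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse (possibly sloppy) HTML with libxml2's HTML
# parser and return the document serialized as XML.
sub html_to_xml {
    my ($html) = @_;
    my $doc = XML::LibXML->load_html(
        string            => $html,
        recover           => 1,    # keep going past malformed markup
        suppress_errors   => 1,    # don't spam STDERR with parser errors
        suppress_warnings => 1,
    );
    # On a document object, toString serializes as XML
    # (toStringHTML would give HTML instead).
    return $doc->toString;
}
```

Usage would be something like `print html_to_xml($html_string);`. Empty elements such as `<br>` come out as `<br/>`, and the parser adds the implied `html`/`body` wrappers.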
If not P, what? Q maybe? "Sidney Morgenbesser"
The HTML::Tree suite seems to have some XML capabilities. HTML::Element has an XML dump method: $h->as_XML(), which might be a first step, depending on what you want to do.
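A minimal sketch of that first step with HTML::TreeBuilder (part of the HTML::Tree suite), assuming only the documented `new_from_content`, `as_XML` and `delete` methods. The helper name `html_to_xml` is hypothetical. Note that this gives you an XML string, not an XML DOM.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: build an HTML::Element tree from an HTML string
# and dump it as XML.
sub html_to_xml {
    my ($html) = @_;
    # new_from_content parses the string and finishes the parse (eof).
    my $tree = HTML::TreeBuilder->new_from_content($html);
    # as_XML writes empty elements such as <br> in the <br /> form.
    my $xml = $tree->as_XML;
    # Explicitly free the tree (older HTML::Tree versions need this
    # because of circular parent/child references).
    $tree->delete;
    return $xml;
}
```

The output includes the implied `html`, `head` and `body` elements that TreeBuilder inserts, which may or may not be what you want.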
There is also an HTML::DOMbo module, which turns your HTML tree into an XML::DOM tree and, AFAICS, lets you use all of the DOM tools you want on it.
While I have been using HTML::Tree a lot recently (and I highly recommend it for doing almost anything with HTML), I haven't experimented with the XML stuff yet. But it seems promising.
First of all, I would like to thank you guys (Zaxo, Arturo, Anonymous Monk and Skillet Thief) for the prompt responses.
I will try the modules you suggested and hopefully come back with a big smile on my face.
Your help was very much appreciated!
See you soon!