PerlMonks
Re^3: Encoding/decoding question by mirod (Canon)
on Sep 13, 2011 at 08:36 UTC ( [id://925645]=note )
You can use HTML::TreeBuilder to parse the HTML and then output it as XHTML with the as_XML method, which works most of the time. It may not help with the encoding problem though, especially if the HTML lies about its encoding. BTW, XML::Twig can do this for you, so you may not need tidy at all: install HTML::TreeBuilder, then use XML::Twig's parsefile_html method to parse the HTML. HTML::Tidy, which uses a fork of tidy, may also be worth a try.
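A minimal sketch of both routes, assuming HTML::TreeBuilder and XML::Twig are installed. The sample markup and variable names are made up for illustration, and the input is plain ASCII, so this does not touch the encoding problem itself:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use XML::Twig;

# Hypothetical sloppy HTML: unclosed <p>, bare <br>
my $html = '<html><head><title>Test</title></head>'
         . '<body><p>one<br>two</body></html>';

# Route 1: parse with HTML::TreeBuilder, serialize with as_XML
my $tree  = HTML::TreeBuilder->new_from_content($html);
my $xhtml = $tree->as_XML;
$tree->delete;    # free the tree once we have the string

# Route 2: let XML::Twig do the HTML-to-XML conversion
# (it relies on HTML::TreeBuilder being installed)
my $twig = XML::Twig->new;
$twig->parse_html($html);    # parsefile_html($file) is the file variant
my $xml = $twig->sprint;

print $xhtml, "\n";
print $xml,   "\n";
```

If the input really lies about its encoding, you would probably still have to decode the bytes yourself (for instance with Encode::decode) before handing the string to either parser.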