PerlMonks  

Re^3: Encoding/decoding question

by mirod (Canon)
on Sep 13, 2011 at 08:36 UTC


in reply to Re^2: Encoding/decoding question
in thread Encoding/decoding question

You can use HTML::TreeBuilder to parse the HTML, then output it as XHTML with the as_XML method, which works most of the time. That may not fix the encoding problem though, especially if the HTML lies about its encoding. XML::Twig can do this for you BTW, so you may not need tidy at all: install HTML::TreeBuilder, then parse the HTML with parsefile_html.
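A minimal sketch of both routes, assuming HTML::TreeBuilder and XML::Twig are installed. The sample HTML string and the temporary file are made up for illustration, and the input is plain ASCII, so this does not exercise the encoding problem itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TreeBuilder;
use XML::Twig;

# Hypothetical sample input: tag soup with an unclosed <p> and a bare <br>.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p>first line<br>second line</body></html>';

# Route 1: parse with HTML::TreeBuilder, then serialize with as_XML.
my $tree  = HTML::TreeBuilder->new_from_content($html);
my $xhtml = $tree->as_XML;
$tree->delete;    # release the tree once we have the string
print $xhtml;

# Route 2: let XML::Twig do the conversion via parsefile_html.
# parsefile_html wants a file name, so write the sample to a temp file.
my ( $fh, $filename ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} $html;
close $fh or die "cannot close $filename: $!";

my $twig = XML::Twig->new;
$twig->parsefile_html($filename);

# The result is an ordinary twig, so the usual navigation methods apply.
my $title = $twig->root->first_descendant('title')->text;
print "title: $title\n";
```

If the HTML is already in a string, parse_html should save you the temporary file.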

Also, HTML::Tidy uses a fork of tidy and may be worth a try.

