in reply to
Re^2: HTML parsing module handles known and unknown encoding
in thread HTML parsing module handles known and unknown encoding
I don't see any way of specifying the encoding of an HTML document
Yes, the encoding of an HTML document is normally specified in the HTTP headers, but you can use the 'http-equiv' attribute on a <meta> tag to embed the equivalent of an HTTP header in the HTML itself. For example:
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
Of course, this only really works when the encoding is ASCII-compatible (ISO-8859-*, UTF-8, etc.), because the parser has to read the <meta> tag itself before it knows which encoding the document uses.
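
To make that concrete, here is a rough Python sketch of how a parser might sniff the charset from a <meta> declaration before decoding. The helper name sniff_charset, the 1024-byte scan window and the iso-8859-1 fallback are my own assumptions for illustration, not something any particular HTML parsing module is known to do.

```python
import re

# Hypothetical pattern: find a charset declaration inside a <meta> tag.
# It works on raw bytes, so it only matches when the document's encoding
# represents ASCII characters as single ASCII bytes.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_charset(raw: bytes, default: str = "iso-8859-1") -> str:
    """Return the charset named in a <meta> tag, or `default` if none is found.

    Only the first 1024 bytes are scanned (an assumed limit for this sketch).
    """
    match = META_CHARSET.search(raw[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


# An ASCII-compatible document: the declaration is readable before decoding.
html = (
    b'<html><head><meta http-equiv="content-type" '
    b'content="text/html; charset=utf-8" /></head>'
    b"<body>caf\xc3\xa9</body></html>"
)
encoding = sniff_charset(html)
text = html.decode(encoding)

# The same kind of declaration stored as UTF-16: every ASCII character is
# followed by a NUL byte, so the byte-level search cannot see the tag.
utf16 = '<meta charset="utf-16">'.encode("utf-16-le")
fallback = sniff_charset(utf16)
```

The second case is the limitation mentioned above: in a UTF-16 document the search finds nothing and the sketch falls back to its default, so a declaration inside the document only helps when its bytes can be read as ASCII.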