in reply to Re^3: HTML parsing module handles known and unknown encoding
in thread HTML parsing module handles known and unknown encoding
What are you saying? Are you pointing out the irrelevant fact that XML::LibXML can process some other HTML documents? Are you suggesting one should convince the provider of the HTML document to edit it so XML::LibXML can process it? Are you suggesting it's acceptable to do the following to parse an HTML document using XML::LibXML?
- Parse the HTML doc using another parser that can accept an encoding.
- If the document does not indicate its own encoding,
  - Add a META element if none exists.
  - Serialise the HTML.
  - Replace the original HTML with this new HTML.
- Parse the HTML doc using XML::LibXML.
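
For concreteness, the workaround listed above could look roughly like the following. This is only a sketch under stated assumptions: `declare_encoding` is a hypothetical helper, a plain regex check stands in for the "other parser" of the first step, and the caller is assumed to know the real encoding from somewhere else (an HTTP header, say). It uses only the core Encode module and stops short of the final XML::LibXML parse.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of any module: returns bytes that are
# guaranteed to carry a charset declaration.
sub declare_encoding {
    my ($bytes, $known_encoding) = @_;

    # The document already indicates its own encoding: leave it untouched.
    return $bytes if $bytes =~ /<meta\b[^>]*\bcharset\s*=/i;

    # Stand-in for "another parser that can accept an encoding":
    # decode the bytes using the externally known encoding.
    my $html = decode($known_encoding, $bytes);

    # Add a META element right after <head>, or at the very start if
    # there is no <head> tag (a simplification; a real parser would
    # build the missing element properly).
    my $meta = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">';
    $html =~ s/(<head\b[^>]*>)/$1$meta/i
        or $html = $meta . $html;

    # Serialise: this replaces the original HTML.
    return encode('UTF-8', $html);
}

my $latin1 = "<html><head><title>caf\xE9</title></head><body></body></html>";
my $fixed  = declare_encoding($latin1, 'ISO-8859-1');
# $fixed is what would then be handed to XML::LibXML's HTML parser.
```

Even in this reduced form, the document is scanned and rewritten once before XML::LibXML ever sees it, which is the overhead the questions above are getting at.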