http://www.perlmonks.org?node_id=835845


in reply to Re^4: Search and replace again
in thread Search and replace again

"XML is a binary format" - not to nitpick (except I will), but that's just not right. From http://www.w3.org/XML/:

Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879).
(emphasis mine)

Obviously at some point the XML text has to be encoded to a binary format. XML::LibXML's toString method (when called on a document) does do that, so at that point, you are indeed dealing with binary data and should turn off any PerlIO layers on your output handle, as you did in your example. I didn't realize that $doc->toString returned binary data.
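For illustration, here is a minimal sketch of that point, assuming XML::LibXML is installed. The sample document and the temporary file are hypothetical and not taken from the thread. It shows toString on a document handing back bytes already encoded per the XML declaration, which are then printed through a handle with no encoding layer.

```perl
use strict;
use warnings;
use XML::LibXML;
use File::Temp qw(tempfile);

# Hypothetical sample: a Latin-1 document held as raw bytes (0xE9 is e-acute).
my $source = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n<name>caf\xE9</name>\n};
my $doc    = XML::LibXML->load_xml(string => $source);

# Called on a document, toString returns bytes in the document's own encoding.
my $bytes = $doc->toString;

# Print the bytes through a handle without any encoding layer.
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode($fh);                       # no :encoding(...) layer on this handle
print {$fh} $bytes;
close($fh) or die "close failed: $!";
```

Adding an :encoding(UTF-8) layer to the handle here would encode the already-encoded bytes a second time, and the file would no longer match its own XML declaration.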

Re^6: Search and replace again
by ikegami (Patriarch) on Apr 20, 2010 at 20:04 UTC

    I'm not going to argue that XML isn't text. At some levels, it's definitely a valid position to think of XML as a text format. (It's human readable and human editable, after all.)

    But the topic at hand is far lower level, and such details do matter. Let's compare HTML (a text format) and XML (a binary format).

    MIME type
        HTML: text
        XML:  application (binary)

    Character encoding
        HTML: External to document
        XML:  Embedded in document

    Parser
        HTML: The document must be decoded prior to being given to the parser, or information allowing the parser to do so must be provided to the parser.
        XML:  The document cannot be decoded prior to being given to the parser because the document must be parsed to determine its encoding.

    Generator
        HTML: The document must be returned unencoded, or the generator must indicate which encoding was used to encode it.
        XML:  The encoding must be chosen before the document is generated, so the text in the document is already encoded.
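The XML column of the comparison can be sketched as follows, assuming XML::LibXML is installed. The sample document is hypothetical. The parser is handed undecoded bytes and works out the encoding from the declaration. On the way out, the encoding is set on the document first, and the serialized result comes back already encoded.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample: raw Latin-1 bytes, exactly as they might be read
# from a file opened without an encoding layer.
my $latin1 = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n<name>caf\xE9</name>\n};

# Parser side: pass the bytes as they are; the parser reads the declaration.
my $doc      = XML::LibXML->load_xml(string => $latin1);
my $declared = $doc->encoding;                        # encoding named in the declaration
my $text     = $doc->documentElement->textContent;    # decoded characters

# Generator side: choose the encoding first, then serialize to bytes.
$doc->setEncoding('UTF-8');
my $utf8 = $doc->toString;                            # bytes, now UTF-8 encoded
```

Inside the program, $text is an ordinary character string. Only the serialized forms ($latin1 and $utf8) are byte strings tied to a particular encoding.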

    Your definition may differ. This is the one I was using.