http://www.perlmonks.org?node_id=835620


in reply to Re^2: Search and replace again
in thread Search and replace again

I was curious why you binmode STDOUT in your code (and in your other example) - is it to ensure you have UNIX line endings (and not CRLFs) in the output if Perl is running on Windows?

Re^4: Search and replace again
by ikegami (Patriarch) on Apr 20, 2010 at 01:39 UTC

    For starters, XML is a binary format. binmode definitely won't hurt anything.

    binmode doesn't just disable :crlf; it disables any :encoding too.

    You probably should always use binmode or equivalent (e.g. use open), either to remove layers* when you want to ensure the bytes are unmolested*, or to add some when you want to output text.

    * These may be added via $ENV{PERLIO}, via -C, or by Perl itself, as is the case for :crlf on Windows.
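To make the point about layers concrete, here is a minimal sketch using only core Perl. It writes to an in-memory handle so the resulting bytes can be inspected. The helper name bytes_written and its arguments are made up for this illustration; the layer names are the standard PerlIO ones.

```perl
use strict;
use warnings;

# Hypothetical helper: write $text to an in-memory handle opened with
# $layers (e.g. ':crlf'), optionally call binmode with $binmode first,
# and return the bytes that ended up in the buffer.
sub bytes_written {
    my ($layers, $text, $binmode) = @_;
    my $buffer = '';
    open(my $fh, ">$layers", \$buffer)
        or die "Cannot open in-memory handle: $!";
    # binmode($fh, ':raw') is documented as equivalent to binmode($fh).
    binmode($fh, $binmode) if defined $binmode;
    print {$fh} $text;
    close($fh) or die "Cannot close in-memory handle: $!";
    return $buffer;
}

# Small demo, run only when this file is executed directly.
unless (caller) {
    for my $case (
        [ ':crlf',            "a\n"                        ],
        [ ':crlf',            "a\n",    ':raw'             ],
        [ ':encoding(UTF-8)', "\x{e9}"                     ],
        [ ':encoding(UTF-8)', "\x{e9}", ':raw'             ],
        [ '',                 "\x{e9}", ':encoding(UTF-8)' ],
    ) {
        my ($layers, $text, $binmode) = @$case;
        printf "%-18s binmode=%-18s -> %s\n",
            $layers || '(none)', $binmode // '(not called)',
            unpack('H*', bytes_written($layers, $text, $binmode));
    }
}
```

The first four cases should show binmode removing both the :crlf translation and the :encoding layer. The last one shows the opposite use: adding a layer when the data is text.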

      "XML is a binary format" - not to nitpick (except I will), but that's just not right. From http://www.w3.org/XML/:

      Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879).
      (emphasis mine) Obviously at some point the XML text has to be encoded to a binary format. XML::LibXML's toString method (when called on a document) does do that, so at that point, you are indeed dealing with binary data and should turn off any PerlIO layers on your output handle, as you did in your example. I didn't realize that $doc->toString returned binary data.
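A short sketch of that situation, assuming XML::LibXML is installed. The helper name serialized_document_bytes and the element name note are invented for the example; Document->new, createElement, appendText, setDocumentElement and toString are the module's own methods.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: build a one-element document containing $text
# and return what toString gives back for the document node.
sub serialized_document_bytes {
    my ($text) = @_;
    my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
    my $root = $doc->createElement('note');
    $root->appendText($text);
    $doc->setDocumentElement($root);
    # Called on a document, toString returns bytes already encoded in
    # the document's declared encoding (UTF-8 here).
    return $doc->toString();
}

# Run only when this file is executed directly.
unless (caller) {
    # The data is already encoded, so strip any layers from STDOUT
    # before printing it.
    binmode(STDOUT);
    print serialized_document_bytes("caf\x{e9}");
}
```

Without the binmode, a :crlf or :encoding layer on STDOUT would alter bytes that are already in their final form.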

        I'm not going to argue that XML isn't text. At some levels, it's definitely a valid position to think of XML as a text format. (It's human readable and human editable, after all.)

        But the topic at hand is far lower level, and such details do matter. Let's compare HTML (a text format) and XML (a binary format).

                           | HTML | XML
        MIME type          | text | application (binary)
        Character Encoding | External to document | Embedded in document
        Parser             | The document must be decoded prior to being given to the parser or information allowing the parser to do so must be provided to the parser. | The document cannot be decoded prior to being given to the parser because the document must be parsed to determine its encoding.
        Generator          | The document must be returned unencoded or the generator must indicate which encoding was used to encode it. | The encoding must be chosen before the document is generated, so the text in the document is already encoded.

        Your definition may differ. This is the one I was using.
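The "embedded in document" row can be illustrated with a rough sketch in core Perl. It is deliberately simplified: it assumes an ASCII-compatible encoding and no byte order mark, whereas a real XML parser also handles BOMs and UTF-16 as described in the XML specification. Both function names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: read the encoding name out of the XML declaration.
# The input must still be raw bytes, because the encoding is not known
# until the declaration has been read.
sub declared_xml_encoding {
    my ($bytes) = @_;
    if ($bytes =~ /\A<\?xml\s[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/) {
        return $2;
    }
    # With no declaration and no BOM, XML defaults to UTF-8.
    return 'UTF-8';
}

# Hypothetical helper: decode the bytes only after inspecting them.
sub decode_xml_bytes {
    my ($bytes) = @_;
    return decode(declared_xml_encoding($bytes), $bytes);
}
```

If an :encoding layer had already decoded the input on the way in, the declaration could disagree with what the code receives, which is why the handle needs to deliver the bytes untouched.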