Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

The Unicode Standard permits the BOM in UTF-8, but does not require or recommend its use. Byte order has no meaning in UTF-8

what's the reason [...] using a BOM for [...] UTF-8 [...]?

UTF-8 encoded text should - in theory - not need a BOM, that's correct. But there are only very few cases (see below) in which a BOM causes trouble. So many editors (and other text-processing tools) automatically switch to UTF-8 encoding when they find a BOM encoded as UTF-8 (0xEF, 0xBB, 0xBF) at file offset 0. This is often completely analogous to finding a BOM encoded in UTF-16 BE, UTF-16 LE, UTF-32 BE, UTF-32 LE. Without a BOM, they usually guess. UTF-16 and UTF-32 can often be guessed by the amount and position of 0x00 bytes. UTF-8 can also be guessed, but it is harder and can be mixed up with some legacy encoding.

So, prefixing UTF-8 encoded text with a BOM makes life easier for most tools, that's all.

The Unix #! mechanism is broken by a leading BOM, simply because the kernel expects the first two bytes of the file to be 0x23, 0x21. The BOM takes up two to four bytes and is often invisible in editors. The kernel sees an invalid magic number and so does not consider the file as a script, while the user believes that the file starts with #!. (Adding support for scripts with a BOM should be quite easy, by simply treating 0xEF 0xBB 0xBF 0x23 0x21 at the start of a file like 0x23 0x21 at the start of a file.) warns if input starts with a BOM, claiming that old editors and old browsers have problems with the BOM.


Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

In reply to Re^2: perltidy and UTF-8 BOM by afoken
in thread perltidy and UTF-8 BOM by morelenmir

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others lurking in the Monastery: (4)
    As of 2020-01-21 00:20 GMT
    Find Nodes?
      Voting Booth?