good chemistry is complicated, and a little bit messy -LW |
|
PerlMonks |
Re^2: perltidy and UTF-8 BOMby afoken (Chancellor) |
on May 22, 2018 at 20:00 UTC ( [id://1215063]=note: print w/replies, xml ) | Need Help?? |
UTF-8 encoded text should - in theory - not need a BOM, that's correct. But there are only very few cases (see below) in which a BOM causes trouble. So many editors (and other text-processing tools) automatically switch to UTF-8 encoding when they find a BOM encoded as UTF-8 (0xEF, 0xBB, 0xBF) at file offset 0. This is often completely analogous to finding a BOM encoded in UTF-16 BE, UTF-16 LE, UTF-32 BE, UTF-32 LE. Without a BOM, they usually guess. UTF-16 and UTF-32 can often be guessed by the amount and position of 0x00 bytes. UTF-8 can also be guessed, but it is harder and can be mixed up with some legacy encoding. So, prefixing UTF-8 encoded text with a BOM makes life easier for most tools, that's all. The Unix #! mechanism is broken by a leading BOM, simply because the kernel expects the first two bytes of the file to be 0x23, 0x21. The BOM takes up two to four bytes and is often invisible in editors. The kernel sees an invalid magic number and so does not consider the file as a script, while the user believes that the file starts with #!. (Adding support for scripts with a BOM should be quite easy, by simply treating 0xEF 0xBB 0xBF 0x23 0x21 at the start of a file like 0x23 0x21 at the start of a file.) https://validator.w3.org/ warns if input starts with a BOM, claiming that old editors and old browsers have problems with the BOM. Alexander
-- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
In Section
Seekers of Perl Wisdom
|
|