PerlMonks  

Re^4: UTF-8 text files with Byte Order Mark

by Anonymous Monk
on Sep 30, 2011 at 18:30 UTC (#928893)


in reply to Re^3: UTF-8 text files with Byte Order Mark
in thread UTF-8 text files with Byte Order Mark

Thank you!!! This saved me a lot of trouble. I am also trying to strip out the UTF-8 byte order mark, which Google Docs adds by default to downloaded text files. By the way I found that \x{FEFF} was not the same as \x{ef}\x{bb}\x{bf}.
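A minimal sketch of the stripping described above, assuming the BOM sits at the very start of the data. The helper names `strip_bom_chars` and `strip_bom_bytes` are hypothetical, not from the thread. Which one applies depends on whether the string holds decoded characters or raw file bytes, which is also why the two escapes do not match each other.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: remove a leading BOM from already-decoded text,
# where the BOM is the single character U+FEFF.
sub strip_bom_chars {
    my ($text) = @_;
    $text =~ s/\A\x{FEFF}//;
    return $text;
}

# Hypothetical helper: remove a leading UTF-8-encoded BOM from raw bytes,
# where the BOM is the three bytes EF BB BF.
sub strip_bom_bytes {
    my ($bytes) = @_;
    $bytes =~ s/\A\xEF\xBB\xBF//;
    return $bytes;
}

my $raw     = "\xEF\xBB\xBFhello\n";    # bytes as they would sit on disk
my $decoded = decode('UTF-8', $raw);    # U+FEFF followed by "hello\n"

print strip_bom_bytes($raw);
print strip_bom_chars($decoded);
```

Both calls leave the input untouched when no BOM is present, so they should be safe to apply to files that may or may not carry one.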

Re^5: UTF-8 text files with Byte Order Mark
by ikegami (Pope) on Oct 01, 2011 at 21:53 UTC

    By the way I found that \x{FEFF} was not the same as \x{ef}\x{bb}\x{bf}

    Yeah, "\x{ef}\x{bb}\x{bf}" is the UTF-8 encoding of the BOM / U+FEFF / "\x{FEFF}".

      Please stop confusing things. FEFF has nothing to do with UTF-8. This is a BOM for UTF-16 Big Endian-encoded files.

        This is a BOM for UTF-16 Big Endian-encoded files.

        You are mistaken. It's the BOM, period. It can be encoded using UTF-8 and UTF-16le just as easily as with UTF-16be.

        $ perl -MEncode -e'print encode("UTF-8", chr(0xFEFF))' | od -t x1
        0000000 ef bb bf
        0000003
        $ perl -MEncode -e'print encode("UTF-16be", chr(0xFEFF))' | od -t x1
        0000000 fe ff
        0000002
        $ perl -MEncode -e'print encode("UTF-16le", chr(0xFEFF))' | od -t x1
        0000000 ff fe
        0000002

        FEFF            BOM
        2B,2F,76,38,2D  BOM encoded using UTF-7
        EF,BB,BF        BOM encoded using UTF-8
        FE,FF           BOM encoded using UTF-16be
        FF,FE           BOM encoded using UTF-16le
        00,00,FE,FF     BOM encoded using UTF-32be
        FF,FE,00,00     BOM encoded using UTF-32le

        So you won't find the bytes FE,FF in a UTF-8 file, but you can find an encoded FEFF there (as EF,BB,BF), just as you can in a UTF-16be file.
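The byte table above can also be read in the other direction, to guess an encoding from the first bytes of a file. This is only a sketch: `guess_bom_encoding` is a hypothetical name, UTF-7 is left out for brevity, and a UTF-16le file whose first character after the BOM is U+0000 would be misreported as UTF-32le.

```perl
use strict;
use warnings;

# Byte signatures taken from the table. Longer signatures come first so that
# UTF-32le (FF FE 00 00) is tested before UTF-16le (FF FE).
my @BOM_SIGNATURES = (
    [ "\x00\x00\xFE\xFF", 'UTF-32be' ],
    [ "\xFF\xFE\x00\x00", 'UTF-32le' ],
    [ "\xEF\xBB\xBF",     'UTF-8'    ],
    [ "\xFE\xFF",         'UTF-16be' ],
    [ "\xFF\xFE",         'UTF-16le' ],
);

# Hypothetical helper: return the name of the encoding whose BOM starts
# $bytes, or undef when none of the signatures match.
sub guess_bom_encoding {
    my ($bytes) = @_;
    for my $entry (@BOM_SIGNATURES) {
        my ($signature, $name) = @$entry;
        return $name if index($bytes, $signature) == 0;
    }
    return;
}

for my $sample ("\xEF\xBB\xBFhi", "\xFE\xFF\x00h", "plain") {
    my $name = guess_bom_encoding($sample);
    print defined $name ? $name : 'no BOM found', "\n";
}
```

A missing BOM says nothing about the encoding, so a caller would still need a default for the undef case.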
