PerlMonks  

Re^4: UTF-8 text files with Byte Order Mark

by Anonymous Monk
on Sep 30, 2011 at 18:30 UTC


in reply to Re^3: UTF-8 text files with Byte Order Mark
in thread UTF-8 text files with Byte Order Mark

Thank you!!! This saved me a lot of trouble. I am also trying to strip out these UTF-8 byte order mark characters, which Google Docs adds by default to downloaded text files. By the way I found that \x{FEFF} was not the same as \x{ef}\x{bb}\x{bf}.


Re^5: UTF-8 text files with Byte Order Mark
by ikegami (Pope) on Oct 01, 2011 at 21:53 UTC

    By the way I found that \x{FEFF} was not the same as \x{ef}\x{bb}\x{bf}

    Yeah, "\x{ef}\x{bb}\x{bf}" is the UTF-8 encoding of the BOM / U+FEFF / "\x{FEFF}".
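To make the distinction concrete, here is a minimal Perl sketch of the two ways to strip the mark: match the three bytes before decoding, or match the single character U+FEFF after decoding. The sub names `strip_bom_bytes` and `strip_bom_text` are hypothetical, chosen only for illustration, and the input is assumed to be UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: remove a leading UTF-8 BOM from undecoded bytes.
sub strip_bom_bytes {
    my ($bytes) = @_;
    $bytes =~ s/\A\xEF\xBB\xBF//;
    return $bytes;
}

# Hypothetical helper: remove a leading U+FEFF from already-decoded text.
sub strip_bom_text {
    my ($text) = @_;
    $text =~ s/\A\x{FEFF}//;
    return $text;
}

# Raw bytes, as read from a file opened without an encoding layer.
my $bytes = "\xEF\xBB\xBF" . "hello\n";

# Decoding turns the three BOM bytes into the one character U+FEFF.
my $text = decode('UTF-8', $bytes);

printf "bytes: %d -> %d\n", length($bytes), length(strip_bom_bytes($bytes));
printf "chars: %d -> %d\n", length($text),  length(strip_bom_text($text));
```

Which pattern applies depends on whether the string has been decoded yet: `\x{FEFF}` will not match raw bytes, and `\xEF\xBB\xBF` will not match decoded text.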

      Please stop confusing people. FEFF has nothing to do with UTF-8. This is a BOM for UTF-16 Big Endian-encoded files.

        This is a BOM for UTF-16 Big Endian-encoded files.

        You are mistaken. It's the BOM, period. It can be encoded using UTF-8 and UTF-16le just as easily as with UTF-16be.

        $ perl -MEncode -e'print encode("UTF-8", chr(0xFEFF))' | od -t x1
        0000000 ef bb bf
        0000003

        $ perl -MEncode -e'print encode("UTF-16be", chr(0xFEFF))' | od -t x1
        0000000 fe ff
        0000002

        $ perl -MEncode -e'print encode("UTF-16le", chr(0xFEFF))' | od -t x1
        0000000 ff fe
        0000002

        FEFF              BOM
        2B,2F,76,38,2D    BOM encoded using UTF-7
        EF,BB,BF          BOM encoded using UTF-8
        FE,FF             BOM encoded using UTF-16be
        FF,FE             BOM encoded using UTF-16le
        00,00,FE,FF       BOM encoded using UTF-32be
        FF,FE,00,00       BOM encoded using UTF-32le

        So you won't find the bytes FE,FF in a UTF-8 file, but you can find an encoded FEFF in a UTF-8 file, just as you can in a UTF-16be file.
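The byte sequences listed above can also be used the other way around, to guess an encoding from the first bytes of a file. The following is only a sketch: `guess_encoding_from_bom` and `@BOMS` are hypothetical names, UTF-7 is left out for brevity, and a match is a hint rather than proof, since FF,FE,00,00 could also be a UTF-16le file that starts with a BOM followed by a NUL character.

```perl
use strict;
use warnings;

# Signatures taken from the list above. Longer signatures come first so
# that UTF-32le (FF FE 00 00) is tested before UTF-16le (FF FE).
my @BOMS = (
    [ "UTF-32be", "\x00\x00\xFE\xFF" ],
    [ "UTF-32le", "\xFF\xFE\x00\x00" ],
    [ "UTF-8",    "\xEF\xBB\xBF"     ],
    [ "UTF-16be", "\xFE\xFF"         ],
    [ "UTF-16le", "\xFF\xFE"         ],
);

# Hypothetical helper: return the encoding name suggested by a leading
# BOM, or undef if the bytes do not start with a recognized signature.
sub guess_encoding_from_bom {
    my ($bytes) = @_;
    for my $entry (@BOMS) {
        my ($name, $sig) = @$entry;
        return $name if substr($bytes, 0, length $sig) eq $sig;
    }
    return undef;
}

my $guess = guess_encoding_from_bom("\xEF\xBB\xBFhello");
print defined $guess ? "$guess\n" : "no BOM found\n";
```

The input must be raw, undecoded bytes, for example the first four bytes read from a filehandle opened with the `:raw` layer.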
