
Re^5: UTF-8 text files with Byte Order Mark

by ikegami (Pope)
on Oct 01, 2011 at 21:53 UTC

in reply to Re^4: UTF-8 text files with Byte Order Mark
in thread UTF-8 text files with Byte Order Mark

By the way I found that \x{FEFF} was not the same as \x{ef}\x{bb}\x{bf}

Yeah, "\x{ef}\x{bb}\x{bf}" is the UTF-8 encoding of the BOM / U+FEFF / "\x{FEFF}".
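A quick way to see the difference in Perl itself: "\x{FEFF}" is a one-character string, while "\x{ef}\x{bb}\x{bf}" is a three-character string whose characters are the UTF-8 bytes of that one character. This is a minimal sketch using only the core Encode module's encode and decode. The variable names are just for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One character: the code point U+FEFF, i.e. the BOM itself.
my $bom_char = "\x{FEFF}";

# Three characters, each holding one byte value: the UTF-8 encoding of U+FEFF.
my $bom_utf8 = "\x{ef}\x{bb}\x{bf}";

print 'length of the character: ', length($bom_char), "\n";    # 1
print 'length of the bytes: ',     length($bom_utf8), "\n";    # 3
print 'same string? ', ($bom_char eq $bom_utf8 ? 'yes' : 'no'), "\n";    # no

# Encoding the character as UTF-8 produces exactly those three bytes...
print 'encode gives EF BB BF? ',
    (encode('UTF-8', $bom_char) eq $bom_utf8 ? 'yes' : 'no'), "\n";    # yes

# ...and decoding the three bytes gives back the single character.
print 'decode gives U+FEFF? ',
    (decode('UTF-8', $bom_utf8) eq $bom_char ? 'yes' : 'no'), "\n";    # yes
```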

Re^6: UTF-8 text files with Byte Order Mark
by Anonymous Monk on May 23, 2012 at 13:19 UTC

    Please stop confusing people. FEFF has nothing to do with UTF-8. This is a BOM for UTF-16 Big Endian-encoded files.

      This is a BOM for UTF-16 Big Endian-encoded files.

      You are mistaken. It's the BOM, period. It can be encoded using UTF-8 and UTF-16le just as easily as with UTF-16be.

      $ perl -MEncode -e'print encode("UTF-8", chr(0xFEFF))' | od -t x1
      0000000 ef bb bf
      0000003
      $ perl -MEncode -e'print encode("UTF-16be", chr(0xFEFF))' | od -t x1
      0000000 fe ff
      0000002
      $ perl -MEncode -e'print encode("UTF-16le", chr(0xFEFF))' | od -t x1
      0000000 ff fe
      0000002

      2B,2F,76,38,2D   BOM encoded using UTF-7
      EF,BB,BF         BOM encoded using UTF-8
      FE,FF            BOM encoded using UTF-16be
      FF,FE            BOM encoded using UTF-16le
      00,00,FE,FF      BOM encoded using UTF-32be
      FF,FE,00,00      BOM encoded using UTF-32le

      So you won't find the bytes FE,FF in a UTF-8 file, but, just as in a UTF-16be file, you can find an encoded U+FEFF in a UTF-8 file (there it appears as EF,BB,BF).
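The byte table above can be turned into a small BOM sniffer. This is a sketch, not an existing library API: the sub name detect_bom is hypothetical, and it assumes you pass in the first few raw, undecoded bytes of a file (for example, read through a :raw handle). The order of the checks matters, because FF,FE,00,00 (UTF-32le) begins with FF,FE (UTF-16le).

```perl
use strict;
use warnings;

# Byte signatures from the table, longest first, so that UTF-32le
# (FF,FE,00,00) is tested before its prefix UTF-16le (FF,FE).
my @SIGNATURES = (
    [ "\x00\x00\xFE\xFF"     => 'UTF-32be' ],
    [ "\xFF\xFE\x00\x00"     => 'UTF-32le' ],
    [ "\x2B\x2F\x76\x38\x2D" => 'UTF-7'    ],
    [ "\xEF\xBB\xBF"         => 'UTF-8'    ],
    [ "\xFE\xFF"             => 'UTF-16be' ],
    [ "\xFF\xFE"             => 'UTF-16le' ],
);

# detect_bom($bytes): hypothetical helper. Given raw (undecoded) bytes,
# return the name of the encoding whose BOM they start with, or undef.
sub detect_bom {
    my ($bytes) = @_;
    for my $sig (@SIGNATURES) {
        my ($prefix, $name) = @$sig;
        return $name if substr($bytes, 0, length $prefix) eq $prefix;
    }
    return undef;
}

for my $sample ("\xEF\xBB\xBFhello", "\xFF\xFEh\x00i\x00", "plain ASCII") {
    my $enc = detect_bom($sample) // 'no BOM';
    print "$enc\n";    # UTF-8, then UTF-16le, then no BOM
}
```

One limitation of sniffing bytes this way: a UTF-16le file whose first character after the BOM is U+0000 also starts with FF,FE,00,00, so the bytes alone cannot distinguish it from UTF-32le.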

        I'm trying my best to understand this thread, but I'm having difficulty.
        I'm dealing with the same issue where Notepad seems to add the BOM to the beginning of UTF-8 files. I've tried deleting it using all these commands, none of which works:

        s/chr(0xEFBBBF)//; #remove Byte Order Mark

        Another clue: When I was using Strawberry Perl, I was able to use \x{064E} to refer to an Arabic vowel marker, and that worked. But now I'm using ActiveState, and that no longer works.
        But I haven't been able to reference the BOM using either Strawberry or ActiveState. So I'm wondering if there's some sort of package I need to reference in order to make Perl recognize the \x{NNNN} format. Any suggestions?
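For the Notepad case, here is a minimal sketch. Two things go wrong with s/chr(0xEFBBBF)//: chr() is not called inside a pattern (the pattern matches the literal text "chr" followed by "0xEFBBBF"), and chr(0xEFBBBF) would in any case be a single character with code point 0xEFBBBF, not the three bytes EF,BB,BF. What to match depends on whether the text has been decoded: after decoding it is the one character \x{FEFF}, and undecoded it is the three bytes \xEF\xBB\xBF. The \x{...} escape is core Perl, so no module is needed for it. The temporary file below is only a stand-in for a Notepad-saved file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for a file Notepad saved as "UTF-8": the BOM bytes EF BB BF,
# then the text. File::Temp picks the name; it is deleted at exit.
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode $tmp;
print $tmp "\xEF\xBB\xBFfirst line\nsecond line\n";
close $tmp;

# Approach 1: decode while reading. The BOM is now the single character
# U+FEFF, and it can only appear at the start of the first line.
open my $in, '<:encoding(UTF-8)', $file or die "Can't open $file: $!";
my @lines = <$in>;
close $in;
$lines[0] =~ s/^\x{FEFF}// if @lines;

# Approach 2: read raw bytes. Undecoded, the BOM is the three bytes
# EF BB BF, so those three byte values are matched instead.
open my $raw, '<:raw', $file or die "Can't open $file: $!";
my $first_raw = <$raw>;
close $raw;
$first_raw =~ s/^\xEF\xBB\xBF//;

print "decoded: $lines[0]";    # decoded: first line
print "raw:     $first_raw";   # raw:     first line
```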
