PerlMonks  

Re^3: UTF-8 text files with Byte Order Mark

by ikegami (Pope)
on Feb 13, 2007 at 20:48 UTC (#599778)


in reply to Re^2: UTF-8 text files with Byte Order Mark
in thread UTF-8 text files with Byte Order Mark

my $octets = encode("utf8", $line);
$octets =~ s/^\x{ef}\x{bb}\x{bf}//;
$line = decode("utf8", $octets);

is the same thing as

my $BOM = decode("utf8", "\x{ef}\x{bb}\x{bf}");
$line =~ s/^$BOM//;

is the same thing as

my $BOM = chr(0xFEFF);
$line =~ s/^$BOM//;

is the same thing as

$line =~ s/^\x{FEFF}//;

which is what I gave you. Much simpler!
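For anyone who wants to check the equivalence rather than take it on faith, here is a minimal sketch. It assumes $line already holds decoded text (characters, not raw bytes); the sample string and the sub names strip_via_octets and strip_via_char are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample: a decoded (character) string starting with U+FEFF.
my $sample = "\x{FEFF}first line\n";

# Long form: encode to octets, strip the three BOM bytes, decode again.
sub strip_via_octets {
    my ($line) = @_;
    my $octets = encode("utf8", $line);
    $octets =~ s/^\x{ef}\x{bb}\x{bf}//;
    return decode("utf8", $octets);
}

# Short form: strip the single BOM character from the decoded string.
sub strip_via_char {
    my ($line) = @_;
    $line =~ s/^\x{FEFF}//;
    return $line;
}

my $long  = strip_via_octets($sample);
my $short = strip_via_char($sample);
print $long eq $short ? "same\n" : "different\n";
```

Both subs return "first line\n" for this input, so the script prints "same". If $line were still raw bytes (the file was read without a decoding layer), only the three-byte pattern would match.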


Re^4: UTF-8 text files with Byte Order Mark
by muba (Priest) on Feb 13, 2007 at 21:01 UTC

    Meh. Indeed, I didn't realise that. Thank you!

Re^4: UTF-8 text files with Byte Order Mark
by Anonymous Monk on Sep 30, 2011 at 18:30 UTC
    Thank you!!! This saved me a lot of trouble. I am also trying to strip out these UTF-8 byte order mark characters, which Google Docs adds by default to downloaded text files. By the way, I found that \x{FEFF} was not the same as \x{ef}\x{bb}\x{bf}.

      By the way I found that \x{FEFF} was not the same as \x{ef}\x{bb}\x{bf}

      Yeah, "\x{ef}\x{bb}\x{bf}" is the UTF-8 encoding of the BOM / U+FEFF / "\x{FEFF}".

        Please stop confusing. FEFF has nothing to do with UTF-8. This is a BOM for UTF-16 Big Endian-encoded files.
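The disagreement in this sub-thread comes down to the difference between the character U+FEFF and the bytes used to store it. A small sketch, using only Encode's encode, shows how that one character is written out under three encodings; the helper name bom_bytes is invented for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the bytes of U+FEFF in the given encoding as space-separated hex.
sub bom_bytes {
    my ($encoding) = @_;
    my $bom    = "\x{FEFF}";              # one character, code point U+FEFF
    my $octets = encode($encoding, $bom); # its byte representation
    return join " ", map { sprintf "%02X", $_ } unpack("C*", $octets);
}

printf "%-9s %s\n", $_, bom_bytes($_) for qw(UTF-8 UTF-16BE UTF-16LE);
```

This reports EF BB BF for UTF-8, FE FF for UTF-16BE, and FF FE for UTF-16LE. So the byte pair FE FF is indeed how UTF-16BE stores the BOM, while \x{FEFF} in a Perl regex against decoded text refers to the character itself, whatever encoding the file used.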
