Re^2: UTF-8 text files with Byte Order Mark

by muba (Priest)
on Feb 13, 2007 at 20:21 UTC (#599767)


in reply to Re: UTF-8 text files with Byte Order Mark
in thread UTF-8 text files with Byte Order Mark

Yeah, this works, except that the BOM is indeed three bytes long, as said above. So the code, which seems to work, now looks like this:

    while (my $line = <$rulesFH>) {
        if ($. == 1) {
            # Remove Byte Order Mark if it's there
            use Encode;
            my $octets = encode("utf8", $line);
            $octets =~ s/^\x{ef}\x{bb}\x{bf}//;
            $line = decode("utf8", $octets);
        }
        # rest...
    }


Re^3: UTF-8 text files with Byte Order Mark
by ikegami (Pope) on Feb 13, 2007 at 20:48 UTC
    my $octets = encode("utf8", $line);
    $octets =~ s/^\x{ef}\x{bb}\x{bf}//;
    $line = decode("utf8", $octets);

    is the same thing as

    my $BOM = decode("utf8", "\x{ef}\x{bb}\x{bf}");
    $line =~ s/^$BOM//;

    is the same thing as

    my $BOM = chr(0xFEFF);
    $line =~ s/^$BOM//;

    is the same thing as

    $line =~ s/^\x{FEFF}//;

    which is what I gave you. Much simpler!
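    For anyone who wants to check the equivalence, here is a minimal sketch. It assumes $line already holds decoded characters (for example, it was read through a UTF-8 decoding layer). The sub names strip_bom_roundtrip and strip_bom_direct and the sample string are made up for illustration and are not from the thread.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: the encode / strip three bytes / decode approach.
# Expects a decoded (character) string.
sub strip_bom_roundtrip {
    my ($line) = @_;
    my $octets = encode("utf8", $line);
    $octets =~ s/^\x{ef}\x{bb}\x{bf}//;
    return decode("utf8", $octets);
}

# Hypothetical helper: strip the single character U+FEFF directly.
# Expects a decoded (character) string.
sub strip_bom_direct {
    my ($line) = @_;
    $line =~ s/^\x{FEFF}//;
    return $line;
}

# Made-up sample: a decoded first line that starts with the BOM character.
my $sample = "\x{FEFF}rule one\n";

print strip_bom_roundtrip($sample) eq strip_bom_direct($sample)
    ? "same result\n"
    : "different result\n";
```

    Both subs leave a line without a leading BOM untouched, so either can be applied to the first line unconditionally.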

      Meh. Indeed, I didn't realise that. Thank you!

      Thank you!!! This saved me a lot of trouble. I am also trying to strip out these UTF-8 byte order mark characters, which Google Docs puts into downloaded text files by default. By the way I found that \x{FEFF} was not the same as \x{ef}\x{bb}\x{bf}

        By the way I found that \x{FEFF} was not the same as \x{ef}\x{bb}\x{bf}

        Yeah, "\x{ef}\x{bb}\x{bf}" is the UTF-8 encoding of the BOM / U+FEFF / "\x{FEFF}".
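        Which pattern matches therefore depends on whether the line was decoded on input. A rough sketch of the difference, using an in-memory file so it is self-contained; the helper first_line and the sample data are hypothetical, and the layers shown are just one way a file might be opened:

```perl
use strict;
use warnings;

# Hypothetical helper: read the first line of some in-memory file bytes
# through the given PerlIO layer (e.g. ":raw" or ":encoding(UTF-8)").
sub first_line {
    my ($bytes, $layer) = @_;
    open(my $fh, "<$layer", \$bytes)
        or die "cannot open in-memory file: $!";
    my $line = <$fh>;
    close $fh;
    return $line;
}

# Made-up file content: a UTF-8 BOM followed by one line of text.
my $file_bytes = "\xEF\xBB\xBFhello\n";

my $raw     = first_line($file_bytes, ":raw");              # undecoded bytes
my $decoded = first_line($file_bytes, ":encoding(UTF-8)");  # characters

# On raw bytes the BOM is three bytes; after decoding it is one character.
print "raw line starts with EF BB BF\n"    if $raw     =~ /^\xEF\xBB\xBF/;
print "decoded line starts with U+FEFF\n"  if $decoded =~ /^\x{FEFF}/;
print "raw length: ", length($raw), ", decoded length: ", length($decoded), "\n";
```

        So if s/^\x{FEFF}// appears to do nothing, one possible cause is that the handle was opened without a decoding layer and the line still holds raw bytes.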
