PerlMonks  

Re: UTF-8 text files with Byte Order Mark

by ikegami (Pope)
on Feb 13, 2007 at 17:55 UTC (#599731)


in reply to UTF-8 text files with Byte Order Mark

so I kinda assume that Perl will handle with this kind of stuff for me.

Having Perl remove the BOM automatically would be bad. For example, print while <$fh>; would no longer reproduce a file exactly, and there would be no other way to reproduce it exactly either.
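To make the point concrete, here is a minimal sketch (not from the original post) showing that a plain line-by-line copy preserves the BOM bytes untouched. The temp file, the in-memory output handle, and the variable names are illustrative assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a file that begins with the UTF-8 BOM (bytes EF BB BF).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
my $original = "\xEF\xBB\xBFhello\n";
print {$tmp} $original;
close($tmp);

# Copy it line by line, the way `print while <$fh>;` would,
# but into an in-memory string so the result can be inspected.
open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
my $copy = '';
open(my $out, '>:raw', \$copy) or die "Cannot open in-memory handle: $!";
print {$out} $_ while <$in>;
close($out);
close($in);

# The copy is byte-for-byte identical, BOM included.
print $copy eq $original ? "identical\n" : "different\n";
```

If Perl silently dropped the BOM on read, the comparison above could not come out equal.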

However, if file contains that BOM, my program does not understand the first line in the file

Patient: "Doctor, it hurts when I do this."
Doctor: "So don't do it!"

If your program doesn't accept BOMs, don't feed it any. UTF-8 does not require a BOM.

Alternatively, you could change your spec and your program to accept it.

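# Strip a leading BOM (first line only); assumes $fh was opened with a UTF-8 decoding layer.
while (<$fh>) { s/^\x{FEFF}// if $. == 1; ... }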
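A slightly fuller sketch of the same idea, assuming the file is opened with an :encoding(UTF-8) layer so that the BOM arrives as the single character U+FEFF rather than as three raw bytes. The helper name read_lines_without_bom and the temp-file demo are hypothetical, not part of the original post.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a UTF-8 text file and return its lines,
# dropping a leading BOM (U+FEFF) if one is present.
sub read_lines_without_bom {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        # A BOM can only appear at the very start of the file,
        # so only the first line needs checking.
        $line =~ s/^\x{FEFF}// if $. == 1;
        push @lines, $line;
    }
    close($fh);
    return @lines;
}

# Demo: write a small file that starts with the UTF-8 BOM bytes EF BB BF.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "\xEF\xBB\xBFfirst line\nsecond line\n";
close($tmp);

my @lines = read_lines_without_bom($path);
print @lines;
```

Restricting the substitution to the first line leaves any later U+FEFF (a zero-width no-break space in the middle of the text) alone, which seems the safer default.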


Re^2: UTF-8 text files with Byte Order Mark
by muba (Priest) on Feb 13, 2007 at 20:05 UTC
    Patient: "Doctor, it hurts when I do this."
    Doctor: "So don't do it!"

    Easy to say, of course, but what if the program one of my users uses stores that BOM anyway? Besides, as pointed out, a BOM in a UTF-8 file *is* valid, so I feel I should support it. Look, if the user were toying around with malformed files I'd be more than happy to tell him to get that fixed :D but apparently he's doing what he rightly thinks is right.

      a BOM in a UTF-8 file *is* valid

      "!" in an ASCII file is also valid. But if you place a "!" at the start of your Perl program, it probably will not compile. It is a malformed file, not from a UNICODE perspective, but from your parser's perspective.

      I provided two alternatives (removing the BOM and File::BOM) that will work with your broken tools (i.e. tools that add undesirable characters to the files you edit). I'd go with them, since allowing the BOM is surely a good thing.

        Ouch. I'm afraid I used the wrong tone in my previous reply. You see, I am now removing that BOM myself (as you can read below). I never meant to attack or criticize you. In fact, I much appreciate your input!
