PerlMonks  

Re: UTF-8 text files with Byte Order Mark

by ikegami (Pope)
on Feb 13, 2007 at 17:55 UTC (#599731)


in reply to UTF-8 text files with Byte Order Mark

so I kinda assume that Perl will handle this kind of stuff for me.

Having Perl remove the BOM automatically would be bad. For example, print while <$fh>; would no longer reproduce a file exactly, and no other approach could reproduce it exactly either.
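To make that concrete, here is a minimal sketch of such a verbatim copy. The name copy_verbatim is hypothetical, and the :raw layers are my assumption about how you would want to guarantee untouched bytes. Because nothing is stripped on input, a leading BOM (the bytes EF BB BF in UTF-8) arrives in the copy intact.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $src to $dst line by line, the same way
# "print while <$fh>;" would, with no decoding layer in between.
sub copy_verbatim {
    my ($src, $dst) = @_;
    open(my $in,  '<:raw', $src) or die "Can't read $src: $!";
    open(my $out, '>:raw', $dst) or die "Can't write $dst: $!";
    # Every byte read is written back out, including any leading BOM.
    print {$out} $_ while <$in>;
    close($out) or die "Can't close $dst: $!";
    close($in);
    return;
}
```

If Perl silently dropped the BOM while reading, the copy would come out three bytes shorter than the original.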

However, if the file contains that BOM, my program does not understand the first line in the file

Patient: "Doctor, it hurts when I do this."
Doctor: "So don't do it!"

If your program doesn't accept BOMs, don't feed it any. BOMs are not required.

Alternatively, you could change your spec and your program to accept it.

# Assumes $fh decodes UTF-8 (e.g. opened with '<:encoding(UTF-8)'), so the BOM reads as U+FEFF.
while (<$fh>) { s/\x{FEFF}//g; ... }
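A slightly fuller sketch of the same idea, in case it helps. The helper name read_lines_without_bom is made up for illustration, and it assumes the file really is UTF-8. Unlike the one-liner, it only strips U+FEFF at the very start of the first line, which is a deliberate narrowing on my part, not something the one-liner does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): return the lines of a
# UTF-8 file, minus a single leading BOM if one is present.
sub read_lines_without_bom {
    my ($path) = @_;
    # The encoding layer decodes the bytes EF BB BF into the character U+FEFF.
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Can't open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        # $. is the current line number of the handle just read from.
        $line =~ s/^\x{FEFF}// if $. == 1;
        push @lines, $line;
    }
    close($fh);
    return @lines;
}
```

Files without a BOM pass through unchanged, so the same code path can serve both kinds of input.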


Re^2: UTF-8 text files with Byte Order Mark
by muba (Priest) on Feb 13, 2007 at 20:05 UTC
    Patient: "Doctor, it hurts when I do this."
    Doctor: "So don't do it!"

    Easy to say, of course, but what if the program one of my users uses stores that BOM anyway? Besides, as pointed out, a BOM in a UTF-8 file *is* valid, so I feel I should support it. Look, if the user were toying around with malformed files I'd be more than happy to tell him to get that fixed :D but apparently he's doing what he rightly thinks is right.

      a BOM in a UTF-8 file *is* valid

      "!" in an ASCII file is also valid. But if you place a "!" at the start of your Perl program, it probably will not compile. The file is malformed, not from a Unicode perspective, but from your parser's perspective.

      I provided two alternatives (removing the BOM and File::BOM) that will work with your broken tools (i.e. tools that add an undesirable character to the files you edit). I'd go with them since allowing the BOM is surely a good thing.

        Ouch. I'm afraid I used the wrong tone in my previous reply. You see, I am now removing that BOM myself (as you can read below). I never meant to attack or criticize you. In fact, I much appreciate your input!
