Re: Remove BOM ?

by davido (Archbishop)
on Oct 01, 2012 at 20:49 UTC (#996744)


in reply to Remove BOM ?

This little snippet comes from Mojo::JSON:

    # Remove BOM
    $bytes =~ s/^(?:\357\273\277|\377\376\0\0|\0\0\376\377|\376\377|\377\376)//g;

Dave


Re^2: Remove BOM ?
by GrandFather (Cardinal) on Oct 02, 2012 at 20:04 UTC

    Except that /g can't be right. A BOM can only appear as the first few bytes of a data stream. If there is a further BOM, then most likely you've got a binary file rather than a text file.
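To illustrate the point about /g: without the /m modifier, ^ matches only at the very start of the string, so the substitution removes at most one BOM whether or not /g is present. A minimal sketch, assuming raw bytes in a scalar (the sub name strip_leading_bom is made up for this example; the pattern is the quoted Mojo::JSON one):

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mojo::JSON: remove one BOM from the
# start of a byte string. The UTF-32 LE pattern is listed before the
# UTF-16 LE one so that \377\376\0\0 is not cut short as \377\376.
sub strip_leading_bom {
    my ($bytes) = @_;
    $bytes =~ s/^(?:\357\273\277|\377\376\0\0|\0\0\376\377|\376\377|\377\376)//;
    return $bytes;
}

my $utf8_bom = "\357\273\277";
my $data     = $utf8_bom . "first\n" . $utf8_bom . "second\n";

# Only the BOM at offset 0 is removed; the one before "second" stays.
my $cleaned = strip_leading_bom($data);

# The same pattern with /g, for comparison: ^ still anchors at offset 0,
# so the result is identical.
(my $with_g = $data)
    =~ s/^(?:\357\273\277|\377\376\0\0|\0\0\376\377|\376\377|\377\376)//g;

print $cleaned eq $with_g ? "same result\n" : "different result\n";
```

So /g is redundant here rather than harmful; catching BOMs further into the data would need a different pattern (no ^, or ^ combined with /m).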

    It's not clear to me what the nulls are doing in there.

    True laziness is hard work

      It depends. Unless all your text-processing tools are Unicode-aware, you can easily end up with a BOM at the beginning of any line, not just the first; in fact, BOMs can end up anywhere, depending on what you're doing. Imagine using cat and paste on files with BOMs. In my experience (and I've had quite a bit), I almost always end up having to delete BOM-looking strings from the entire file, not just the beginning.
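One way to delete BOMs from an entire file, sketched under the assumption that the data is UTF-8: decode the bytes, drop every U+FEFF character, and encode again. The sub name strip_all_boms_utf8 is made up for this example. Note that U+FEFF in the middle of text is formally ZERO WIDTH NO-BREAK SPACE, so this removes those as well.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: remove every U+FEFF from UTF-8 encoded bytes,
# not just the one at the start. Assumes the input really is UTF-8.
sub strip_all_boms_utf8 {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);   # bytes -> characters
    $text =~ s/\x{FEFF}//g;               # no ^ anchor, so /g matters here
    return encode('UTF-8', $text);        # characters -> bytes
}

# Simulate two BOM-prefixed files glued together, as with cat.
my $bom    = "\xEF\xBB\xBF";
my $joined = $bom . "alpha\n" . $bom . "beta\n";

my $clean = strip_all_boms_utf8($joined);
print $clean;
```

Working on decoded characters avoids matching BOM-like byte pairs that merely straddle two adjacent UTF-16 or UTF-32 code units, which is a risk when deleting raw byte patterns everywhere.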

      It's not clear to me what the nulls are doing in there.

      Those are the BOMs of the UTF-32 encodings: \377\376\0\0 is UTF-32 LE and \0\0\376\377 is UTF-32 BE.
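The five alternatives in the regex map to encodings as follows. This is an illustrative sketch only; the table @BOMS and the sub bom_encoding are names invented for this example, not anything from Mojo::JSON.

```perl
use strict;
use warnings;

# BOM byte sequences and the encodings they signal. The UTF-32 entries
# come first because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
my @BOMS = (
    [ "\xFF\xFE\x00\x00", 'UTF-32LE' ],   # \377\376\0\0
    [ "\x00\x00\xFE\xFF", 'UTF-32BE' ],   # \0\0\376\377
    [ "\xEF\xBB\xBF",     'UTF-8'    ],   # \357\273\277
    [ "\xFE\xFF",         'UTF-16BE' ],   # \376\377
    [ "\xFF\xFE",         'UTF-16LE' ],   # \377\376
);

# Hypothetical helper: return the encoding name suggested by a leading
# BOM, or undef if the bytes do not start with one.
sub bom_encoding {
    my ($bytes) = @_;
    for my $entry (@BOMS) {
        my ($bom, $name) = @$entry;
        return $name if index($bytes, $bom) == 0;
    }
    return;
}

print bom_encoding("\xFF\xFE\x00\x00data"), "\n";
print bom_encoding("\xFF\xFEd\x00"), "\n";
```

A UTF-16 LE stream whose first character after the BOM is U+0000 would be reported as UTF-32 LE; the quoted regex shares that ambiguity.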

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
