http://www.perlmonks.org?node_id=996744


in reply to Remove BOM ?

This little snippet comes from Mojo::JSON:

# Remove BOM
$bytes =~ s/^(?:\357\273\277|\377\376\0\0|\0\0\376\377|\376\377|\377\376)//g;
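To see what the snippet does, here is a minimal sketch that wraps the same regex in a sub and runs it on a few encoded samples. The name strip_bom is hypothetical and is not part of Mojo::JSON's API. The sketch assumes the input is raw, undecoded bytes, and it uses the core Encode module only to build test data.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not Mojo::JSON API): applies the Mojo::JSON BOM
# regex to a string of raw, undecoded bytes.
sub strip_bom {
    my ($bytes) = @_;
    # Alternatives: UTF-8, UTF-32LE, UTF-32BE, UTF-16BE, UTF-16LE.
    # UTF-32LE (FF FE 00 00) is tried before its prefix UTF-16LE (FF FE).
    $bytes =~ s/^(?:\357\273\277|\377\376\0\0|\0\0\376\377|\376\377|\377\376)//g;
    return $bytes;
}

# Build sample inputs by encoding U+FEFF followed by "{}".
for my $enc (qw(UTF-8 UTF-16BE UTF-32LE)) {
    my $in  = encode($enc, "\x{FEFF}{}");
    my $out = strip_bom($in);
    printf "%-8s %2d bytes -> %2d bytes\n", $enc, length $in, length $out;
}
```

Note that the alternation order matters: if \377\376 came before \377\376\0\0, a UTF-32LE BOM would only be half removed.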

Dave

Re^2: Remove BOM ?
by GrandFather (Saint) on Oct 02, 2012 at 20:04 UTC

    except that /g can't be right. A BOM can only appear as the first few bytes of a data stream. If there is a further BOM then most likely you've got a binary file rather than a text file.
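A minimal sketch of the leading-only version described here, assuming raw byte input; strip_leading_bom is a hypothetical name. As a side note, ^ without /m only matches at the very start of the string, so the /g in the Mojo::JSON regex cannot actually remove a second BOM; it is misleading rather than harmful. The demo compares both forms on input with two consecutive BOMs.

```perl
use strict;
use warnings;

# Hypothetical helper: strip a single leading BOM only. \A anchors at the
# start of the string, and there is no /g because a BOM belongs only there.
sub strip_leading_bom {
    my ($bytes) = @_;
    $bytes =~ s/\A(?:\357\273\277|\377\376\0\0|\0\0\376\377|\376\377|\377\376)//;
    return $bytes;
}

# Two UTF-8 BOMs in a row, e.g. from naive concatenation.
my $doubled = "\357\273\277\357\273\277abc";

my $once = strip_leading_bom($doubled);
(my $with_g = $doubled) =~ s/^(?:\357\273\277)//g;

# Both remove exactly one BOM: after the first match, ^ cannot match again.
printf "leading-only: %d bytes left, with /g: %d bytes left\n",
    length $once, length $with_g;
```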

    It's not clear to me what the nulls are doing in there.

    True laziness is hard work
      It's not clear to me what the nulls are doing in there.

      They are the BOMs of the UTF-32LE (\377\376\0\0) and UTF-32BE (\0\0\376\377) encodings.
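To illustrate why the nulls are needed, here is a small sketch that names the encoding implied by a leading BOM. The helper bom_encoding and the @BOMS table are hypothetical, written for this example only. The four-byte UTF-32 forms are checked first, because the UTF-32LE BOM begins with the UTF-16LE one.

```perl
use strict;
use warnings;

# Hypothetical table of BOM byte sequences, longest-prefix-first where
# they overlap: FF FE 00 00 (UTF-32LE) starts with FF FE (UTF-16LE).
my @BOMS = (
    [ 'UTF-32LE' => "\xFF\xFE\x00\x00" ],
    [ 'UTF-32BE' => "\x00\x00\xFE\xFF" ],
    [ 'UTF-8'    => "\xEF\xBB\xBF"     ],
    [ 'UTF-16LE' => "\xFF\xFE"         ],
    [ 'UTF-16BE' => "\xFE\xFF"         ],
);

# Return the encoding name for a leading BOM, or undef if there is none.
sub bom_encoding {
    my ($bytes) = @_;
    for my $pair (@BOMS) {
        my ($name, $bom) = @$pair;
        return $name if substr($bytes, 0, length $bom) eq $bom;
    }
    return;
}

for my $sample ("\xFF\xFE\x00\x00" . "A\x00\x00\x00", "\xFF\xFE" . "A\x00", "plain") {
    my $enc = bom_encoding($sample) // 'none';
    print "$enc\n";
}
```

By BOM alone, a UTF-16LE stream whose first character is U+0000 cannot be told apart from UTF-32LE, so the longest-match rule is a convention, not a guarantee.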

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      It depends. Unless all of your text-processing tools are Unicode-aware, you can easily end up with a BOM at the start of any line, not just the first; depending on what you're doing, BOMs can really end up anywhere. Consider running cat or paste on files that have BOMs: cat leaves every later file's BOM in the middle of the output, and paste puts the BOMs of the second and later files in the middle of the output lines. In my experience (and I've had quite a bit), I almost always end up having to delete BOM-like sequences from the entire file, not just from the beginning.
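A minimal sketch of the "delete them everywhere" approach, assuming the data is UTF-8 and that every U+FEFF in it is a stray BOM. The two file contents are made up to mimic the output of cat. Decoding first means the substitution matches characters rather than byte patterns.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Simulate `cat a.txt b.txt` where each (made-up) file starts with a
# UTF-8 BOM: the second BOM lands at the start of a line mid-stream.
my $a_txt  = "\xEF\xBB\xBFalpha\n";
my $b_txt  = "\xEF\xBB\xBFbeta\n";
my $catted = $a_txt . $b_txt;

# Decode to characters, then delete U+FEFF wherever it appears.
# s///g returns the number of substitutions made.
my $text    = decode('UTF-8', $catted);
my $removed = ($text =~ s/\x{FEFF}//g);

print encode('UTF-8', $text);
print "removed $removed BOM(s)\n";
```

One trade-off: U+FEFF also has a legacy meaning as ZERO WIDTH NO-BREAK SPACE (now deprecated in favour of U+2060 WORD JOINER), so this also deletes any intentional use of it.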