PerlMonks  

Re: Can't tell if UTF-8... or just binary...

by bart (Canon)
on Aug 23, 2011 at 21:29 UTC (#922007)


in reply to Can't tell if UTF-8... or just binary...

I used to have a text editor that decided whether a file was text or binary by checking whether it contained null bytes ("\0"). That works extremely well in practice, since virtually all binary files contain null bytes.
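A minimal sketch of that heuristic, in Python for illustration. The function names and the 8192-byte sample size are my own assumptions; I don't know how that editor actually implemented the check.

```python
def looks_binary(data: bytes) -> bool:
    """Treat the data as binary if it contains at least one null byte."""
    return b"\0" in data


def file_looks_binary(path: str, sample_size: int = 8192) -> bool:
    """Apply the null-byte test to the first sample_size bytes of a file.

    sample_size is an arbitrary choice for this sketch; reading only a
    prefix keeps the check cheap for large files.
    """
    with open(path, "rb") as fh:
        return looks_binary(fh.read(sample_size))
```

Checking only a prefix is a trade-off: a file whose first null byte appears after the sampled region would be misclassified as text.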

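It also works with Unicode text, at least if it's UTF-8, because UTF-8 never produces a null byte except to encode the NUL character (U+0000) itself. 16-bit and 32-bit Unicode text, on the other hand, contains a lot of null bytes: typically every other byte for 16-bit, and 3 out of every 4 bytes for 32-bit. The null-byte test would therefore classify such text as binary.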
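To see those proportions concretely, here is a small Python sketch that counts null bytes in the same ASCII string under three encodings. The helper name `null_fraction` is hypothetical, and the little-endian codec variants are used only so that no byte order mark is added to the output.

```python
def null_fraction(text: str, encoding: str) -> float:
    """Return the fraction of bytes that are null after encoding text."""
    data = text.encode(encoding)
    if not data:
        return 0.0
    return data.count(b"\0") / len(data)


sample = "Hello, world"
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    # For pure ASCII input this prints 0.0, 0.5 and 0.75 respectively.
    print(enc, null_fraction(sample, enc))
```

The ratios only hold exactly for ASCII-range text. Characters outside that range use non-zero high bytes, so real-world UTF-16 or UTF-32 text may contain somewhat fewer nulls.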
