Re: Distinguishing text from binary data

by inman (Curate)
on Oct 05, 2004 at 10:41 UTC (#396546)

in reply to Distinguishing text from binary data

Your code is a little restrictive: it treats linefeeds, whitespace, punctuation, etc. as non-characters and then decides that something is text if there are fewer than 100 of them. Try changing your code to work on ranges of the ASCII table, and then use a percentage as your test instead of a fixed count.
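
A minimal sketch of that idea, in Python rather than the original poster's code (which is not shown here). The function name `looks_like_text`, the 90% threshold, and the exact set of "text" bytes (printable ASCII 0x20..0x7E plus tab, linefeed, and carriage return) are all assumptions picked for illustration, not values from this thread:

```python
def looks_like_text(data: bytes, threshold: float = 0.90) -> bool:
    """Guess whether `data` is text by the share of text-like ASCII bytes.

    Both the threshold and the byte ranges are illustrative assumptions.
    """
    if not data:
        # Assumption: an empty buffer is treated as text.
        return True

    # Printable ASCII (0x20..0x7E) plus tab, linefeed, and carriage return.
    text_bytes = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}

    text_count = sum(1 for b in data if b in text_bytes)

    # Compare a ratio rather than an absolute count, so the result
    # does not depend on how long the input is.
    return text_count / len(data) >= threshold
```

Because the test is a ratio, a long text file with a few stray bytes still passes, while a short binary blob does not slip through just for being short. Note that this ASCII-only version will probably misjudge text in non-ASCII encodings.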

Replies are listed 'Best First'.
Re^2: Distinguishing text from binary data
by maard (Pilgrim) on Oct 06, 2004 at 10:26 UTC
    Also, don't forget about the non-English encodings in which form data can be sent (English-speaking coders often forget about them :-) ). IMO, the presence of 0x00..0x1F bytes in data such as an HTTP response can mark it as binary (unless the form is sent in UTF-8). So maybe you should take the charset from the Content-Type header into consideration, and only then analyze the byte/character stream.
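
    One way this could look, as a rough sketch in Python. The helper names `charset_from_content_type` and `is_text_for_charset`, the ISO-8859-1 fallback, and the choice to tolerate tab, linefeed, and carriage return among the 0x00..0x1F control characters are assumptions for illustration, not something stated in this thread:

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "iso-8859-1") -> str:
    """Extract the charset parameter from a Content-Type header value.

    The fallback charset is an assumption; use whatever suits your data.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default


def is_text_for_charset(data: bytes, content_type: str) -> bool:
    """Decode `data` using the declared charset, then look for control characters."""
    charset = charset_from_content_type(content_type)
    try:
        text = data.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # Bytes invalid for the declared charset, or an unknown charset name.
        return False

    # Assumption: tab, linefeed, and carriage return are acceptable in text;
    # any other character below 0x20 marks the data as binary.
    allowed = {"\t", "\n", "\r"}
    return not any(ord(ch) < 0x20 and ch not in allowed for ch in text)
```

    Decoding first means that bytes above 0x7F in, say, KOI8-R or UTF-8 are judged as characters rather than counted as suspicious raw bytes.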
