in reply to Distinguishing text from binary data
Your code is a little restrictive as it treats linefeeds, whitespace, punctuation etc. as non-characters and then decides that something is text if there are less than 100 of them. Try changing your code to work on ranges of the ascii table and then use a percentage as your test.
Re^2: Distinguishing text from binary data
by maard (Pilgrim) on Oct 06, 2004 at 10:26 UTC
|
Also don't forget about non-english encodings in which form data can be sent (english coders often forget about it :-) ). IMO, presence of 0x00..0x1F bytes in such data as HTTP response can mark it as binary (unless the form is sent in utf-8). So maybe you should take into consideration charset from Content-Type header and only then analyze byte/character stream. | [reply] |
|