http://www.perlmonks.org?node_id=396566


in reply to Distinguishing text from binary data

In general, you need to read the whole string one character at a time; if you come across something that doesn't make sense in your character encoding, it's binary. Otherwise it's text. In the case of ASCII text, the characters that don't make sense are most of the control characters and 0x7F - 0xFF. The following control characters are usually considered OK in text data: tab (0x09), line feed (0x0A), form feed (0x0C) and carriage return (0x0D). A regex-ish way of detecting non-ASCII data based on that might be:
# The outer parens matter: "print (...) ? a : b" parses as "(print(...)) ? a : b".
print( ($text =~ /[^\x09\x0a\x0c\x0d\x20-\x7e]/) ? "binary\n" : "text\n" );
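
For anyone who wants to try the check on a few strings, here is a minimal sketch that wraps the same character class in a sub. The name looks_binary and the sample strings are made up for illustration; they are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the original post).
# Returns 1 if the string contains any character other than tab, LF, FF, CR
# or printable ASCII (0x20 - 0x7E); returns 0 otherwise.
sub looks_binary {
    my ($data) = @_;
    return ($data =~ /[^\x09\x0a\x0c\x0d\x20-\x7e]/) ? 1 : 0;
}

# Illustrative samples: plain ASCII, ASCII with tab and CRLF,
# a string with NUL and high bytes, and a Latin-1 e-acute (0xE9).
my @samples = (
    "hello, world\n",
    "tab\tseparated\r\n",
    "PNG\x89\x00\x1a",
    "caf\xe9",
);

for my $sample (@samples) {
    my $verdict = looks_binary($sample) ? "binary" : "text";
    print "$verdict\n";
}
```

Note that this treats anything outside 7-bit ASCII as binary, so Latin-1 or UTF-8 text gets flagged too. That matches the "in the case of ASCII text" assumption above; for other encodings the character class would need to change.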

Re^2: Distinguishing text from binary data
by ww (Archbishop) on Oct 05, 2004 at 14:22 UTC

    Further to DrHyde's offering: though he did not make it explicit, his approach offers a good first step toward protecting yourself against embedded malware.

    Obvious? Maybe. Maybe that's already why you're checking the input. Or maybe the http: response is coming from a machine you control and thus, trust.

    But unless you're rilly, rilly POSITIVE! the incoming data is always going to be clean, you really do want to consider the obvious... very early in the game.