PerlMonks  

Re: Distinguishing text from binary data

by DrHyde (Prior)
on Oct 05, 2004 at 13:14 UTC (#396566)


in reply to Distinguishing text from binary data

In general, you need to scan the whole string one character at a time. If you come across something that doesn't make sense in your character encoding, it's binary; otherwise it's text. In the case of ASCII text, the characters that don't make sense are most of the control characters (0x00 - 0x1F) and everything in 0x7F - 0xFF. The following control characters are usually considered OK in text data:

  • 0x09 - tab
  • 0x0A - line feed
  • 0x0C - form feed
  • 0x0D - carriage return
A regex-ish way of detecting non-ASCII data based on that might be:
print( ($text =~ /[^\x09\x0a\x0c\x0d\x20-\x7e]/) ? "binary\n" : "text\n" );
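
For what it's worth, here is a minimal sketch of the same check wrapped in a sub, assuming the data is supposed to be plain ASCII. The name `looks_binary` and the sample strings are made up for illustration; they are not part of the original suggestion.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $data contains any byte outside
# printable ASCII (0x20 - 0x7E) plus tab, line feed, form feed and
# carriage return; returns 0 otherwise.
sub looks_binary {
    my ($data) = @_;
    return $data =~ /[^\x09\x0a\x0c\x0d\x20-\x7e]/ ? 1 : 0;
}

# Illustrative samples: two plain-text strings, one with a NUL byte,
# and one with a high-bit byte (0xE9).
for my $sample ("plain text\r\n", "tab\tseparated", "nul\x00byte", "caf\xe9") {
    print looks_binary($sample) ? "binary\n" : "text\n";
}
```

Note that an empty string counts as text under this rule, and that anything in a non-ASCII encoding (Latin-1, UTF-8 with accented characters, and so on) will be reported as binary. When checking a file, call binmode on the handle first so the bytes arrive untranslated.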


Re^2: Distinguishing text from binary data
by ww (Bishop) on Oct 05, 2004 at 14:22 UTC

    Further to DrHyde's offering: though he did not make it explicit, his approach offers a good first step for protecting yourself against embedded malware.

    Obvious? Maybe. Maybe that's already why you're checking the input. Or maybe the HTTP response is coming from a machine you control and thus trust.

    But unless you're really, really POSITIVE the incoming data is always going to be clean, you really do want to consider the obvious... very early in the game.
