in reply to Distinguishing text from binary data
In general, you have to scan the whole string one character at a time: if you come across anything that isn't valid in your character encoding, treat the data as binary; otherwise it's text. For ASCII text, the characters that don't belong are most of the control characters (0x00 - 0x1F), DEL (0x7F), and everything from 0x80 to 0xFF. The following control characters are usually considered OK in text data:
- 0x09 - tab
- 0x0A - line feed
- 0x0C - form feed
- 0x0D - carriage return
print( ($text =~ /[^\x09\x0a\x0c\x0d\x20-\x7e]/) ? "binary\n" : "text\n" );
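
If you need the check in more than one place, it can be wrapped in a small subroutine. This is only a sketch of the ASCII case described above: the name looks_like_ascii_text is made up for illustration, and note that it treats an empty string as text.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from any module):
# true if $data contains only printable ASCII (0x20-0x7E) plus
# tab, line feed, form feed, and carriage return.
sub looks_like_ascii_text {
    my ($data) = @_;
    # Negated character class matches any byte outside the allowed set.
    return $data !~ /[^\x09\x0a\x0c\x0d\x20-\x7e]/;
}

for my $sample ("plain text\n", "tab\tseparated\r\n", "nul\x00byte", "high\xe9bit") {
    my $verdict = looks_like_ascii_text($sample) ? "text" : "binary";
    print "$verdict\n";
}
```

The first two samples come out as text and the last two as binary. The ternary result goes into a variable before printing so that print's parenthesis parsing can't swallow part of the expression.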
Re^2: Distinguishing text from binary data
by ww (Archbishop) on Oct 05, 2004 at 14:22 UTC |