http://www.perlmonks.org?node_id=1091291

csorrentini has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm just wondering whether this is the correct way to write a subroutine that checks whether a file is binary or text. It appears to run; however, when I changed the file it looks at so that it contains only zeros and ones, it still showed up as "Text file". Any advice?

    sub type {
        if (-T $FileInfoName) {
            return "Text file";
        }
        elsif (-B $FileInfoName) {
            return "Binary file";
        }
    }

Re: Trying to write a subroutine to return if file is Text or Binary
by GrandFather (Saint) on Jun 26, 2014 at 03:21 UTC

    What do you expect to happen if both -T and -B return false, as would happen for a missing file?

    Are you interested in the case where both would return true, as would happen for an empty file?

    How robust do you want the test to be? From the documentation:

    The -T and -B switches work as follows. The first block or so of the file is examined for odd characters such as strange control codes or characters with the high bit set. If too many strange characters (>30%) are found, it's a -B file; otherwise it's a -T file. Also, any file containing null in the first block is considered a binary file. If -T or -B is used on a filehandle, the current IO buffer is examined rather than the first block. Both -T and -B return true on a null file, or a file at EOF when testing a filehandle. Because you have to read a file to do the -T test, on most occasions you want to use a -f against the file first, as in next unless -f $file && -T $file.
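    Pulling those questions and the documentation's advice together, a more defensive version of the original sub might look like the following. It is a minimal sketch, not the one true answer: the name file_type and the returned labels are hypothetical, and reporting empty and unreadable files as separate categories is an assumption about what you want.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, more defensive version of the original sub.
# The name file_type and the label strings are illustrative choices.
sub file_type {
    my ($path) = @_;

    # perlfunc suggests checking -f first: -T/-B have to read the file,
    # and both return false for a path that does not exist.
    return "Missing or not a plain file" unless -f $path;

    # Both -T and -B return true on an empty file, so report it separately.
    return "Empty file" if -z $path;

    return "Text file"   if -T $path;
    return "Binary file" if -B $path;

    # Neither test succeeded, e.g. the file exists but could not be read.
    return "Unknown (could not read file)";
}

# Usage: perl filetype.pl FILE...
print "$_: ", file_type($_), "\n" for @ARGV;
```

    Passing the path as an argument instead of reading a global such as $FileInfoName also makes the sub easier to reuse.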
Re: Trying to write a subroutine to return if file is Text or Binary
by nevdka (Pilgrim) on Jun 26, 2014 at 05:13 UTC

    Hi csorrentini, welcome to the monastery.

    To me, it sounds like you have made a text file whose text is 1's and 0's. Is this the case? If so, then it's not really a binary file; it's still a text file. I'm not sure exactly how the binary flag is set (maybe a wiser monk could help...), but you could try pointing your sub at a JPG or video file.
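    To see that distinction in action, you can store the same bit pattern twice: once as the characters '0' and '1', and once as the raw bytes those bits describe, using pack with the "B*" template. This is a small demo under stated assumptions: the bit pattern, the file names bits.txt and bits.bin, and the helper write_file are all hypothetical, and the files go into a temporary directory that is removed when the script exits.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# The same bit pattern stored two ways: as text characters and as raw bytes.
my $bits = "0000000011111111" x 8;    # 128 '0'/'1' characters
my $raw  = pack("B*", $bits);         # 16 bytes: "\x00\xFF" repeated 8 times

# Temporary directory, deleted automatically when the program exits.
my $dir = tempdir(CLEANUP => 1);

# Hypothetical helper: write data to a file without any I/O layer translation.
sub write_file {
    my ($path, $data) = @_;
    open my $fh, '>:raw', $path or die "Cannot write $path: $!";
    print {$fh} $data;
    close $fh or die "Cannot close $path: $!";
}

write_file("$dir/bits.txt", $bits);   # ASCII characters '0' and '1'
write_file("$dir/bits.bin", $raw);    # real binary data, including NUL bytes

for my $file ("$dir/bits.txt", "$dir/bits.bin") {
    printf "%s: %s\n", $file, (-T $file ? "Text file" : "Binary file");
}
```

    Only the second file contains NUL bytes, and the documentation quoted above says a null in the first block is enough for a file to be considered binary.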

    Also, you might want to read "How do I post a question effectively?". Putting code in <code> tags helps readability a lot.

Re: Trying to write a subroutine to return if file is Text or Binary
by DrHyde (Prior) on Jun 26, 2014 at 10:25 UTC

    I was playing around with this and expecting to post something clever about how -B doesn't mean binary: according to the documentation it means "not ASCII", which ain't the same as binary. However ...

        $ perl -v
        This is perl, v5.8.8
        $ cat foo
        文本
        $ perl -e 'print -T "foo"'
        1
        $ perl -e 'print -B "foo"'
        $

    (文本 is apparently Chinese for "text".)

    So even as far back as 5.8.8 it does the right thing (DTRT), although the documentation is wrong.

    Update: sorry for the bad formatting; PerlMonks doesn't like me cutting and pasting funny foreign characters into a <code> block.

      Thank you all for the quick replies. After posting this I realized I had completely overlooked the fact that a file is still considered text even if it contains only zeroes and ones. Essentially, all I wanted was for it to return "Text file" when given something like a .txt document and "Binary file" when given a binary file, and it works like a charm. Again, thanks all.
        After posting I went to p5p and asked "WTF?". Apparently there is work going on (whether it'll be accepted and whether it'll be in 5.22 or not remains to be seen) to improve -T and -B. See here and here.
Re: Trying to write a subroutine to return if file is Text or Binary
by Laurent_R (Canon) on Jun 26, 2014 at 06:07 UTC
    Just in case this is not clear to you, a file containing ASCII representations of 0's and 1's is not a binary file, but a text file that will test true for the -T test.
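    One quick way to check what a "file of zeros and ones" actually contains is to dump its bytes in hex. In this sketch (the string is an arbitrary example) the characters '0' and '1' turn out to be the printable ASCII codes 0x30 and 0x31, which is why -T accepts them as text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# '0' and '1' are ordinary printable ASCII characters (0x30 and 0x31),
# not the bit values 0 and 1.
my $text = "0110";
my $hex  = unpack("H*", $text);    # hex dump of the bytes actually stored
print "Bytes in '$text': $hex\n";
```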
Re: Trying to write a subroutine to return if file is Text or Binary
by pvaldes (Chaplain) on Jun 30, 2014 at 23:01 UTC
    You might also find this node useful.

        use strict;
        use warnings;
        use File::LibMagic ':easy';

        my $infile = $ARGV[0];
        die "Usage: $0 FILE\n" unless defined $infile;
        print $infile, " = ", MagicFile($infile), "\n";
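    If you want the libmagic answer but don't want the script to die on machines where File::LibMagic isn't installed, one option is to load it at runtime and fall back to -T/-B. This is a hedged sketch: the name classify and its labels are hypothetical, it uses File::LibMagic's object interface (new and info_from_filename, whose result hash includes a mime_type key) instead of the :easy interface above, and treating every MIME type that doesn't start with text/ as binary is a simplifying assumption that may misclassify some text formats.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical classifier: ask libmagic for a MIME type when File::LibMagic
# is available, otherwise fall back to Perl's built-in -T heuristic.
sub classify {
    my ($path) = @_;
    return "Missing or not a plain file" unless -f $path;
    return "Empty file" if -z $path;

    # Load the module at runtime so the script still works without it.
    if (eval { require File::LibMagic; 1 }) {
        my $info = File::LibMagic->new->info_from_filename($path);
        # Assumption: anything outside text/* is treated as binary.
        return $info->{mime_type} =~ m{^text/} ? "Text file" : "Binary file";
    }

    return -T $path ? "Text file" : "Binary file";
}

print "$_: ", classify($_), "\n" for @ARGV;
```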