Detecting 'binary' in a variable

by kirbyk (Friar)
on Jul 05, 2005 at 17:39 UTC
kirbyk has asked for the wisdom of the Perl Monks concerning the following question:

I have an application running under Apache/mod_perl where a user can upload a CSV file. The file gets uploaded and sits in a Perl variable, eventually to be loaded into an Oracle CLOB.

I want to detect whether the uploaded file is binary or text-only, and give the user a helpful error message (for example, if they upload a .xls file). Note that the file never exists on a filesystem, so I can't use any Unix tricks (and I don't want to write out a temp file).

I figure I could go character by character in a loop and look at the ASCII values, but that seems horribly inefficient. Is there a quick regex that could do this check? I'm not worried about Unicode characters, but it'd be nice to allow extended ASCII characters up through, say, 165 (to get all the accented characters).

Re: Detecting 'binary' in a variable
by Transient (Hermit) on Jul 05, 2005 at 17:49 UTC
    Would a simple check like this suffice?

        if ( $file =~ /[^\x00-\xA5]/ ) {
            # binary
        } else {
            # text
        }

    Update: It also looks like CGI::UploadEasy has a "fileinfo" method (in case you're using, or could use, that module).

      Not always:

        $ perl -le '{local$/; $_=<>;}print /^[\x00-\xA5]/ ? "binary" : "text"' \
          /mnt/win/WINDOWS/system32/
        text
        That's correct, but that's not the same regexp:

      Thanks, that regex does the trick.
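A minimal sketch of the regex check from this thread, wrapped in a subroutine and assuming the upload is held as a byte string (not decoded Unicode). The helper name `looks_binary` is hypothetical. Note that the class `[^\x00-\xA5]` accepts NUL and other control bytes as "text"; as an assumption about what a CSV should contain, this variant also rejects control bytes other than tab, LF and CR. Also, 165 (\xA5) is the end of the accented letters in the old DOS code page 437; in ISO-8859-1 or Windows-1252, which Windows tools often use, accented letters sit at \xC0-\xFF (é is \xE9), so such files would need the upper bound raised to \xFF.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $data contains any byte that should
# not appear in a plain-text CSV upload, 0 otherwise.
sub looks_binary {
    my ($data) = @_;
    # Allowed: tab (\x09), LF (\x0A), CR (\x0D) and \x20-\xA5
    # (printable ASCII plus the extended range up to 165 from the
    # question). NUL, other control bytes and \xA6-\xFF count as binary.
    return $data =~ /[^\x09\x0A\x0D\x20-\xA5]/ ? 1 : 0;
}

# Usage: a CSV with a CP437 u-umlaut (\x81), then an OLE2 (.xls) header.
for my $upload ("id,city\r\n1,Z\x81rich\r\n",
                "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
    print looks_binary($upload) ? "binary\n" : "text\n";    # text, then binary
}
```

The pattern is unanchored and negated, so it succeeds as soon as any single disallowed byte appears anywhere in the string; the regex engine does the scanning, which avoids an explicit character-by-character loop in Perl code.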

Re: Detecting 'binary' in a variable
by brian_d_foy (Abbot) on Jul 05, 2005 at 18:01 UTC

    In Perl 5.8 and later, you can open an in-memory filehandle on a reference to a scalar. That might do the trick for you. If you are using an older perl, Tie::Handle::ToMemory does the same thing. You could then use the file test operators, or something like File::Magic. If that doesn't work for you, you can try to match a specific signature for an Excel file (or whatever you might get) against the start of the uploaded data, but that's a lot more work.

    Good luck!
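A minimal sketch of the signature approach, assuming Perl 5.8 or later: an in-memory filehandle is opened on a reference to the upload, the first few bytes are read, and they are compared against known leading-byte signatures. The table contents are assumptions chosen for illustration: legacy .xls files are OLE2 compound documents beginning with D0 CF 11 E0 A1 B1 1A E1, ZIP-based formats begin with "PK\x03\x04", and PDF files begin with "%PDF-". The names `sniff_format` and `%SIGNATURES` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical table of leading-byte signatures to reject.
my %SIGNATURES = (
    'OLE2 compound document (e.g. .xls)' => "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",
    'ZIP archive (e.g. .zip, .xlsx)'     => "PK\x03\x04",
    'PDF document'                       => '%PDF-',
);

# Returns a description of the matched format, or nothing if none match.
sub sniff_format {
    my ($data_ref) = @_;
    # In-memory filehandle (Perl 5.8+): reads straight from the scalar,
    # so no temp file is written.
    open my $fh, '<', $data_ref or die "Cannot open in-memory handle: $!";
    binmode $fh;
    my $got = read $fh, my $head, 8;    # longest signature is 8 bytes
    close $fh;
    return unless $got;                 # empty upload or read error
    for my $name (sort keys %SIGNATURES) {
        my $sig = $SIGNATURES{$name};
        return $name if substr($head, 0, length $sig) eq $sig;
    }
    return;
}

# Usage: a fake .xls header padded with NUL bytes.
my $upload = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1" . ("\0" x 504);
my $kind   = sniff_format(\$upload);
print defined $kind ? "Rejected: looks like $kind\n"
                    : "No known binary signature\n";
```

A signature table only catches the formats listed in it, so it works best as a friendlier error message ("that looks like an Excel file") layered on top of a general character-class check such as Transient's regex.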

