Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re: How to recognize Word and XLS files

by jmcnamara (Monsignor)
on Mar 06, 2012 at 09:46 UTC ( #958047=note: print w/ replies, xml ) Need Help??


in reply to How to recognize Word and XLS files

Here is a small program that uses OLE::Storage_Lite to distinguish distinguish Microsoft doc and xls files.

#!/usr/bin/perl use strict; use warnings; use OLE::Storage_Lite; my @files = ( 'test.xls', 'test.doc', 'test.ppt', 'test.txt', ); for my $filename ( @files ) { printf( "%-20s = %s\n", $filename, check_ole_filetype( $filena +me ) ); } sub check_ole_filetype { my $filename = shift; # Check that the file exists. return 'not_found' if !-e $filename; # Create an OLE::Storage_Lite object to read the file. my $ole = OLE::Storage_Lite->new( $filename ); my $pps = $ole->getPpsTree(); # If getPpsTree() failed then this isn't an OLE file. return 'not_ole_file' if !$pps; # Loop through the PPS children below the root. for my $child_pps ( @{ $pps->{Child} } ) { my $pps_name = OLE::Storage_Lite::Ucs2Asc( $child_pps->{Na +me} ); # Match an Excel xls file. if ( $pps_name eq 'Workbook' || $pps_name eq 'Book' ) { return 'xls'; } # Match a Word document. if ( $pps_name eq 'WordDocument') { return 'doc'; } } return 'unknown_ole_file'; } __END__ Output: $ perl ole_check.pl test.xls = xls test.doc = doc test.ppt = unknown_ole_file test.txt = not_ole_file

You will probably have to harden it a little for your needs. For example it is possible that some older Word files might have a differed $pps_name. A little testing should highlight if that is the case. Also, this won't find Office 2007+ style docx or xlsx files.

--
John.


Comment on Re: How to recognize Word and XLS files
Download Code
Re^2: How to recognize Word and XLS files
by Anonymous Monk on Mar 06, 2012 at 10:12 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://958047]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (6)
As of 2014-10-02 11:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    What is your favourite meta-syntactic variable name?














    Results (54 votes), past polls