
Accessing Meta data from MS WORD

by ghouse_55 (Initiate)
on Aug 07, 2012 at 09:04 UTC (#985923=perlquestion)
ghouse_55 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need some information on accessing the metadata of an MS Word document via Perl. Is this possible in Perl, and can you give me some guidance on how to go about it?

Replies are listed 'Best First'.
Re: Accessing Meta data from MS WORD
by thmsdrew (Scribe) on Aug 07, 2012 at 10:46 UTC

    Well, a .docx file is actually just a zip archive that contains, among other things, the metadata you speak of (typically in docProps/core.xml and docProps/app.xml). In Perl you can open the archive, extract the metadata file, and then parse the XML for what you need. CPAN modules such as Archive::Zip and XML::LibXML handle these tasks.
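A minimal sketch of the extraction step, assuming the file really is a .docx (zip) package and that the core properties live in the usual member, docProps/core.xml. It uses IO::Uncompress::Unzip, which ships with recent Perls, rather than a CPAN archive module; the helper name read_docx_member is made up for illustration.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: return the raw bytes of one member of a .docx package.
# Assumes an IO::Uncompress::Unzip recent enough to support the Name option.
sub read_docx_member {
    my ($docx_path, $member) = @_;

    # Open the named member inside the zip archive for reading.
    my $z = IO::Uncompress::Unzip->new($docx_path, Name => $member)
        or die "Cannot open $member in $docx_path: $UnzipError\n";

    # Read the member block by block until end of data.
    my ($content, $buffer) = ('', '');
    my $status;
    while (($status = $z->read($buffer)) > 0) {
        $content .= $buffer;
    }
    die "Error reading $member from $docx_path: $UnzipError\n" if $status < 0;

    $z->close;
    return $content;
}

# Usage: perl script.pl report.docx
# Prints the raw XML holding title, author, creation date, and so on.
if (@ARGV) {
    print read_docx_member($ARGV[0], 'docProps/core.xml'), "\n";
}
```

This only covers the OpenXML case; a binary .doc file is not a zip archive, so the open will simply fail with an error.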

      ... assuming of course that the file in question is in Microsoft's OpenXML format. Older versions of Word used the proprietary binary ".doc" format, which is still quite frequently used.

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: Accessing Meta data from MS WORD
by sundialsvc4 (Abbot) on Aug 07, 2012 at 12:58 UTC

    /me nods...

    IIRC, a docx file is a zip-compressed package of XML files with a well-known public schema. If you cannot find a CPAN module that already does what you want, one approach is to write code that unzips it and then queries the XML content with XPath expressions, which avoids hand-writing code to walk the XML's internal structure. But it is extremely likely that what you are doing is “a thing already done.”
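The XPath approach described above might look roughly like the following sketch. It assumes the XML::LibXML module from CPAN is installed and that the XML string is the content of docProps/core.xml, already unzipped from the package. The function name core_properties and the choice of fields are illustrative only; a real document may omit any of these elements, in which case the value comes back as an empty string.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Namespaces used by the core-properties part of an OpenXML package.
my %NS = (
    cp      => 'http://schemas.openxmlformats.org/package/2006/metadata/core-properties',
    dc      => 'http://purl.org/dc/elements/1.1/',
    dcterms => 'http://purl.org/dc/terms/',
);

# Hypothetical helper: pull a few common fields out of core.xml via XPath.
sub core_properties {
    my ($xml) = @_;

    my $doc = XML::LibXML->load_xml(string => $xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);

    # XPath needs the prefixes registered explicitly.
    $xpc->registerNs($_, $NS{$_}) for keys %NS;

    return {
        title    => $xpc->findvalue('/cp:coreProperties/dc:title'),
        creator  => $xpc->findvalue('/cp:coreProperties/dc:creator'),
        created  => $xpc->findvalue('/cp:coreProperties/dcterms:created'),
        modified => $xpc->findvalue('/cp:coreProperties/dcterms:modified'),
    };
}
```

Given the XML in a scalar $xml, a call such as `my $meta = core_properties($xml); print "$meta->{creator}\n";` would print the author field. Registering the namespaces up front keeps the XPath expressions short and independent of whatever prefixes the document itself happens to use.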
