PerlMonks  

Re: Need help with perl only parsing of M$ word file

by Brutha (Friar)
on Sep 05, 2003 at 08:18 UTC


in reply to Need help with perl only parsing of M$ word file

Dennis,

I scan a directory tree of Word files to create an index with SWISH-E.

I use Win32::OLE with M$ Word installed; no user interaction is needed and Word does not have to be visible. This ties the solution to Windows. Are you dependent on the Windows platform? None of the tools I found were exactly what I needed: many come from the Unix world and depend on those handy GNU libraries, but I am on Windows here.

Be aware that after extracting the text you might still have lots of control characters, which form tables etc. I am not interested in bold or italic text; I extract the title and other document properties, user-defined properties, and the text.
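To illustrate the control-character cleanup, here is a minimal sketch. The helper name clean_word_text is hypothetical, and the mapping is an assumption about the markers Word usually leaves in extracted text: carriage return for paragraph ends, \x07 after table cells, \x0b for manual line breaks, and \x0c for page breaks. Adjust it to whatever your documents actually contain.

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw text pulled from a Word range into plain text.
# The character mapping below is an assumption, not a complete list.
sub clean_word_text {
    my ($text) = @_;
    return '' unless defined $text;

    $text =~ s/\r\x07/\t/g;                 # end-of-cell marker -> tab
    $text =~ s/[\r\x0b\x0c]/\n/g;           # paragraph, line and page breaks -> newline
    $text =~ s/[\x00-\x08\x0e-\x1f\x7f]//g; # drop remaining control characters
    $text =~ s/[ \t]+$//mg;                 # trim trailing blanks on each line
    return $text;
}
```

Table cells end up tab-separated, which is usually good enough for an indexer that only cares about the words.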

My solution was straightforward, as with every OLE interaction I have written in Perl. You open the application and its macro editor, press F1 to look up the functions, record macros, save the VB script, and then translate and extend it in Perl, which cuts its length in half.
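As a rough idea of what such a translated macro can look like, here is a sketch. It is not the code offered in this post. The sub name extract_word_info and the hash layout are made up for illustration, and it assumes Windows, an installed Word, and the Win32::OLE module. The object and property names (Documents, Content, BuiltInDocumentProperties, CustomDocumentProperties) are the ones the Word object model exposes in its macro editor.

```perl
use strict;
use warnings;
use File::Spec;

# Sketch only: needs Windows, an installed Word and Win32::OLE.
sub extract_word_info {
    my ($path) = @_;
    require Win32::OLE;    # loaded at run time so the file compiles anywhere

    # 'Quit' names the method called when $word is destroyed.
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Cannot start Word: " . Win32::OLE->LastError();
    $word->{Visible} = 0;  # no window, no interaction

    my $doc = $word->Documents->Open({
        FileName => File::Spec->rel2abs($path),   # Word wants a full path
        ReadOnly => 1,
    }) or die "Cannot open $path: " . Win32::OLE->LastError();

    my %info = (
        title  => $doc->BuiltInDocumentProperties('Title')->{Value},
        text   => $doc->Content->{Text},
        custom => {},
    );

    # User-defined properties form a collection; in() turns it into a list.
    for my $prop (Win32::OLE::in($doc->CustomDocumentProperties)) {
        $info{custom}{ $prop->{Name} } = $prop->{Value};
    }

    $doc->Close(0);        # 0 = wdDoNotSaveChanges
    return \%info;
}

# Usage: perl thisscript.pl some.doc
if (@ARGV) {
    my $info = extract_word_info($ARGV[0]);
    print "Title: ", (defined $info->{title} ? $info->{title} : ''), "\n";
    print "$_ = $info->{custom}{$_}\n" for sort keys %{ $info->{custom} };
    print length($info->{text}), " characters of text\n";
}
```

For a whole directory tree you would start Word once and open each file in turn rather than launching a new instance per document.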

If somebody is interested, I could post my code as a starting point.

regards Brutha

And it came to pass that in time the Great God Om spake unto Brutha, the Chosen One: "Psst!"
(Terry Pratchett, Small Gods)
