PerlMonks  

Win32 OLE Word Get Page Text

by cormanaz (Chaplain)
on Feb 19, 2013 at 18:24 UTC (#1019638=perlquestion)
cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Good day, Bros. I am preparing an index of a book manuscript done in Word (2010) and I want to write a script that will grab the text from each page separately. After some searching around I can only find code examples that do things like print the document, change margins, etc. Can anyone point me in the right direction? I've done quite a bit with OLE and Outlook and Excel, but I don't know the Word object model and would prefer to avoid climbing that learning curve if possible.

Re: Win32 OLE Word Get Page Text
by ww (Bishop) on Feb 19, 2013 at 18:40 UTC

    One simple-minded, OTTOMH approach (unless MS Word's formatting is somehow important):

    1. In Word 2010, insert an end-of-record (EOR) marker of any flavor you like at the end of each page, so long as it won't appear in the text.
    2. Save the whole (edited) .doc as .txt.
    3. Read the .txt.
    4. Split on the EOR marker to create one array per page (named for the page number).
    5. Split each page array's contents on spaces into a second-level array of individual words (named for the page number it came from).
    6. Index to your heart's content...

    Of course, this may not be the most efficient approach, but it certainly avoids "climbing that (Word object model) curve."
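
Steps 3 through 6 might look roughly like the sketch below. It is only an illustration under some assumptions: the marker string `@@PAGE@@` and the sub names `pages_to_words` and `build_index` are made up for this example, the pages are kept in an array of arrays rather than separately named arrays, and the sample text stands in for the saved .txt file.

```perl
use strict;
use warnings;

# Hypothetical end-of-record marker; use any string that never occurs in the manuscript.
my $EOR = '@@PAGE@@';

# Split the exported text into pages, then each page into words.
# Returns an array ref of array refs; element 0 holds the words of page 1.
sub pages_to_words {
    my ($text, $marker) = @_;
    my @pages = split /\Q$marker\E/, $text;
    return [ map { [ split ' ', $_ ] } @pages ];
}

# Build a hash ref mapping each lowercased word to a sorted list of page numbers.
sub build_index {
    my ($pages) = @_;
    my %seen;
    for my $i (0 .. $#$pages) {
        for my $word (@{ $pages->[$i] }) {
            # Lowercase and strip leading/trailing punctuation.
            (my $key = lc $word) =~ s/^\W+|\W+$//g;
            next unless length $key;
            $seen{$key}{ $i + 1 } = 1;
        }
    }
    my %index;
    $index{$_} = [ sort { $a <=> $b } keys %{ $seen{$_} } ] for keys %seen;
    return \%index;
}

# Sample text standing in for the contents of the saved .txt file.
my $text  = "The cat sat.\n$EOR\nA dog barked at the cat.\n$EOR\nThe end.\n";
my $index = build_index(pages_to_words($text, $EOR));
print "$_: @{ $index->{$_} }\n" for sort keys %$index;
```

For a real manuscript you would slurp the .txt file into `$text` instead of using the sample string, and probably filter out stop words before printing.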


    If you didn't program your executable by toggling in binary, it wasn't really programming!

Re: Win32 OLE Word Get Page Text
by nikosv (Hermit) on Feb 19, 2013 at 22:07 UTC
    Use the OLE Viewer to inspect the type libraries and get an understanding of the automation interfaces that the Word object provides.
    See: Using the OLE/COM Object Viewer
