in reply to Win32 OLE Word Get Page Text
One simple-minded, OTTOMH approach (unless MS Word's formatting is somehow important):
- in Word 2010, edit in an end_of_record (EOR) marker of any flavor you like at each page break, so long as it won't appear in the text.
- save the whole (edited) .doc as .txt
- read the .txt
- split on EOR to get one chunk of text per page, stored under its page number.
- split each page's chunk on spaces into a second-level array (kept under the same page number) of individual words
- index to your heart's content...
Of course, this may not be the most efficient approach, but it certainly avoids "climbing that (Word object model) curve."
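For what it's worth, the split-and-index steps above might look something like the following. This is only a rough sketch: the marker string `<<EOR>>`, the sub names `slurp`, `words_by_page` and `index_words`, and the sample text are all made up for illustration, and it assumes the saved .txt is in an encoding Perl can read as-is.

```perl
use strict;
use warnings;

# Hypothetical marker; pick anything that cannot occur in the document text.
my $EOR = '<<EOR>>';

# Read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;    # undef the record separator so <$fh> returns everything
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Split the text into pages on the marker, then each page into words.
# Returns a hash ref: page number (counting from 1) => array ref of words.
sub words_by_page {
    my ($text) = @_;
    my %pages;
    my $page_no = 0;
    for my $chunk ( split /\Q$EOR\E/, $text ) {
        $page_no++;
        # split ' ' breaks on runs of whitespace and ignores leading whitespace
        $pages{$page_no} = [ split ' ', $chunk ];
    }
    return \%pages;
}

# Build a simple index: lowercased word => list of pages it occurs on.
sub index_words {
    my ($pages) = @_;
    my %index;
    for my $page_no ( sort { $a <=> $b } keys %$pages ) {
        my %seen;    # so each page is listed only once per word
        for my $word ( @{ $pages->{$page_no} } ) {
            my $key = lc $word;
            push @{ $index{$key} }, $page_no unless $seen{$key}++;
        }
    }
    return \%index;
}

# Use the file named on the command line, or a small stand-in text.
my $text = @ARGV
    ? slurp( $ARGV[0] )
    : "alpha beta\n${EOR}\nbeta gamma\n${EOR}\ngamma alpha delta\n";

my $pages = words_by_page($text);
my $index = index_words($pages);

for my $page_no ( sort { $a <=> $b } keys %$pages ) {
    print "page $page_no: @{ $pages->{$page_no} }\n";
}
for my $word ( sort keys %$index ) {
    print "$word: pages ", join( ', ', @{ $index->{$word} } ), "\n";
}
```

Note that splitting on whitespace leaves punctuation attached to the words ("text." and "text" would be indexed separately), so you may want to strip it before indexing.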