One simple-minded, off-the-top-of-my-head approach (assuming MS Word's formatting isn't important to you):
- in Word 2010, insert an end-of-record (EOR) marker at the end of each page; any flavor you like will do, so long as it can't appear in the text itself.
- save the whole (edited) .doc as .txt
- read the .txt
- split the text on the EOR marker to get one array per page (named for its page number)
- split each page array's contents on whitespace into a second-level array of individual words (again named for the page it came from)
- index to your heart's content...
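Once the .txt file exists, the remaining steps can be sketched roughly as below. This is a minimal illustration, not the poster's code. It assumes a hypothetical marker string `@@EOR@@`, a file saved as UTF-8, and that "index" means mapping each word to the pages it appears on. A list of per-page word lists stands in for the separately named arrays.

```python
import re
from collections import defaultdict

# Hypothetical end-of-record marker: use whatever string you typed into the
# Word document, as long as it never occurs in the real text.
EOR = "@@EOR@@"


def split_pages(text, marker=EOR):
    """First level: split the saved .txt contents into one string per page."""
    pages = text.split(marker)
    # Drop a trailing empty chunk if the file ends with the marker.
    if pages and not pages[-1].strip():
        pages.pop()
    return pages


def words_per_page(pages):
    """Second level: split each page on whitespace into individual words."""
    return [page.split() for page in pages]


def build_index(pages_of_words):
    """Map each word to the sorted list of 1-based page numbers it appears on."""
    index = defaultdict(set)
    for page_no, words in enumerate(pages_of_words, start=1):
        for word in words:
            # Trim leading/trailing punctuation and fold case, so "Perl," and
            # "perl" land under the same key.
            key = re.sub(r"^\W+|\W+$", "", word).lower()
            if key:
                index[key].add(page_no)
    return {word: sorted(nums) for word, nums in index.items()}


def index_file(path, marker=EOR, encoding="utf-8"):
    """Read the exported .txt and return the word -> pages index."""
    # Word asks which encoding to use when saving as plain text; pass the
    # matching `encoding` here (the UTF-8 default is an assumption).
    with open(path, encoding=encoding) as fh:
        text = fh.read()
    return build_index(words_per_page(split_pages(text, marker)))
```

Calling `index_file("mydoc.txt")` (a hypothetical filename) returns a dict such as `{"perl": [1, 2], ...}`, which you can then sort, filter, or print however you like.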
Of course, this may not be the most efficient approach, but it certainly avoids climbing the (Word object model) learning curve.
If you didn't program your executable by toggling in binary, it wasn't really programming!