Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Re: (Get text of Word Document)going through a Win32 MSWORD doc

by buzzcutbuddha (Chaplain)
on Dec 20, 2001 at 22:28 UTC ( #133555=note: print w/ replies, xml ) Need Help??


in reply to going through a Win32 MSWORD doc

If what you meant by indexing is grabbing each word in a Word Document and creating an index of those words to find them later, the following will give you all of the words in a document. I'll let you focus on the indexing part. :)

#!/usr/bin/perl # general use directives use strict; use warnings; # project specific use directives # this comes with the standard ActiveState # distribution. You can also look for # a newer version with PPM use Win32::OLE; my $wd; # get the document # use the full path eval { $wd = Win32::OLE->GetObject('C:/pathto/document/foo.doc') }; die "Unable to load document\n" if $@; # all of the Word document data members I'm using # are explained in the MSDN documentation of the # external interfaces of a Word Document. # if you have MSDN, search for "Word OLE". # get the number of paragraphs my $paraCount = $wd->{Paragraphs}->Count; # set the counter my $foo = 0; my @words; while ($foo++ < $paraCount) { push @words, split /\s/, $wd->{Paragraphs}{$foo}{Range}{Text}; } #clean up at the end undef $wd;
That's how you get the words of a word document out and into an array. You may prefer a different data structure, but again, I'll leave that up to you! I hope this helps.


Comment on Re: (Get text of Word Document)going through a Win32 MSWORD doc
Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://133555]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (4)
As of 2014-12-22 04:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (110 votes), past polls