Re: (Get text of Word Document)going through a Win32 MSWORD doc

by buzzcutbuddha (Chaplain)
on Dec 20, 2001 at 22:28 UTC (#133555=note)

in reply to going through a Win32 MSWORD doc

If by indexing you mean grabbing each word in a Word document and building an index of those words so you can find them later, the following will give you all of the words in a document. I'll let you focus on the indexing part. :)

    #!/usr/bin/perl

    # general use directives
    use strict;
    use warnings;

    # project specific use directives
    # this comes with the standard ActiveState
    # distribution. You can also look for
    # a newer version with PPM
    use Win32::OLE;

    # get the document
    # use the full path
    # GetObject returns undef on failure by default (it does not die),
    # so check the return value instead of $@
    my $wd = Win32::OLE->GetObject('C:/pathto/document/foo.doc')
        or die "Unable to load document: ", Win32::OLE->LastError(), "\n";

    # all of the Word document data members I'm using
    # are explained in the MSDN documentation of the
    # external interfaces of a Word Document.
    # if you have MSDN, search for "Word OLE".

    # get the number of paragraphs
    my $paraCount = $wd->Paragraphs->Count;

    my @words;

    # Word collections are 1-based, so count from 1 to $paraCount
    for my $i (1 .. $paraCount) {
        # split ' ' splits on runs of whitespace and skips empty fields,
        # which also drops the paragraph mark at the end of each paragraph
        push @words, split ' ', $wd->Paragraphs->Item($i)->Range->Text;
    }

    # clean up at the end
    undef $wd;

That's how you get the words out of a Word document and into an array. You may prefer a different data structure, but again, I'll leave that up to you! I hope this helps.
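
If a hash is the data structure you end up wanting, here is one possible sketch of the indexing step. It is only an illustration, not part of the code above: build_index is a hypothetical helper name, the lowercasing and punctuation stripping are assumptions about what counts as the "same" word, and the sample sentence stands in for the @words array filled from the document.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): map each normalized
# word to the list of positions at which it occurs in the word list.
sub build_index {
    my @words = @_;
    my %index;
    for my $pos (0 .. $#words) {
        # Assumed normalization: lowercase, then strip leading and
        # trailing non-word characters such as punctuation.
        (my $key = lc $words[$pos]) =~ s/^\W+|\W+$//g;
        next unless length $key;
        push @{ $index{$key} }, $pos;
    }
    return \%index;
}

# Sample data standing in for the @words array read from the document.
my @words = split ' ', "The cat sat. The dog sat, too.";
my $index = build_index(@words);

# Print each word followed by the positions where it appears.
for my $word (sort keys %$index) {
    printf "%-5s %s\n", $word, join ',', @{ $index->{$word} };
}
```

Storing positions rather than plain counts lets you jump back to where a word occurred; if you only need frequencies, replacing the push with $index{$key}++ would do.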
