Parsing, searching, replacing content in MS Word documents

by Anonymous Monk
on Apr 23, 2002 at 04:50 UTC (#161210, perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am building a program that recursively searches and replaces content in a very large directory. The target files are HTML, ASP, text, and MS Word docs. I can get everything to work except for the MS Word docs. Question: How do you search and replace content in MS Word documents using Perl OLE Automation for Word? An example would be super.

Re: Parsing, searching, replacing content in MS Word documents
by t0mas (Priest) on Apr 23, 2002 at 06:39 UTC
    You could try something like this:
    #!/usr/bin/perl -w

    # Uses
    use strict;
    use Win32::OLE;
    use Win32::OLE::Variant;
    use Win32::OLE::Const;
    use File::Find;

    # Variables
    use vars qw($MSWord $wd $startdir);

    # Where to start the doc search
    $startdir='c:\tmp';

    # Create new MSWord object and load constants
    $MSWord=Win32::OLE->new('Word.Application','Quit')
        or die "Could not load MS Word";
    $wd=Win32::OLE::Const->Load($MSWord);

    # Find documents
    find(\&rTxt,$startdir);

    sub rTxt {
        # We only want .doc files (no links...)
        return unless /\.doc$/ && -f && ! -l;

        # Open document
        my $doc = $MSWord->Documents->Open({FileName=>$File::Find::name});

        # Exit nicely if we couldn't open doc
        return unless $doc;

        # Print some info
        print $File::Find::name,"\n";

        # Content object
        my $content=$doc->Content;

        # Find object
        my $find=$content->Find;
        $find->ClearFormatting;
        $find->{Text}="Yoo";
        $find->Replacement->ClearFormatting;
        $find->Replacement->{Text}="Hello";

        # Replace every match, searching forward. A plain 1 is passed for
        # Forward, since the loaded wd constants are not expected to
        # contain a "True" entry.
        $find->Execute({Replace=>$wd->{wdReplaceAll},Forward=>1});

        # Close document, saving the changes
        $doc->Close({SaveChanges=>$wd->{wdSaveChanges}});
    }
    It uses the built-in find-and-replace function that comes with MS Word. If you want to do anything more complex (like s/^Yoo/Hello/i), you'll have to read the text out of the Content object, change it in Perl, and write it back.
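One way to do the regex case is sketched below. It is not part of the original reply and has not been tested against any particular Word version. It assumes that "^" should match at the start of each paragraph, so it walks the document's Paragraphs collection and rewrites the Range text of each paragraph. The names rewrite_paragraph and rewrite_document are hypothetical helpers introduced here. Assigning to a range's Text may flatten character formatting inside the rewritten paragraph, and paragraphs inside table cells may need extra care.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pure-Perl part: apply the regex to the text of one paragraph.
# Pattern and replacement are the ones mentioned in the post.
sub rewrite_paragraph {
    my ($text) = @_;
    $text =~ s/^Yoo/Hello/i;
    return $text;
}

# OLE part (Windows only): walk the paragraphs of an open document.
# $doc is a Word Document object, $wd the hash of constants
# returned by Win32::OLE::Const->Load.
sub rewrite_document {
    my ($doc, $wd) = @_;
    my $paragraphs = $doc->Paragraphs;
    for my $i (1 .. $paragraphs->Count) {
        my $range = $paragraphs->Item($i)->Range;

        # Shrink the range by one character so the trailing paragraph
        # mark is left alone when new text is assigned.
        $range->MoveEnd($wd->{wdCharacter}, -1);

        my $old = $range->{Text};
        next unless defined $old && length $old;

        my $new = rewrite_paragraph($old);
        $range->{Text} = $new if $new ne $old;
    }
}

# Run against a single file when a path is given on the command line.
# Word generally wants a full path here, not a relative one.
if (my $file = shift @ARGV) {
    require Win32::OLE;
    require Win32::OLE::Const;

    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Could not load MS Word";
    my $wd  = Win32::OLE::Const->Load($word);
    my $doc = $word->Documents->Open({FileName => $file})
        or die "Could not open $file";

    rewrite_document($doc, $wd);
    $doc->Close({SaveChanges => $wd->{wdSaveChanges}});
}
```

To use it with the directory walk above, rewrite_document($doc, $wd) could be called inside rTxt in place of the Find/Execute lines.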
    The major problem with OLE Automation, in my opinion, is that it is painfully slow. If you have a large number of MS Word docs, this stunt will probably take ages.

    /brother t0mas

Node Type: perlquestion [id://161210]
Approved by Kanji