PerlMonks  

Parsing, searching, replacing content in MS Word documents

by Anonymous Monk
on Apr 23, 2002 at 04:50 UTC ( [id://161210]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am building a program that searches and replaces content recursively in a very large directory. The target files are HTML, ASP, text, and MS Word docs. I can get everything to work except for the MS Word docs. Question: How do you search and replace content in MS Word documents using Perl OLE Automation for Word? An example would be super.

Replies are listed 'Best First'.
Re: Parsing, searching, replacing content in MS Word documents
by t0mas (Priest) on Apr 23, 2002 at 06:39 UTC
    You could try something like this:
    #!/usr/bin/perl -w

    # Uses
    use strict;
    use Win32::OLE;
    use Win32::OLE::Variant;
    use Win32::OLE::Const;
    use File::Find;

    # Variables
    use vars qw($MSWord $wd $startdir);

    # Where to start the doc search
    $startdir='c:\tmp';

    # Create new MSWord object and load constants
    $MSWord=Win32::OLE->new('Word.Application','Quit')
        or die "Could not load MS Word";
    $wd=Win32::OLE::Const->Load($MSWord);

    # Find documents
    find(\&rTxt,$startdir);

    sub rTxt {
        # We only want .doc files (no links...)
        return unless /\.doc$/ && -f && ! -l;

        # Open document
        my $doc = $MSWord->Documents->Open({FileName=>$File::Find::name});

        # Exit nicely if we couldn't open doc
        return unless $doc;

        # Print some info
        print $File::Find::name,"\n";

        # Content object
        my $content=$doc->Content;

        # Find object
        my $find=$content->Find;
        $find->ClearFormatting;
        $find->{Text}="Yoo";
        $find->Replacement->ClearFormatting;
        $find->Replacement->{Text}="Hello";

        # Forward takes a plain boolean; the loaded Word constants
        # do not include a "True" entry, so pass 1 directly
        $find->Execute({Replace=>$wd->{wdReplaceAll},Forward=>1});

        # Close document
        $doc->Close({SaveChanges=>$wd->{wdSaveChanges}});
    }
    It uses the built-in Find-And-Replace function that comes with MS Word. If you want to do anything more complex (like s/^Yoo/Hello/i), you'll have to read and rewrite the text of the Content object yourself.
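One way the "more complex things" case might look is sketched below: walk the document paragraph by paragraph and apply an ordinary Perl regex to each paragraph's text. The helper names rewrite_text and regex_replace_in_doc are hypothetical, and $doc and $wd are assumed to be the document object and constants hash from the script above. The sketch assumes each paragraph ends in a single-character paragraph mark, has not been checked against tables or fields, and will likely lose character formatting inside any paragraph it rewrites.

```perl
use strict;
use warnings;

# Hypothetical helper: the pure-Perl part, applied to one paragraph's text.
# Because it sees one paragraph at a time, ^ anchors at the paragraph start.
sub rewrite_text {
    my ($text) = @_;
    (my $new = $text) =~ s/^Yoo/Hello/i;
    return $new;
}

# Hypothetical helper: apply rewrite_text() to every paragraph of an open
# Word document. $doc is a Word Document object and $wd is the hash returned
# by Win32::OLE::Const->Load. Returns the number of paragraphs changed.
sub regex_replace_in_doc {
    my ($doc, $wd) = @_;
    require Win32::OLE::Enum;    # loaded lazily; only available on Windows

    my $changed = 0;
    foreach my $para (Win32::OLE::Enum->All($doc->Paragraphs)) {
        my $range = $para->Range;
        my $text  = $range->{Text};
        next unless defined $text;

        # Assumption: the text ends with a one-character paragraph mark (\r)
        (my $body = $text) =~ s/\r\z//;

        my $new = rewrite_text($body);
        next if $new eq $body;

        # Shrink the range by one character so the paragraph mark survives,
        # then overwrite only the paragraph's text
        $range->MoveEnd($wd->{wdCharacter}, -1);
        $range->{Text} = $new;
        $changed++;
    }
    return $changed;
}
```

In the script above this would be called as regex_replace_in_doc($doc, $wd) inside rTxt, before $doc->Close. Keeping the regex in its own sub means it can be tried out on plain strings without Word running.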
    The major problem with OLE Automation, in my opinion, is that it is painfully slow. If you have a large number of MS Word docs, this stunt will probably take ages.


    /brother t0mas
