http://www.perlmonks.org?node_id=985137

wakatana has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I am new here and also a beginner in Perl. I would like to ask several questions, some related to Perl and its syntax, but most regarding Win32 OLE. My main goal is to develop a script that will check the structure of a Word document (return some information) and make some changes in the document (if that is possible). I am not sure whether all of that can be done through Perl and OLE. First, I am sorry for posting things regarding Microsoft OLE, but most of what I have found came from this site; I also tried MSDN, but it seems very unclear and useless to me. My latest experimenting with OLE is driving me crazy, so I hope somebody can help.
1st
===
How do I properly start Win32 OLE? I found this approach somewhere:
my $word = Win32::OLE->GetActiveObject('Word.Application') || Win32::OLE->new('Word.Application','Quit') or die Win32::OLE->LastError();
It seems clear, but I have a few questions to make sure that I understand:
1. Can I replace 'or' with '||'? Do both stand for logical or?
2. What is really going on in this code?
Perl tries to attach to a running instance of the Word application; if that fails, it starts its own instance; and if that fails too, the error is printed? What would happen if it attaches to an existing instance, and how is such an instance created? Is it just an already started Word process?
3. I read in the OLE documentation that the second argument is a destructor (but it seems that it is not mandatory). What is its real purpose? I know it is the opposite of a constructor, but does a method named 'Quit' need to be created, or is it done automatically, or what is going on?
4. What do the "::" and "->" in the code above stand for? Do they access methods in some package or class or whatever?
5. Is 'use warnings;' the same as perl -w?
2nd
===
Some code that I've found on the internet regarding OLE continues with the following:
my $doc = $word->Documents->Open('C:\\Perl\\home\\001f.doc');
but then I also found some weird structures, such as:
my $doc = $word->Documents->Open( { FileName => 'C:\\Perl\\home\\001f.doc', ReadOnly =>1 }) or die Win32::OLE->LastError();
SaveAs({FileName => 'exampletext.doc', FileFormat => wdFormatDocument,})
$doc->Close( { SaveChanges => $wdc->{wdDoNotSaveChanges} } );
The first approach seems pretty clear: it accesses the method Open, which is part of Documents (a class, package or whatever). But what do the others do?
What does {FileName => 'C:\\Perl\\home\\001f.doc', ReadOnly =>1} mean in the function arguments, and why the curly braces?
What is the difference between '=>' and '->'? Is enclosing the properties (or whatever the correct name is) in '{}' necessary? I have found that VBA has a nearly similar syntax, 'ComputeStatistics(Statistic:=wdStatisticWords)'. I am assuming these things are related, because with OLE I am in fact using Microsoft technologies from Perl. I also found a solution which works with the same functions/parameters as VBA (but without this strange assignment); here it is: http://www.perlmonks.org/?parent=334960;node_id=3333
3rd
===
On this site, http://www.perlmonks.org/index.pl?node_id=422677, it is mentioned that M$ Word does not store the number of pages contained in a document, so every time OLE is used the document is opened and the page count is recalculated according to styles, font size, etc. According to M$ this seems to be no problem (http://support.microsoft.com/kb/185513 and http://support.microsoft.com/kb/185509). Or is the page count also recalculated when the file is opened? I also found two solutions for Perl that work, one using wdPropertyPages (http://stackoverflow.com/questions/820348/why-are-the-number-of-pages-in-a-word-document-different-in-perl-and-word-vba) and one using wdStatisticPages (http://www.perlmonks.org/?parent=334960;node_id=3333). So where is the truth: is there information about the page count or not?
4th
===
As I mentioned, it seems that the Perl, VBA and PowerShell code that I've found has a lot in common (I am not an expert in any of those languages, but they are accessing similar variables). The following page describes how to obtain the number of words and the number of pages from a document. As one user suggested, it can be obtained with the constructions '$selection->Words->{Count};' and '$selection->pagenumbers->{Count};'. However, if I search for the word 'selection' in the Object Browser of M$ Visual Basic (from a Word document hit Alt+F11 and F2), I find the following:
Class Selection
Member of Word
------------------------------------
Property Words As Words
read-only
Member of Word.Selection
Property Characters As Characters
read-only
Member of Word.Selection
==========================
Sub ShrinkDiscontiguousSelection()
Member of Word.Selection
------------------------------------
Property Words As Words
read-only
Member of Word.Selection
Property Characters As Characters
read-only
Member of Word.Selection
As you can see, both also contain variables (or properties, or whatever the correct name is; please correct me) for 'Characters' and 'Words', but it seems that both 'Characters' and 'Words' are members of 'Word.Selection'. How should I understand that? I also tried to search for 'pagenumbers', as mentioned in the link above, but did not find anything except several 'wdPageNumberStyle' entries and 'PageNumbers', but not 'pagenumbers' (lowercase). I also did not find in the Object Browser that 'Word.Selection.Words' or 'Word.Selection.Characters' has a 'Count' method (or property, whatever the correct name is). Where does this method (property) come from? What does the word 'As' mean in the output above? Is it some data type? Here I am posting the mentioned code, which I slightly altered:
#!/usr/bin/perl
use Cwd 'abs_path';
use warnings;
use strict;
use Win32::OLE 'CP_UTF8';
$Win32::OLE::CP = CP_UTF8;
binmode STDOUT, 'encoding(utf8)';
print abs_path($0) . "\n";
print "=========\n";
my $document_name = 'C:\\Perl\\home\\thisIsPerl.doc';
my $word = Win32::OLE->GetActiveObject('Word.Application') || Win32::OLE->new('Word.Application') or die Win32::OLE->LastError();
$word-> {visible} = 0;
$word->Application->Selection;
my $document = $word->Documents->Open( { FileName => $document_name, ReadOnly =>1 }) or die Win32::OLE->LastError();
my $paragraphs = $document->Paragraphs ();
my $n_paragraphs = $paragraphs->Count ();
print "Words:", $word->Selection->Words->{Count}, "\n";
print "Characters:", $word->Selection->Characters->{Count}, "\n";
print "Paragraphs: ", $word->Selection->Paragraphs->{Count}, "\n";
$document->Close();
$word->exit;
$word->Quit;

Administrator@cepido /cygdrive/c/Perl/home
$ ./internet04_pgcnt.pl
/cygdrive/c/Perl/home/internet04_pgcnt.pl
=========
Words:1
Characters:1
Paragraphs: 1
but this code does not work perfectly. It always returns a word count of 1, no matter how many words are in the document. These investigations lead me to another, probably the most important, question: how are all those OLE objects organized? The Object Browser is unclear to me; I also downloaded the OLE/COM Object Viewer, but had no luck with it either. I know this is not a standard Perl question, but I don't know where else to ask. One idea that comes to mind is to somehow list all the methods (properties, variables, packages) that are available in OLE through Perl, and then just try several of them according to their names. Is this possible?
5th
===
Is it possible to process a Word document character by character? Or even better, is it possible to query data from Word as if it were SQL? Simply put, something like: select * from document where Font=Italic? I think I have managed reading word by word here (there are some small mistakes):
#!/usr/bin/perl -w
use strict;
use warnings;
use Win32::OLE::Const 'Microsoft Word';
my $file = 'C:\\Perl\\home\\thisIsPerl.doc';
my $Word = Win32::OLE->new('Word.Application', 'Quit');
$Word->{'Visible'} = 0;
my $doc = $Word->Documents->Open($file);
my $paragraphs = $doc->Paragraphs() ;
my $n_paragraphs = $paragraphs->Count ();
for my $p (1..$n_paragraphs) {
    my $paragraph = $paragraphs->Item ($p);
    my $words = Win32::OLE::Enum->new( $paragraph->{Range}->{Words} );
    while ( defined ( my $word = $words->Next() ) ) {
        my $font = $word->{Font};
        print "IN_Text:", $word->{Text}, "\n" if $word->{Text} !~ /\r/;
        #print $text;
        #$font->{Bold} = 1 if $word->{Text} =~ /Perl/;
    }
    print "=============\n";
}
$Word->ActiveDocument->Close ;
$Word->exit;
$Word->Quit;
It works, but it throws an error at the end and does not process headers and footers.
6th
===
I searched and found the following in the Object Browser:
Const wdNumberOfPagesInDocument = 4
Member of Word.WdInformation
Const wdStatisticPages = 2
Member of Word.WdStatistic
What do those numbers mean? I am sure they do not correspond to the actual number of pages in the Word document (I was playing with the code from http://www.perlmonks.org/?parent=334960;node_id=3333, which works).
7th
===
Finally, the last question. I read somewhere that a full path is necessary in OLE to open a Word document. I would like to pass the document to be processed as an argument to the script, but without needing to specify the full path (the whole path should be added to it after it has been passed to the script). I found somewhere that 'abs_path($0)' is used for doing something similar, but I had no luck. Also, on Windows the backslashes must be escaped and so on.
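As an illustration of the path question above, here is a minimal sketch that turns a file name given on the command line into an absolute path with the core module File::Spec. The helper name to_absolute and the default file name are assumptions made for this example only. Note that $0 holds the name of the running script, so abs_path($0) resolves the script's location rather than the document's. Backslashes only need doubling inside Perl string literals, not in values that arrive through @ARGV.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: resolve a possibly relative file name against the
# current working directory. rel2abs leaves absolute names unchanged.
sub to_absolute {
    my ($name) = @_;
    return File::Spec->rel2abs($name);
}

# Take the document name from the command line; fall back to the file
# name used in this thread when no argument is given.
my $arg  = @ARGV ? $ARGV[0] : 'thisIsPerl.doc';
my $full = to_absolute($arg);

print "Document path: $full\n";
```

With a native Windows Perl this should yield a drive-letter path with backslashes. Under Cygwin's Perl the result is a /cygdrive/... style path, which Word is unlikely to understand, so a conversion step (for example with Cygwin's cygpath -w) would probably be needed there.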
I am sorry for the long post, but I am stuck at the points that I've described; I hope somebody knows the answers.
Thanks a lot for any ideas.
PS: Where do I start a new post? I was only able to post this because a new page appeared after my account registration.

Re: win32 OLE in deeper details
by ig (Vicar) on Aug 03, 2012 at 05:57 UTC

    Your questions cover a very wide range of concerns, from basic Perl syntax to some quite complex issues of the interface between perl and Windows/Office. While I have used Win32::OLE, I have never used it to access office documents, so I can't give you much advice on your primary objective.

    It will serve you well to study an introduction to Perl programming, to learn basic syntax, data structures, etc. This will help you with questions like "What does those "::" and "->" in code above stands for ?", "why curly braces ?", etc. There are some excellent tutorials and references in Getting Started with Perl. It will be difficult for you to understand the code you see and reliably deal with objects, variants (Win32::OLE::Variant) and data structures until you have learned the basics.

    While I found much of the documentation available on MSDN difficult to unhelpful, the script examples I found very helpful. When I was doing it regularly, I was able to translate VBScript examples to Perl quite reliably. And the MSDN docs did at least identify many of the methods and attributes of the COM objects. My usual route to success was to hunt through MSDN for objects, methods and attributes that looked interesting, then google for examples of them being used to see how they worked.

      I prefer to use oleview.exe, then try to find docs for the method on MSDN.
Re: win32 OLE in deeper details
by rohit_raghu (Acolyte) on Aug 03, 2012 at 06:23 UTC
    wakatana,

    I'm a beginner in perl myself, but I can answer some of the questions you've asked.

    First off, there are several online tutorials on perl. Try this link before starting, or try downloading the relevant chapters of 'Beginning Perl' from the perl website. Also, try the search bar first before asking such a huge question.

    || and the or operator both represent the same thing, only || has a higher priority than or. If the code before the || returns a true value, i.e. is executed properly, then anything after the || is ignored. Otherwise, it is executed. Try the piece of code below.

    #!/usr/bin/perl
    0 or print "0:false\n";
    1 or print "1:false\n";
    0 or die "Error\n";
    print "Not executed";
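The difference in priority matters most next to an assignment. The sketch below is only an illustration: failing_call is a hypothetical stand-in for a call that returns undef, much as Win32::OLE->GetActiveObject is documented to do when no instance is running. It shows why the line from the question combines both operators: '||' picks the value that gets assigned, while 'or' reacts after the assignment has already happened.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a call that fails and returns undef.
sub failing_call { return }

# '||' binds tighter than '=', so the fallback becomes the assigned value:
# parsed as  my $with_pipes = (failing_call() || 'fallback');
my $with_pipes = failing_call() || 'fallback';

# 'or' binds looser than '=', so the assignment is done first and the
# right-hand side only runs afterwards:
# parsed as  (my $with_or = failing_call()) or print ...;
my $with_or = failing_call() or print "call failed; \$with_or stays undefined\n";

print "with ||: $with_pipes\n";
print "with or: ", (defined $with_or ? $with_or : 'undef'), "\n";
```

So in `my $word = A || B or die ...`, the variable receives the first true value of A and B, and die only fires when both are false.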

    2nd, -> and => are completely unrelated. See hashes and references. Also look at Object oriented programming in perl(OOP), and modules. Do NOT try programming in perl without first understanding hashes.
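To make the distinction concrete, here is a small pure-Perl sketch. The package My::Opener and its Open method are invented for this illustration and have nothing to do with Win32::OLE itself; only the argument names are borrowed from the question. It shows '=>' pairing keys with values, curly braces building a hash reference, and '->' both looking inside a reference and calling a method, with '::' separating the parts of a package name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# '=>' is a "fat comma": it separates list items like ',' and also quotes
# the bare word on its left. Here it pairs hash keys with values.
my %open_args = ( FileName => 'C:\\Perl\\home\\001f.doc', ReadOnly => 1 );

# Curly braces around such a list create a reference to an anonymous hash,
# which lets a whole set of named arguments travel as one scalar.
my $args_ref = { FileName => 'C:\\Perl\\home\\001f.doc', ReadOnly => 1 };

# '->' follows a reference: here it looks up a key in the referenced hash.
my $file_name = $args_ref->{FileName};

# Hypothetical class, defined only for this example.
package My::Opener;
sub new  { my ($class) = @_; return bless {}, $class }
sub Open {
    my ($self, $args) = @_;
    return "opening $args->{FileName} (read-only: $args->{ReadOnly})";
}

package main;

# '->' also calls a method on a class name or on an object.
my $opener  = My::Opener->new;
my $message = $opener->Open($args_ref);

print "plain hash lookup: $open_args{ReadOnly}\n";
print "via reference:     $file_name\n";
print "$message\n";
```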

    Scroll to the bottom of the SoPW page to post a question

    Rohit Raghunathan
Re: win32 OLE in deeper details
by Anonymous Monk on Aug 03, 2012 at 08:34 UTC