Re: Converting M$ Word --> PDF

by jonadab (Parson)
on Jan 24, 2004 at 02:25 UTC (#323767)


in reply to Converting M$ Word --> PDF

I need to convert a M$ Word document

Wow, it's hard to find something for that on CPAN. The terms "Microsoft", "word", and "document" all occur in the documentation for approximately every single module EVER, making it totally impossible to use them as search criteria. The only thing I managed to find that seems relevant at all is docclient.

Failing the existence on CPAN of a module just for reading Word documents, I tend to agree with the guy who advised you to get OpenOffice and ooolib; though I haven't used ooolib yet personally, I know that OpenOffice generally does as excellent a job with Word documents as can be hoped for, given the immense complexity and extremely poor documentation for that format.

Ideally, I would like to simply run a Perl script on the Linux box

That shouldn't be a problem. Install OpenOffice on the Linux box; you already have Perl there, of course. That leaves ooolib, which according to the sourceforge project page runs on Linux. I've not used ooolib myself, though, since I usually write scripts that work with the XML; I don't have to deal with Word documents much. But now that I know ooolib exists, I'm making myself a note to check it out soon; it could be quite useful :-)
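A minimal sketch of what such a script on the Linux box might look like. It assumes a LibreOffice-style `soffice` binary on the PATH that accepts `--headless`, `--convert-to` and `--outdir`; the OpenOffice releases current when this thread was written may not support those options, and ooolib is not used here at all. The helper names `build_convert_command` and `expected_pdf_path` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Build the argument list for a headless conversion to PDF.
# ASSUMPTION: the office suite accepts LibreOffice-style options.
sub build_convert_command {
    my ($doc, $outdir, $soffice) = @_;
    $soffice ||= 'soffice';
    return ($soffice, '--headless', '--convert-to', 'pdf',
            '--outdir', $outdir, $doc);
}

# Work out where the PDF should land: same base name, .pdf suffix.
sub expected_pdf_path {
    my ($doc, $outdir) = @_;
    my ($name) = fileparse($doc, qr/\.[^.]*/);
    return File::Spec->catfile($outdir, "$name.pdf");
}

# Only convert when a document is named on the command line.
if (@ARGV) {
    my ($doc, $outdir) = @ARGV;
    $outdir = '.' unless defined $outdir;
    my @cmd = build_convert_command($doc, $outdir);
    # List-form system() avoids the shell, so odd file names are safe.
    system(@cmd) == 0
        or die "conversion failed: @cmd (exit status $?)\n";
    print expected_pdf_path($doc, $outdir), "\n";
}
```

Saved under a name of your choosing, it would be run with the Word document as the first argument and an optional output directory as the second.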


$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/


Re: Re: Converting M$ Word --> PDF
by peterr (Scribe) on Jan 24, 2004 at 03:46 UTC
    The only thing I managed to find that seems relevant at all is docclient.

    Checked it out, the only thing that might be hard is "On the server machine, a Docserver application (usually docserver.pl program) has to be running."

    Will see how the OpenOffice and ooolib 'combo' goes.

    That shouldn't be a problem. Install OpenOffice on the Linux box; you already have Perl there, of course. That leaves ooolib, which according to the sourceforge project page runs on Linux. I've not used ooolib myself, though, since I usually write scripts that work with the XML; I don't have to deal with Word documents much. But now that I know ooolib exists, I'm making myself a note to check it out soon; it could be quite useful :-)

    If only there was a Perl module that was HTML::Word available, because I know there is a HTML::PDF there. I recently used Perl to create an Excel file, wow, could not have been easier, so I'm really surprised there is nothing in Perl that can create Word documents. (But then, even Clipper can create Excel files). I guess a lot depends on how much of the format of M$ Word Microsoft will release, because having made the comments on Excel, I do know the complete layout of Excel was available some years back. The bottom line I guess is, if M$ haven't released ALL the info on the structure of M$ Word files, then no-one is going to be able to create them (although isn't _that_ what OpenOffice can do ??)

    Peter

      If only there was a Perl module that was HTML::Word available, because I know there is a HTML::PDF there.

      Better would be WordProcessing::MSWord::Parse.

      I recently used Perl to create an Excel file, wow, could not have been easier, so I'm really surprised there is nothing in Perl that can create Word documents.

      Oh, there is some stuff for _creating_ Word documents, but I skipped over it for two reasons: _creating_ documents isn't what you asked for (you wanted to _read_ them and create something _else_ from them), and the modules I saw were rather more specialized than general (e.g., one of them was for creating reports having something to do with DBI I think, in Word document format). In general, creating documents in a partially-understood format is easier than parsing them, because for parsing you have to understand whichever aspects of the format the document happens to use. For generating documents, you just have to figure out the basics, and then you can use the regular means (e.g., Word) to create one that's like what you want and simply copy large parts of it without fully understanding them, substituting in your custom content each time in place of the dummy content from the initial document.
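The copy-and-substitute approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not something from the modules mentioned: it assumes the template is a text-based file such as RTF saved from Word, since a binary .doc may carry internal offsets that a naive substitution would break. The marker names (`DUMMY_TITLE`, `DUMMY_BODY`) and the helpers `rtf_escape` and `fill_template` are hypothetical.

```perl
use strict;
use warnings;

# Escape the three characters that have special meaning in RTF.
sub rtf_escape {
    my ($text) = @_;
    $text =~ s/([\\{}])/\\$1/g;
    return $text;
}

# Treat the template as opaque text: everything except the dummy
# markers is copied through unchanged, without being understood.
sub fill_template {
    my ($template, %values) = @_;
    for my $marker (sort keys %values) {
        my $text  = rtf_escape($values{$marker});
        my $count = ($template =~ s/\Q$marker\E/$text/g);
        die "marker '$marker' not found in template\n" unless $count;
    }
    return $template;
}

# A tiny stand-in for a document created once in the word processor
# with dummy content in the places that change.
my $template = '{\rtf1\ansi {\b DUMMY_TITLE}\par DUMMY_BODY\par}';

my $document = fill_template(
    $template,
    DUMMY_TITLE => 'Quarterly report',
    DUMMY_BODY  => 'Sales rose {slightly}.',
);
print "$document\n";
```

Dying on a missing marker is a deliberate choice here: if the template was re-saved and a marker got split or renamed, a loud failure is easier to notice than a document that still contains dummy text.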

      I guess a lot depends on how much of the format of M$ Word Microsoft will release

      Unless I am greatly mistaken, most of what we know about the Word document format does not come from information that Microsoft has released.

      then no-one is going to be able to create them (although isn't _that_ what OpenOffice can do ??)

      OpenOffice inherited its Word input and output filters from StarDivision, who created them the same way that Corel did for the WordPerfect suite: by studying documents that were created with Word and figuring out what the different parts mean. The filters have been refined over the years and are getting to be quite good now, but there was some trial and error that went into getting them right; it wasn't as simple as reading a specification and implementing it. I suspect that the source code for the Word input and output filters built into OpenOffice is probably the best extant documentation of the Word document format outside of Microsoft. (Inside of Microsoft there is the source code for Word, of course.)


      $;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/
