If only there were an HTML::Word Perl module available; I know there is an HTML::PDF.
Better would be WordProcessing::MSWord::Parse.
I recently used Perl to create an Excel file (wow, it could not have been easier), so I'm really surprised there is nothing in Perl that can create Word documents.
Oh, there is some stuff for _creating_ Word documents,
but I skipped over it for two reasons: _creating_
documents isn't what you asked for (you wanted to
_read_ them and create something _else_ from them),
and the modules I saw were specialized rather than
general (e.g., one of them, I think, was for creating
DBI-related reports in Word document format). In
general, creating documents in a partially-understood
format is easier than parsing them, because to parse
a document you have to understand every aspect of the
format that the document happens to use. For
generating documents, you just
have to figure out the basics, and then you can use
the regular means (e.g., Word) to create one that's
like what you want and simply copy large parts of it
without fully understanding them, substituting in
your custom content each time in place of the dummy
content from the initial document.
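
To make the copy-and-substitute idea concrete, here is a minimal sketch. It rests on assumptions that are mine, not anything established above: the template is taken to have been saved from Word as RTF (plain text) rather than in the binary Word format, it is inlined as a string instead of being read from a file, and the DUMMY_NAME / DUMMY_TOTAL markers and the rtf_escape / fill_template helpers are hypothetical names invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical template. In practice this would be a document created in
# Word with dummy content and saved as RTF (an assumption of this sketch).
my $template = <<'END_RTF';
{\rtf1\ansi{\fonttbl{\f0 Times New Roman;}}
\f0\fs24 Dear DUMMY_NAME,\par
Your order total is DUMMY_TOTAL.\par
}
END_RTF

# Backslash and braces are special in RTF, so escape them in custom content.
# (ASCII only; non-ASCII text would need further encoding.)
sub rtf_escape {
    my ($text) = @_;
    $text =~ s/([\\{}])/\\$1/g;
    return $text;
}

# Swap each dummy marker for escaped custom content in a single pass.
# Everything else in the template is copied through untouched, whether
# or not we understand what it means.
sub fill_template {
    my ($doc, %values) = @_;
    return $doc unless %values;
    my $pattern = join '|',
        map  { quotemeta }
        sort { length $b <=> length $a } keys %values;
    $doc =~ s/($pattern)/rtf_escape($values{$1})/ge;
    return $doc;
}

my $letter = fill_template(
    $template,
    DUMMY_NAME  => 'J. Random {Hacker}',
    DUMMY_TOTAL => '$42.00',
);
print $letter;
```

The point of the single-pass substitution is that replacement text is never rescanned for markers, so custom content that happens to contain a marker string cannot trigger a second substitution.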
I guess a lot depends on how much of the format of M$ Word Microsoft will release
Unless I am greatly mistaken, most of what we know
about the Word document format does not come from
information that Microsoft has released.
then no-one is going to be able to create them (although isn't _that_ what OpenOffice can do?)
OpenOffice inherited its Word input and output filters
from StarDivision, who created them the same way that
Corel did for the WordPerfect suite: by studying
documents that were created with Word and figuring
out what the different parts mean. The filters have
been refined over the years and are getting to be
quite good now, but there was some trial and error
that went into getting them right; it wasn't as
simple as reading a specification and implementing
it. I suspect that the source code for the Word
input and output filters built into OpenOffice is
probably the best extant documentation of the
Word document format outside of Microsoft. (Inside
of Microsoft there is the source code for Word, of
course.)