http://www.perlmonks.org?node_id=123571


in reply to Converting Word97 (or later) exported HTML to valid HTML

You're so right: it's really quite horrendous. I've used two solutions for this in the past (neither of them Perl, sorry):

The second of these obviously can't be incorporated into a script, and the first probably can't either, but perhaps you could persuade your users to run their HTML files through the Microsoft utility on their Windows desktops?

hth a little,
andy.


Re: Re: Converting Word97 (or later) exported HTML to valid HTML
by impossiblerobot (Deacon) on Nov 06, 2001 at 20:44 UTC
    I've found a Word filter from Microsoft that is supposed to output cleaner HTML. (I assume this is what you were talking about.)

    I also tend to use Dreamweaver for this task, but it does leave some of the CSS stuff behind, so some cleanup is still required.
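
    For what it's worth, the leftover cleanup could probably be scripted. Here is a rough Python sketch (not Perl, sorry), assuming the residue looks like what Word 2000-style exports typically contain: Office namespace tags such as <o:p>, class=Mso* attributes, and mso-* declarations inside style attributes. The function name strip_word_css is made up for illustration, and regexes are not a real HTML parser, so treat this as a starting point rather than a tested filter.

```python
import re


def strip_word_css(html: str) -> str:
    """Remove some common Word-specific leftovers from exported HTML.

    Hypothetical helper: a regex-based sketch, not a full HTML parser.
    """
    # Drop Office namespace tags such as <o:p> and </o:p>, keeping inner text.
    html = re.sub(r"</?[ovw]:[^>]*>", "", html, flags=re.IGNORECASE)

    # Drop class attributes that reference Word's Mso* styles
    # (double-quoted, single-quoted, or unquoted).
    html = re.sub(
        r"\s+class\s*=\s*(\"Mso[^\"]*\"|'Mso[^']*'|Mso[^\s>]*)",
        "",
        html,
        flags=re.IGNORECASE,
    )

    def clean_style(match: re.Match) -> str:
        # Keep only the declarations that do not start with "mso-".
        quote, body = match.group(1), match.group(2)
        kept = [
            decl.strip()
            for decl in body.split(";")
            if decl.strip() and not decl.strip().lower().startswith("mso-")
        ]
        if not kept:
            return ""  # nothing left, so drop the whole style attribute
        return f" style={quote}{'; '.join(kept)}{quote}"

    # Rewrite quoted style attributes, filtering out mso-* declarations.
    html = re.sub(
        r"\s+style\s*=\s*([\"'])(.*?)\1",
        clean_style,
        html,
        flags=re.IGNORECASE | re.DOTALL,
    )
    return html
```

    You would read each exported file, pass its contents through strip_word_css, and write the result back out. Anything beyond these three patterns (conditional comments, embedded XML blocks, and so on) would still need separate handling.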

    Update: Although I still haven't tested the output, it appears that the MS Word filter can be used from the command line, as a standalone GUI application, or from within Word, and can batch process multiple files.


    Impossible Robot