Re: Word HTML issues

by Corion (Pope)
on May 15, 2005 at 19:57 UTC (#457281)

in reply to Word HTML issues

Although I haven't used it, there is the Demoronizer, which purports to clean up the HTML generated by Word; I'm not sure whether it will help you. You could also disallow pasting from Word altogether, since I don't know how HTMLArea3 handles pasted Word documents: it doesn't have access to Word's special formatting. Alternatively, consider having your users paste or upload RTF, and then convert the RTF to proper HTML.

Replies are listed 'Best First'.
Re^2: Word HTML issues
by ww (Archbishop) on May 15, 2005 at 22:05 UTC
    Unfortunately, Demoronizer worked better on the HTML generated by the version of M$Word that was current when Demoronizer (oh, I love that name) was written than it does on the output of more recent Word versions. The newer ones use all manner of new and sometimes unpleasant, non-standard HTML (or, more recently, XML, which also tends to be unpleasant to convert).

    Corion's advice to have your users provide RTF (or even plain text) for conversion should work better than the latest version of Demoronizer I've found... and I even took a whack at updating it to deal with additional versions of what Word claims is .html.

    However, I see other recommendations for cleanup below... and I, for one, am going to check them out. You may find them more valuable (and easier) than either Demoronizer or learning enough (standards-compliant) .html to convert .txt or .rtf.
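
To give a rough sense of what this kind of cleanup involves, here is a minimal sketch in Python (not Perl, and not taken from Demoronizer or from anything in this thread). The function name `strip_word_markup` is hypothetical, and the patterns are assumptions about constructs Word-generated HTML commonly contains (conditional comments, `o:`/`v:`/`w:` namespaced tags, `Mso*` classes, `mso-` style declarations). Regex-based stripping is fragile, so treat it as an illustration rather than a complete cleaner.

```python
import re

def strip_word_markup(html: str) -> str:
    """Remove a few common Word-specific constructs from an HTML string.

    Hypothetical helper for illustration; it does not cover every Word version.
    """
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Office-namespaced tags such as <o:p> and </o:p>; enclosed text is kept
    html = re.sub(r"</?[ovw]:[^>]*>", "", html, flags=re.IGNORECASE)
    # class attributes whose value starts with "Mso", e.g. class="MsoNormal"
    html = re.sub(r"\s+class=(\"Mso[^\"]*\"|'Mso[^']*'|Mso\w+)", "", html,
                  flags=re.IGNORECASE)
    # Quoted style attributes containing an mso- declaration; the whole
    # attribute is dropped, including any standard CSS it also carried
    html = re.sub(r"\s+style=(\"[^\"]*mso-[^\"]*\"|'[^']*mso-[^']*')", "", html,
                  flags=re.IGNORECASE)
    return html
```

Calling `strip_word_markup` on a pasted fragment returns the same HTML with those constructs removed. Anything it does not recognise passes through untouched, which is one reason converting from RTF or plain text may still be the more dependable route.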
