Say, where did you get that HTML table you posted? Did you export it from Word? Regardless, please don't do so again. I had quite a fight converting it to sane HTML.

holli, /regexed monk/

by demerphq (Chancellor) on Jan 19, 2006 at 16:07 UTC

    Do a google search for demoronizer.


      Sounds very useful. Fine vintage too:

             demoroniser was developed using Perl 4.0, patch level 36.


      Heh. I saw your comment above, and thought:

      "How mean, demerphq is calling this poor original poster a moron!"

      I prepared to downvote your node (ah, the awesome power I felt!) and thought, "Why don't I check Google, first?"

      It turns out that you didn't coin that name, and that the author of the tool is (apparently) poking fun at Microsoft (always fair game IMO, comes with the territory) rather than specifically at the OP: Demoronizer

      Moral to self: Look before you leap ... to conclusions. :)

by ww (Archbishop) on Jan 20, 2006 at 16:07 UTC

    There are several twisty corridors here in the Monastery in which demoronizer cobwebs hang from the ceiling; IMO they're well worth pursuing by anyone interested in cleaning up the .html produced by ANY of MS's Word, Excel or supposedly WYSIWYG products. Look under the covers, and what you get is remarkable bloat and non-conformant code.

    So, a few keywords for future Super_Searchers: "HTML, html, MS, Microsoft, Office, Word, Excel, FrontPage, PowerPoint, Publisher, cleanup, parse" ...and there surely could be more (arguably even Notepad, which when in word-wrap mode adds MS-ish line ends at every displayed wrap position).

    davidrw and astroboy offered links to useful alternate tools in Word HTML issues. There is also a bit of discussion re the issues implied in samtregar's remark in this thread.

    Self-updating of demoronizer is laid out very nicely by derby in Re^3: Reg Ex to strip MS smart quotes
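
For anyone arriving here via Super Search, the heart of a smart-quote cleanup can be sketched in a few lines. This is not demoroniser's code or derby's regex, just a minimal illustration in Python. The names `SMART_PUNCTUATION`, `decode_word_bytes` and `strip_smart_punctuation` are made up for this example, and the sketch assumes the Word export is in Windows-1252, which you should verify for your own files.

```python
# A minimal sketch, not the demoroniser implementation.
# Assumption: the input came from a Word export encoded as Windows-1252.

# Hypothetical mapping from "smart" punctuation to plain ASCII stand-ins.
SMART_PUNCTUATION = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(SMART_PUNCTUATION)


def decode_word_bytes(raw: bytes) -> str:
    """Decode raw bytes as Windows-1252 (assumed source encoding)."""
    return raw.decode("cp1252")


def strip_smart_punctuation(text: str) -> str:
    """Replace the mapped smart punctuation with ASCII equivalents."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    raw = b"\x93It\x92s fine\x94 \x96 really\x85"
    print(strip_smart_punctuation(decode_word_bytes(raw)))
```

The table only covers the characters listed; anything else Word emits (bloated markup, proprietary attributes) needs a real HTML cleanup tool such as the ones linked in this thread.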

    But (... sigh! )...even the latest Word->html output does not exactly demonstrate that the allegedly-enlightened giant in Redmond has learned to avoid making the same mistakes in different (i.e., incompatible) ways.

    ...and, oh yes, a (deprecated) disclaimer: I don't hate W32; I just hate cleaning up MS .html to W3C standards.

    Fair warning, also: I should probably use a sig like html 4.01 dinosaur