PerlMonks  

Re: Converting Word97 (or later) exported HTML to valid HTML

by Corion (Pope)
on Nov 06, 2001 at 15:50 UTC ( #123553=note )


in reply to Converting Word97 (or later) exported HTML to valid HTML

Honestly, when I read the title of your node, HTML Tidy immediately sprang to mind, as it even has command-line switches specifically for cleaning up Office HTML. Its website also has code showing how to call HTML Tidy from Perl, including some proposed error checking that seems mostly geared toward Unix. On second thought, it is not really clear to me why they use the code they do, so I'll post it here together with my replacement:
    ## This is what I think is needed beforehand :
    open( TIDY, "html-tidy $commandline|")
      or die "Couldn't spawn html-tidy : $!\n";
    my @output;
    @output = <TIDY>;

    ## Here begins their code :
    if (close(TIDY) == 0) {
      my $exitcode = $? >> 8;
      if ($exitcode == 1) {
        printf STDERR "tidy issued warning messages\n";
      } elsif ($exitcode == 2) {
        printf STDERR "tidy issued error messages\n";
      } else {
        die "tidy exited with code: $exitcode\n";
      }
    } else {
      printf STDERR "tidy detected no errors\n";
    }
I think this could simply be done with the following code, but I haven't checked all possible outcomes...
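    my @output = qx(html-tidy $commandline);
    my $exitcode = $? >> 8;
    if ($exitcode == 0) {
      # without this branch, a clean run would fall through to the die below
      printf STDERR "tidy detected no errors\n";
    } elsif ($exitcode == 1) {
      printf STDERR "tidy issued warning messages\n";
    } elsif ($exitcode == 2) {
      printf STDERR "tidy issued error messages\n";
    } else {
      die "tidy exited with code: $exitcode\n";
    }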
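As for the outcomes I haven't checked: besides the exit code in the high byte, $? can also be -1 (the program could not be started at all) or carry a signal number in its low 7 bits. A minimal sketch of how all of these cases might be told apart follows; the helper name classify_tidy_status and its return values are my own invention for illustration, not something from the HTML Tidy site, and the meaning of exit codes 0, 1 and 2 is taken from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML Tidy or the code quoted above):
# turn the raw wait status that Perl leaves in $? into a (level, message) pair.
sub classify_tidy_status {
    my ($status) = @_;

    # -1 means the program could not be started at all.
    # This must be checked first, because -1 also has its low bits set.
    return ('fatal', 'could not run tidy') if $status == -1;

    # The low 7 bits hold the signal number if the child was killed.
    if (my $signal = $status & 127) {
        return ('fatal', "tidy died from signal $signal");
    }

    # The exit code proper lives in the high byte.
    my $exitcode = $status >> 8;
    return ('ok',      'tidy detected no errors')      if $exitcode == 0;
    return ('warning', 'tidy issued warning messages') if $exitcode == 1;
    return ('error',   'tidy issued error messages')   if $exitcode == 2;
    return ('fatal',   "tidy exited with code: $exitcode");
}

# Possible usage, assuming $commandline is set up as in the code above:
#
#   my @output = qx(html-tidy $commandline);
#   my ($level, $message) = classify_tidy_status($?);
#   die "$message\n" if $level eq 'fatal';
#   print STDERR "$message\n";
```

Keeping the classification in a function that only looks at the status value means it can be exercised without html-tidy being installed.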

Wrapping it up: unless you give us a really convincing reason why html-tidy is not possible (and by "not possible" I mean that even embedding html-tidy in a Perl script, writing it out to /tmp, running it there and deleting the file afterwards is ruled out), I'll stick with this solution :-)

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
