Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
Honestly, as I read the title of your node, HTML tidy sprang immediately to my mind, as it even has command line switches used to specifically clean up Office HTML. On that website, there is also code on how to call HTML tidy from Perl, including some proposed error checking which seems mostly geared for Unix. On the second thought, it is not really clear why they use the code they use, so I'll post it here, together with my replacement :
## This is what I think is needed beforehand : open( TIDY, "html-tidy $commandline|") or die "Couldn't spawn html-tid +y : $!\n"; my @output; @output = <TIDY>; ## Here begins their code : if (close(TIDY) == 0) { my $exitcode = $? >> 8; if ($exitcode == 1) { printf STDERR "tidy issued warning messages\n"; } elsif ($exitcode == 2) { printf STDERR "tidy issued error messages\n"; } else { die "tidy exited with code: $exitcode\n"; } } else { printf STDERR "tidy detected no errors\n"; }
I think this could simply be done with the following code, but I haven't checked all possible outcomes...
my @output = qx(html-tidy $commandline); my $exitcode = $? >> 8; if ($exitcode == 1) { printf STDERR "tidy issued warning messages\n"; } elsif ($exitcode == 2) { printf STDERR "tidy issued error messages\n"; } else { die "tidy exited with code: $exitcode\n"; }

Wrapping it up, unless you tell us a really convincing reason why html-tidy is not possible (and with not possible I also mean putting html-tidy into a Perl script, writing it out to /tmp, starting it there and afterwards deleting the file again), I'll stick with this solution :-)

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web

In reply to Re: Converting Word97 (or later) exported HTML to valid HTML by Corion
in thread Converting Word97 (or later) exported HTML to valid HTML by projekt21

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (5)
As of 2021-06-12 21:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)












    Results (53 votes). Check out past polls.

    Notices?