Re^2: Dynamically cleaning up HTML fragments

by SilasTheMonk (Chaplain)
on Sep 24, 2010 at 11:41 UTC (node #861793)

in reply to Re: Dynamically cleaning up HTML fragments
in thread Dynamically cleaning up HTML fragments

Actually, HTML::Tidy seems to have a bit of a bad history at Debian. My original claim that it is not in Debian was wrong, but it's definitely in an odd state. I am investigating.

Re^3: Dynamically cleaning up HTML fragments
by wfsp (Abbot) on Sep 25, 2010 at 10:32 UTC
    Ubuntu 8.04, perl 5.10.1

    HTML::Tidy has been released three times this year (the last on 17 September), so some of the criticisms may have been addressed.

    It requires tidyp (version 1.04 was released recently), which is a fork of tidy.

    I was able to install tidyp in the usual way and H::T installed without fuss using cpanp.

        #! /usr/bin/perl
        use strict;
        use warnings;
        use HTML::Tidy;

        my $tidy = HTML::Tidy->new( {
            output_xhtml      => 1,    # emit XHTML (e.g. <img ... />)
            tidy_mark         => 0,    # don't add tidy's generator meta tag
            markup            => 1,    # produce cleaned markup
            q{show-body-only} => 1,    # output only the body's contents
        } );

        printf qq{tidyp: %s\n}, $tidy->tidyp_version;
        printf qq{libtidyp: %s\n}, $tidy->libtidyp_version;
        printf qq{HTML::Tidy: %s\n}, $HTML::Tidy::VERSION;

        # slurp the test fragment from __DATA__
        my $html = do { local $/; <DATA> };

        # parse() collects tidy's warnings without changing the input
        $tidy->parse( q{test.html}, $html ) or die q{parse failed};
        for my $message ( $tidy->messages ) {
            print $message->as_string, qq{\n};
        }

        # clean() returns the tidied markup
        my $xhtml = $tidy->clean($html);
        print $xhtml;

        __DATA__
        <div>
          <p>tidy</p>
          <img src="pic.jpg">
        </div>
        tidyp: 1.04
        libtidyp: 1.04
        HTML::Tidy: 1.54
        test.html (1:1) Warning: missing <!DOCTYPE> declaration
        test.html (1:1) Warning: inserting implicit <body>
        test.html (1:1) Warning: inserting missing 'title' element
        test.html (3:3) Warning: <img> lacks "alt" attribute
        <div>
        <p>tidy</p>
        <img src="pic.jpg" /></div>
    See the tidy quick reference for all the configuration options.
      Thanks. It actually installs fine on Debian using the packaging system, and I was able to use and configure it. The issues are:
      1. The version in Debian is old.
      2. An update does not appear to be happening, I think because of the tidy fork. It makes things very messy, and until someone really screams it won't happen. I am in the relevant group, and I won't volunteer.
      3. I could not configure it to change "<span>blah</span>" to "blah". Saying that tidy is not intended to do that is reasonable, but I want it to do that. JavaScript rich-text editors generate markup that one does not necessarily want or need.
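Point 3 falls outside what tidy is meant to do, but it can be handled as a separate pass over the fragment (for example, on the string returned by `$tidy->clean`). Below is a minimal sketch, assuming HTML::TreeBuilder (from the HTML-Tree distribution) is installed. `unwrap_spans` is a hypothetical helper, not part of HTML::Tidy or any module. It works on the parsed tree rather than with a regex, so nested spans and spans carrying attributes such as `style` are unwrapped too; those attributes are dropped along with the tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Entities qw(encode_entities);

# Hypothetical helper (a sketch, not an HTML::Tidy feature): remove every
# <span> element from an HTML fragment, keeping whatever was inside it.
sub unwrap_spans {
    my ($fragment) = @_;

    # TreeBuilder wraps the fragment in implicit <html>/<head>/<body> tags.
    my $tree = HTML::TreeBuilder->new_from_content($fragment);

    # look_down returns the complete list before we start changing the tree.
    for my $span ( $tree->look_down( _tag => 'span' ) ) {
        # Move the span's children into its parent, then free the empty span.
        $span->replace_with_content->delete;
    }

    # Serialise only the body's children so no <html>/<body> wrapper leaks out.
    my ($body) = $tree->look_down( _tag => 'body' );
    my @parts;
    for my $node ( $body->content_list ) {
        if ( ref $node ) {
            # {} = print every end tag (e.g. </p>); strip any final newline.
            ( my $html = $node->as_HTML( undef, undef, {} ) ) =~ s/\n\z//;
            push @parts, $html;
        }
        else {
            # Plain text nodes are stored unescaped, so re-encode them.
            push @parts, encode_entities($node);
        }
    }

    $tree->delete;    # HTML::Element trees should be freed explicitly
    return join '', @parts;
}

print unwrap_spans('<p>Some <span style="color:red">blah</span> text</p>'), "\n";
```

Other tags an editor tends to sprinkle around (`<font>`, empty `<b>`) could be unwrapped the same way by changing the `look_down` criteria.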
