http://www.perlmonks.org?node_id=422845

Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I have two questions -- Question One: how am I supposed to Tidy my HTML with HTML::Tidy?

That might sound like a stupid question, but according to the documentation the module only has six methods, none of which returns my HTML to me in any form -- am I missing something?

Question Two: where's HTML::Tidy::Document, which would seem to hold the answer to Question One? I did a bit of searching, and found a reference to this module in PerlMonks here (at thepen). But I can't find that on CPAN. Is it something to do with SWIG?

I'm genuinely mystified. The tidy library will happily return my HTML to me, tidied, but the Perl wrapper won't?

Update: Now with extra added samples of code!



($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
=~y~b-v~a-z~s; print

Re: HTML::Tidy and mysterious HTML::Tidy::Document
by BUU (Prior) on Jan 17, 2005 at 23:20 UTC
    Just from reading the documentation, it appears that the clean() method does an in-place edit of your string. Is this not true?

      Hmm, I guess that's one interpretation. It doesn't seem to be doing it though.

      I should give some code, shouldn't I? OK, I have a file, test.html. When I run tidy test.html on the command line, it gives three warnings and makes three changes.

      line 5 column 1 - Warning: <style> inserting "type" attribute
      line 11 column 1 - Warning: trimming empty <p>
      line 11 column 4 - Warning: trimming empty <p>

      But this script does nothing, generating no warnings and reproducing "test.html" exactly the same as before.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use diagnostics;
      use HTML::Tidy;

      undef $/;
      open(M,"test.html") || die "$!";
      my $html = <M>;
      my $tidy = new HTML::Tidy;
      $tidy->clean( "this file", $html );
      for my $message ( $tidy->messages ) {
          print $message->as_string . "\n";
      }
      print $html;



        Someone needs to ping petdance (with a patch?). The documentation lies. Here's working code:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use HTML::Tidy;

        my $fname = join ' ', @ARGV;
        my $html = do { local $/; <> }; # slurp file(s) from commandline
        my $tidy = HTML::Tidy->new();
        $tidy->parse( $fname, $html );
        warn $_->as_string, "\n" for $tidy->messages;
        print $tidy->clean( $html );

        Makeshifts last the longest.

        Thanks for posting the code.

        If you want to use the messages method, you need to parse it first, not clean it.

        #!/usr/bin/perl
        use lib '/home/brian/lib/lib/perl/5.8.4';
        use strict;
        use warnings;
        use HTML::Tidy;

        open M, "test.html" or die "$!";
        my $html = do { local $/; <M> };
        my $tidy = new HTML::Tidy;
        $tidy->parse( "test", $html );
        for my $message ( $tidy->messages ) {
            print $message->as_string, $/;
        }
        __END__
        output on a test file:
        test (1:1) Warning: missing <!DOCTYPE> declaration
        test (8:9) Warning: missing </form> before <option>
        test (6:1) Warning: <option> isn't allowed in <body> elements
        test (6:1) Warning: <input> isn't allowed in <body> elements
        test (12:33) Warning: inserting implicit <form>
        test (14:17) Warning: discarding unexpected </option>
        test (12:33) Warning: <form> lacks "action" attribute

        If you want the cleaned output, it is edited in-place, i.e.:

        $tidy->clean( $html ); # $html now contains tidied output

        Update: the clean method returns the clean html, as Aristotle points out below
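
To check which of the two readings of clean() holds on a given installation, one can compare the input string before and after the call and capture the return value. This is only a minimal sketch, assuming HTML::Tidy is installed and that clean() accepts the HTML string as its only argument (as in the working code above). The sample markup and the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Tidy;

# Hypothetical sample input: any markup that tidy would change will do.
my $html     = "<html><head><title>t</title></head><body><p></p><p>hello</body></html>";
my $original = $html;    # copy of the input, for comparison after the call

my $tidy    = HTML::Tidy->new();
my $cleaned = $tidy->clean($html);    # keep the return value instead of discarding it

# Report whether the argument itself was modified by the call.
print "input was ",
    ( $html eq $original ? "left unchanged" : "edited in place" ), "\n";

# Report whether the return value carries tidied output.
print "return value ",
    ( defined $cleaned && $cleaned ne $original ? "differs from" : "matches" ),
    " the input\n";
```

If the first line reports "left unchanged" and the second reports "differs from", the tidied HTML is only available through the return value, which is why the original script printed test.html back unmodified.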

        A possible partial answer (in my current brain-dead condition I cannot find the reference, but I believe I read this about invocation from the command line): are you sure Tidy is not writing the (allegedly) corrected file with an alternate or additional extension, e.g. "test.html.tidy" or "test.tidy"?

        Then again, this may be a mere brain-fart, or I may be confusing this with a document about the executable rather than the module.