PerlMonks
Cleaning HTML - New/better module for that - test please! ;-P
by gmpassos (Priest) on Apr 22, 2003 at 06:00 UTC ( [id://252200]=perlmeditation )
I was testing the module HTML::Clean to add an output-filter flag to mod_perl for HPL (another HTML/Perl embedding system). But when I started to
read the source and saw how the code is cleaned, I noticed that the filter can make some
mistakes with complex HTML. So I decided to write my own filter, one that
doesn't change the final result in the browser. I ran some tests with
HTML::Clean and my new module, and found that mine is a better filter: it cleans
better/more, without changing the rendered result. (I used pages from www.cnn.com.br and www.perl.com, which
have styles, JavaScript, etc.)
My point is not to say which one is better. Actually, the HTML::Clean idea of a filter based on direct changes with REs is good, since it uses less memory, but it can't know exactly what it is doing inside the HTML tree. On the other hand, we can't build a filter fully based on a parsed HTML tree, since that would be slow, which is not good for a server. My module is something between the two approaches: it tries to look only at the basic things that can be cleaned, without very complex ideas, to keep it fast.
I have contacted the author of HTML::Clean (so far I have just sent an e-mail and am waiting for a reply) about updating the module with the code that I wrote. But the code is only 2 days old and needs testing. I would like the monks to test the code with some web sites and check that the output looks the same in the browser. Any ideas to make the filter better, or any comments, are gladly accepted!
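To give a rough idea of what "between the two approaches" can mean, here is a minimal sketch. It is not the code from htmlclean.zip and not how HTML::Clean works internally; the function name clean_html and the choice of protected tags are only assumptions for illustration. It uses plain REs, but first sets aside the blocks where whitespace and comments matter (pre, textarea, script, style), so the cheap substitutions can't touch them.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not from htmlclean.zip).
sub clean_html {
    my ($html) = @_;
    my @protected;

    # 1. Set aside blocks whose content must not change and leave a
    #    NUL-delimited placeholder. Assumes the input has no NUL bytes
    #    and no '>' inside the attributes of these opening tags.
    $html =~ s{(<(pre|textarea|script|style)\b[^>]*>.*?</\2\s*>)}{
        push @protected, $1;
        "\0" . (scalar(@protected) - 1) . "\0";
    }gise;

    # 2. Drop ordinary comments, but keep conditional comments ("<!--[if").
    $html =~ s{<!--(?!\[if\b).*?-->}{}gs;

    # 3. Collapse each run of whitespace to one space. The browser does
    #    the same when rendering, so the page should look identical.
    $html =~ s{\s+}{ }g;

    # 4. Put the protected blocks back untouched.
    $html =~ s{\0(\d+)\0}{$protected[$1]}g;

    return $html;
}

my $in = "<p>Hello,   \n  world</p>\n<!-- note -->\n<pre>a\n  b</pre>";
print clean_html($in), "\n";
```

Note that step 3 also collapses whitespace inside attribute values, and that pages using CSS like white-space: pre on other elements would need more care, so this is only a starting point for testing.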
To test, get: http://www.inf.ufsc.br/~gmpassos/htmlclean.zip
Graciliano M. P.