PerlMonks  

Re^3: Dynamically cleaning up HTML fragments

by clinton (Priest)
on Sep 25, 2010 at 21:08 UTC (#862008)


in reply to Re^2: Dynamically cleaning up HTML fragments
in thread Dynamically cleaning up HTML fragments

Glad it is working for you.

I really do not recommend writing your own HTML::Parser subclass. If you look at the source of HTML::StripScripts, you will see that there is a lot going on in there, and with good reason. Unless you are willing to spend the time checking every last detail, a home-grown subclass is likely to miss many of the corner cases that HTML::StripScripts already handles. Parsing HTML is hard, and making sense of bad HTML is harder still.
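For comparison, here is roughly how little code is needed to lean on the existing module instead. This is only a minimal sketch: the config values and the `$dirty` fragment are made up for illustration, so adjust them to whatever your application should allow.

```perl
use strict;
use warnings;
use HTML::StripScripts::Parser;

# Assumed configuration for illustration only:
# treat the input as a block-level fragment, allow links,
# but do not allow embedded resources such as images.
my $hss = HTML::StripScripts::Parser->new(
    {
        Context   => 'Flow',    # a fragment, not a whole document
        AllowHref => 1,
        AllowSrc  => 0,
    },
);

# Hypothetical untrusted fragment: an event handler, a script block,
# and a <b> tag that is never closed.
my $dirty = '<p onclick="evil()">Hello <b>world<script>alert(1)</script></p>';

# filter_html() parses the fragment and returns the cleaned-up HTML.
my $clean = $hss->filter_html($dirty);
print $clean, "\n";
```

The filtering rules and the tag balancing all live inside the module, which is exactly the part that is easy to get wrong in a hand-rolled subclass.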

(Again, I write as the fortunate maintainer, and not as the original author who did all the painstaking work.)

clint
