in reply to Why I like functional programming
Hello, I am the HTML::Parser nazi. I go around commenting on other people's attempts to parse HTML, and I always yell at them for trying to do something that's very hard themselves, and tell them to use HTML::Parser.
I intended to do that here.
I tried to break tilly's code.
So while I still don't necessarily understand why one wouldn't use HTML::Parser, as far as I can see this is clean. Excellent stuff. I'm very impressed.
|Replies are listed 'Best First'.|
RE (tilly) 2 (not html): Why I like functional programming
by tilly (Archbishop) on Oct 01, 2000 at 08:04 UTC
The problem is that the incoming document is not HTML. It is a document in some markup language, some of whose tags look like html, but which isn't really. I don't want to spend time worrying about "broken html" that I am going to just escape. I don't want to worry about valid html that I want to deny. I want to report custom errors. (Hey, why not instead of just denying pre-monks image tags, also give an error with a link to the FAQ?) And I want to include markup tags you won't find in HTML.
I did a literal escape above using [code] above. I submit that HTML::Parser would not help with that. OK, so that should be <code> for this site, but this site would want to implement a couple of escaped I didn't. For instance the following handler would be defined for this site for [ (assuming that $site_base was http://www.perlmonks.org/index.pl and hoping that I don't make any typos):
And, of course, given $node_id there is probably a function get_node_name available. And we have that lastnode_id the site keeps track of. So we also need a handler for [:// to link by ID, and that would be generated by something like this:
If this still looks to your eyes like a slightly hacked up html spec, let me show you a feature that I dearly wish that this site had. Stop and think about what the following handler for \ does:
Do you see it? Consider what would happen to the following string:
Got it yet?
No more looking up those pesky escape codes! :-)
My apologies for using you as a foil, but you just let me illustrate Tom's point perfectly. All of the stuff I am saying is obvious to anyone who has played with functional techniques, but since you haven't you are simply unable to see the amazing potential inherent in this method of code organization. And I happen to know that you are not a bad programmer, but this was a blind spot for you.
Time to put down the pot, we aren't boiling now. This is a frying pan and I feel like an omelette. :-)
by dchetlin (Friar) on Oct 01, 2000 at 09:43 UTC
by tilly (Archbishop) on Oct 01, 2000 at 16:28 UTC
While I appreciate the compliment, I do think that it was functional programming which made this work.
If I am a strong programmer, my strength was shown here in picking the right style for the job, not in executing that style in an amazing way. I agree that your average Perl programmer would botch this job, probably horribly. I also submit that your average competent Perl programmer would also botch it - possibly subtly and probably not. I know that I personally wouldn't know how to tackle this in an OO style, and could not come up with a solution I would like in a procedural style.
However I think that most decent Perl programmers with exposure to functional techniques, when given the core function would have little difficulty in adding a series of handlers and getting it right from there on in. And that core function is easy to get right because it is doing something conceptually simple. (Scanning for handled stuff, and escaping anything that doesn't wind up being handled.) The barrier here is conceptual, not technical.
As for whether HTML::Parser is a good fit, that depends on how you read Ovid's problem. It would be a good fit if you wanted to just strip out disallowed tags. It would be a bad fit if you wanted to escape them again, leaving text untouched and add error messages as I did. It would be a really bad fit if this parsing piece was going to be extended (as the above was) to allow a number of custom markup symbols to be used.
Personally I always get irritated at seeing my mistakes be silent. So while HTML::Parser might solve some spec, it would not give a solution that could be extended nicely. And I am not sure it would solve Ovid's problem to the satisfaction of future requests that might come in. But this does.
More amusing handlers. The right one for :// can autodetect urls. One for @ can do the same for email addresses. The main loop needs to be changed to a different class, but "\n" gives you the ability to have a newbie mode. (Note, look for leading spaces on the next line and turn them into please.)
So you see, a ton of different requests can be satisfied, and when the rules conflict (eg don't look for URLs in formatted code) there is already a good resolution.
Looking at this, I don't think I could do all of that (particularly the switching) using any other technique I know. Without having seen functional, I would be lost.