HTML::Parser would actually be an awful fit for this
problem. If you don't believe it, try to duplicate the
functionality the code already has.
The problem is that the incoming document is not HTML.
It is a document in some markup language, some of whose
tags look like HTML, but which isn't really HTML. I don't want
to spend time worrying about "broken HTML" that I am going
to just escape. I don't want to worry about valid HTML
that I want to deny. I want to report custom errors. (Hey,
why not instead of just denying pre-monks image tags, also
give an error with a link to the FAQ?) And I want to include
markup tags you won't find in HTML.
I did a literal escape above using [code]. I submit that
HTML::Parser would not help with that. OK, so that should
be <code> for this site, but this site would want to
implement a couple of escapes I didn't. For instance the
following handler would be defined for this site for [
(assuming that $site_base was
http://www.perlmonks.org/index.pl and hoping that
I don't make any typos):
    use HTML::Entities qw(encode_entities);
    use URI::Escape qw(uri_escape);

    sub {
        my $t_ref = shift;
        # Match "name]" or "name|title]" starting at the current pos().
        if ($$t_ref =~ /\G([^\|\]]+)(?:\|([^\|\]]+))?\]/g) {
            my $node_ref  = "$site_base?node=" . uri_escape($1);
            my $node_name = encode_entities($2 || $1);
            return qq(<a href="$node_ref">$node_name</a>);
        }
        else {
            return show_err("Incomplete node link?");
        }
    }
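To make the calling convention concrete, here is a minimal sketch of the handler being driven by hand. It assumes the dispatcher has already consumed the opening [ and left pos() just past it. The show_err here is a stand-in stub I made up for the demo, not the real function, and the sample text is invented.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);
use URI::Escape qw(uri_escape);

my $site_base = "http://www.perlmonks.org/index.pl";   # as assumed above

# Stand-in for the real error reporter (hypothetical).
sub show_err { return "<b>" . encode_entities(shift) . "</b>" }

my $node_link = sub {
    my $t_ref = shift;
    if ($$t_ref =~ /\G([^\|\]]+)(?:\|([^\|\]]+))?\]/g) {
        my $node_ref  = "$site_base?node=" . uri_escape($1);
        my $node_name = encode_entities($2 || $1);
        return qq(<a href="$node_ref">$node_name</a>);
    }
    return show_err("Incomplete node link?");
};

# Pretend the dispatcher just consumed the "[" at offset 4.
my $text = "See [Perl Monks|the site] for more.";
pos($text) = 5;

my $html = $node_link->(\$text);
print "$html\n";
print "resume scanning at offset ", pos($text), "\n";
```

The handler gets a reference rather than a copy, so the \G match advances pos() on the caller's string and the dispatcher can pick up right after the closing ].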
And, of course, given $node_id there is
probably a function get_node_name available.
And we have that lastnode_id the site keeps
track of. So we also need a handler for [://
to link by ID, and that would be generated by something
like this:
    sub ret_link_by_id {
        my $tracking = shift; # eg "&lastnode_id=23453"
        return sub {
            my $t_ref = shift;
            if ($$t_ref =~ /\G([1-9]\d*)(?:\|([^\|\]]+))?\]/g) {
                my $node_id = $1;
                my $name = $2 || get_node_name($node_id);
                my $node_name = encode_entities($name);
                my $url = "$site_base?node_id=$node_id$tracking";
                return qq(<a href="$url">$node_name</a>);
            }
            else {
                return show_err("Incomplete node_id link?");
            }
        };
    }
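A rough sketch of how that generator might be used follows. The get_node_name and show_err bodies are stubs invented for the demo (the real ones would presumably hit the database), and the node IDs and text are made up. The point is only that each call to ret_link_by_id hands back an independent handler with its own tracking string baked in.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

my $site_base = "http://www.perlmonks.org/index.pl";   # as assumed above

# Hypothetical stubs standing in for the site's real functions.
sub get_node_name { my $id = shift; return "Node #$id" }
sub show_err      { return "<b>" . encode_entities(shift) . "</b>" }

sub ret_link_by_id {
    my $tracking = shift; # eg "&lastnode_id=23453"
    return sub {
        my $t_ref = shift;
        if ($$t_ref =~ /\G([1-9]\d*)(?:\|([^\|\]]+))?\]/g) {
            my $node_id = $1;
            my $name = $2 || get_node_name($node_id);
            my $node_name = encode_entities($name);
            my $url = "$site_base?node_id=$node_id$tracking";
            return qq(<a href="$url">$node_name</a>);
        }
        return show_err("Incomplete node_id link?");
    };
}

# Two handlers from one generator, each closing over its own $tracking.
my $tracked   = ret_link_by_id("&lastnode_id=23453");
my $untracked = ret_link_by_id("");

my $titled = "see [://17|this node]";
pos($titled) = 8;                      # just past the "[://"
my $link_a = $tracked->(\$titled);

my $bare = "[://42]";
pos($bare) = 4;                        # just past the "[://"
my $link_b = $untracked->(\$bare);

print "$link_a\n$link_b\n";
```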
If this still looks to your eyes like a slightly hacked up
HTML spec, let me show you a feature that I dearly wish
that this site had. Stop and think about what the following
handler for \ does:
    sub {
        my $t_ref = shift;
        # A backslash followed by one of & [ ] < > \ emits that character escaped.
        if ($$t_ref =~ /\G([&\[\]<>\\])/g) {
            return encode_entities($1);
        }
    }
Do you see it? Consider what would happen to the following
string:
You can link by URL like this:
<pre>
\<a href="http://www.perlmonks.org/"\><a href=http://www.perlmonks.org/>Perl Monks</a>\</a\>
</pre>
Got it yet?
No more looking up those pesky escape codes! :-)
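If you want to watch it happen, here is a minimal sketch. The render loop is a toy driver I made up for the demo: it only knows about \ and passes everything else through untouched, which the real scrubber certainly would not do. I also gave the handler an explicit empty return on failure so the driver can tell when nothing matched.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# The backslash handler, plus an explicit empty return on failure.
my $backslash = sub {
    my $t_ref = shift;
    if ($$t_ref =~ /\G([&\[\]<>\\])/g) {
        return encode_entities($1);
    }
    return;
};

# Toy driver (hypothetical): dispatch on "\" and copy everything else as is.
sub render {
    my $text = shift;
    my $out  = "";
    pos($text) = 0;
    while (pos($text) < length $text) {
        if ($text =~ /\G\\/gc) {
            my $after = pos($text);
            my $html  = $backslash->(\$text);
            if (defined $html) {
                $out .= $html;
            }
            else {
                # A failed /g match resets pos(), so restore it
                # and keep the backslash as a literal.
                pos($text) = $after;
                $out .= "\\";
            }
        }
        else {
            $text =~ /\G([^\\]+)/gc;   # run of ordinary text
            $out .= $1;
        }
    }
    return $out;
}

my $in  = q{\<a href="http://www.perlmonks.org/"\><a href=http://www.perlmonks.org/>Perl Monks</a>\</a\>};
my $out = render($in);
print "$out\n";
```

The escaped angle brackets come out as entities, so the reader sees the markup, while the unescaped tag in the middle is still a live link.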
My apologies for using you as a foil, but you just let me
illustrate Tom's point perfectly. All of the stuff I am
saying is obvious to anyone who has played with functional
techniques, but since you haven't, you are simply unable to
see the amazing potential inherent in this method of code
organization. And I happen to know that you are
not a bad programmer, but this was a blind spot for you.
Time to put down the pot, we aren't boiling now. This is a
frying pan and I feel like an omelette. :-)