|Perl: the Markov chain saw|
Okay. I need a place to vent on the topic of CGI and this seems like the perfect place.
I've been doing some research on HTML4, parsing HTML, and related topics. I've recently been trying to build a browser in Perl (and yes, I use the modules when I know about them).
As the very first parsing I did, I grabbed all the <h1> to <h6> tags. These tags, when used correctly, should give a good outline of what's on the page. But guess what? I checked a lot of major sites, like search engines, news sites, then some great discussion sites that are success stories for Perl, then a few random Monk home pages. Almost nobody uses these tags. I thought there was a problem with my program! Everybody is using the <font> tag instead of the classic header notations.
Then I got curious, so I headed to an HTML validator. I checked all the same pages again. I found two pages that were even close to "valid" compared to the standard. And one of those was w3's own home page. The other was missing a single alt tag on a gif.
So here's my sore spot. Using CGI.pm is obviously recommended. I can't think of a single reason not to use something that comes with every default install of Perl. That would be like writing foreach loops to perform an action on list elements instead of using map. But even those that I can't imagine are not using CGI.pm (slashdot or perlmonks, for instance) do not generate valid HTML. Not even close according to the error report.
While I've seen plenty of loud complaints when people roll their own form parsers, I am not seeing those same loud complaints when people (mis)use CGI to generate suboptional HTML or who use it to parse the forms, but then completely ignore it for generating HTML. It would appear that even those people who are using CGI.pm can't count on it to put in some default alt tag for their gifs when they forget-- it creates correctly formed HTML that may be gibberish according to the standards.
The module might as well be CGI::ParseForms and skip all the HTML building routines, for the ways it seems to be used in the wild. And frankly, given how much trouble I have with fonts that get too small, or pages that are completely unreadable in text-only mode (yes, I like to browse in Lynx sometimes just to get away from all the image rendering issues and time wasted waiting for them download over the modem), I'd like to see us make stronger, more frequent recommendations to use CGI for building HTML and then to remember that using it is no guarantee of perfect HTML either.
Some of the above is altered and Update: based on responses below, I'm not sure what I said that muddied my point. What I'm saying is simple. Feel free to keep harping away on the poor souls who roll their own parsing routines instead of using CGI.pm. But please, consider applying the same critical eye to people who use only 5% of the functionality of the module and continue to hand code HTML (often hard coding large chunks of it into their scripts), or who use the module to create crummy HTML by subverting the fact that while it writes well-formed HTML it does not validate tags, attributes, or block/inline nesting.