http://www.perlmonks.org?node_id=51117


in reply to use CGI or die;

Okay. I need a place to vent on the topic of CGI and this seems like the perfect place.

I've been doing some research on HTML4, parsing HTML, and related topics. I've recently been trying to build a browser in Perl (and yes, I use the modules when I know about them).

As the very first parsing I did, I grabbed all the <h1> to <h6> tags. These tags, when used correctly, should give a good outline of what's on the page. But guess what? I checked a lot of major sites, like search engines, news sites, then some great discussion sites that are success stories for Perl, then a few random Monk home pages. Almost nobody uses these tags. I thought there was a problem with my program! Everybody is using the <font> tag instead of the classic header notations.
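For anyone who wants to try the same experiment, here is a minimal sketch of that kind of heading grab, assuming a plain regex is good enough for a first pass. This is not my actual program: `heading_outline` is a hypothetical name, and a real parser such as HTML::Parser would cope much better with malformed markup.

```perl
use strict;
use warnings;

# Hypothetical helper: return [level, text] pairs for every <h1>..<h6> element.
# Regex-based, so it is only a rough first pass, not a real HTML parser.
sub heading_outline {
    my ($html) = @_;
    my @outline;
    while ($html =~ m{<h([1-6])\b[^>]*>(.*?)</h\1\s*>}gis) {
        my ($level, $text) = ($1, $2);
        $text =~ s/<[^>]+>//g;    # drop inline markup inside the heading
        $text =~ s/\s+/ /g;       # collapse runs of whitespace
        $text =~ s/^ | $//g;      # trim leading and trailing space
        push @outline, [$level, $text];
    }
    return @outline;
}

my $page = <<'HTML';
<h1>Site News</h1>
<p><font size="+2">Not a heading</font></p>
<h2 class="x">Perl <em>releases</em></h2>
HTML

# Print the outline, indented two spaces per heading level.
for my $entry (heading_outline($page)) {
    my ($level, $text) = @$entry;
    print '  ' x ($level - 1), $text, "\n";
}
```

On the sample page this prints "Site News" and an indented "Perl releases"; the text styled with font never shows up in the outline, which is exactly the problem.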

Then I got curious, so I headed to an HTML validator and checked all the same pages again. I found only two pages that were even close to "valid" against the standard, and one of those was the W3C's own home page. The other was missing a single alt attribute on a GIF.
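The missing-alt case is easy to check for yourself. A rough sketch, assuming a regex scan of the raw page is acceptable (`images_missing_alt` is a hypothetical helper, not something a validator or CGI.pm provides, and it can be fooled by odd attribute values):

```perl
use strict;
use warnings;

# Hypothetical helper: return every <img ...> tag that lacks an alt attribute.
sub images_missing_alt {
    my ($html) = @_;
    my @missing;
    while ($html =~ m{(<img\b[^>]*>)}gi) {
        my $tag = $1;
        # An empty alt="" still counts as present; only a missing one is flagged.
        push @missing, $tag unless $tag =~ /\salt\s*=/i;
    }
    return @missing;
}

my $page = '<img src="logo.gif"> <img src="dot.gif" alt="">';
print "missing alt: $_\n" for images_missing_alt($page);
```

A real validator checks far more than this, of course; the point is only that this particular error is cheap to catch.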

So here's my sore spot. Using CGI.pm is obviously recommended. I can't think of a single reason not to use something that comes with every default install of Perl. That would be like writing foreach loops to perform an action on list elements instead of using map. But even the sites that I can't imagine are not using CGI.pm (Slashdot or PerlMonks, for instance) do not generate valid HTML. Not even close, according to the error report.

While I've seen plenty of loud complaints when people roll their own form parsers, I am not seeing those same loud complaints when people (mis)use CGI to generate suboptimal HTML, or use it to parse the forms but then completely ignore it for generating HTML. It would appear that even people who use CGI.pm can't count on it to put in a default alt attribute for their GIFs when they forget: it creates well-formed HTML that may still be gibberish according to the standards.

The module might as well be CGI::ParseForms and skip all the HTML building routines, for the ways it seems to be used in the wild. And frankly, given how much trouble I have with fonts that get too small, or pages that are completely unreadable in text-only mode (yes, I like to browse in Lynx sometimes just to get away from all the image rendering issues and time wasted waiting for them to download over the modem), I'd like to see us make stronger, more frequent recommendations to use CGI for building HTML and then to remember that using it is no guarantee of perfect HTML either.

Update: Some of the above has been altered. Based on the responses below, I'm not sure what I said that muddied my point. What I'm saying is simple. Feel free to keep harping away on the poor souls who roll their own parsing routines instead of using CGI.pm. But please, consider applying the same critical eye to people who use only 5% of the functionality of the module and continue to hand code HTML (often hard coding large chunks of it into their scripts), or who use the module to create crummy HTML by exploiting the fact that, while it writes well-formed HTML, it does not validate tags, attributes, or block/inline nesting.
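To make that last point concrete, here is a rough sketch of the kind of tag-name check that well-formedness alone does not give you. Everything in it is an assumption for illustration: `unknown_tags` is a hypothetical helper, and `%known` is a deliberately partial list of HTML 4 element names, nowhere near a real DTD check.

```perl
use strict;
use warnings;

# A deliberately partial list of HTML 4 element names, for illustration only.
my %known = map { $_ => 1 } qw(
    html head title body h1 h2 h3 h4 h5 h6 p ul ol li a img
    div span em strong table tr td th form input font
);

# Hypothetical helper: return the distinct opening-tag names not in %known.
sub unknown_tags {
    my ($html) = @_;
    my (%seen, @unknown);
    while ($html =~ m{<([a-zA-Z][a-zA-Z0-9]*)}g) {
        my $name = lc $1;
        push @unknown, $name unless $known{$name} || $seen{$name}++;
    }
    return @unknown;
}

# A made-up tag is perfectly well-formed, and still not HTML.
my $markup = '<h1>Title</h1> <blorp size="3">text</blorp>';
print "unknown element: $_\n" for unknown_tags($markup);
```

Checking attributes and block/inline nesting is a much bigger job, which is why a proper validator is still the right tool.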

Re: Re: use CGI or die;
by davorg (Chancellor) on Jan 11, 2001 at 18:30 UTC

    The problem is that many years ago it was decided by the powers that be that browsers would be lenient towards bad HTML. This is generally seen as a Bad Thing. As you've seen, the vast majority of the web is now made up of invalid HTML.

    Using the HTML shortcuts in CGI.pm helps in one way as a construction like:

    ul(li([1 .. 10]));

    will at least be well-formed. Unfortunately, it doesn't prevent you from doing something like:

    p(font({size=>'larger', color=>'red'}, 'Heading'));

    instead of

    h1('heading');

    and using CSS to handle the appearance.

    I haven't looked at a new version of CGI.pm for some time, but I'm hoping that it either has or will soon have an XHTML mode, but that still won't stop people from Doing The Wrong Thing :( You can't get away from the fact that it's the web page author's responsibility to create valid HTML.

    The only option is for browsers to suddenly stop working on invalid X?HTML, but the chances of that happening are approximately zero.

    Dave...
    (who tries to validate all of his web pages, but admits that a few errors do creep in)

    --
    <http://www.dave.org.uk>

    "Perl makes the fun jobs fun
    and the boring jobs bearable" - me

      Headings aren't in there to give you a convenient container for random style markup. Headings are in the HTML standard to allow you to break a document up into sections in an orderly fashion. People just abuse them to get nice giant bold text. If you want Nice Giant Bold Text and the text isn't a series of headers, please do use font or CSS on a <span> tag. H1-H6 tags should be used for headings so that they can be pulled into an outline of a document. Otherwise you have just found another way to abuse HTML.

      --
      $you = new YOU;
      honk() if $you->love(perl)

        While I certainly agree in principle, there's a small problem with that point, in practice (sometimes): Many browsers do not properly parse the span tag and, worse, do an abominable job of applying CSS to the same tag. There are times when the only options are using the heading tags or using the font tags to get the desired effects in a broad enough cross-section of graphical browsers to make a webpage properly functional to a large enough segment of the browsing population.

        Since using font tags has a tendency to generate error warnings in CSS validators, that means that sometimes heading tags are the only option for getting by the validators. That, of course, is because validation scripts are often somewhat limited, and not because the code is actually valid, but it emphasizes the difficulty we'll have in moving toward a standards-compliant web while standards-noncompliant browsers like IE still command such a large share of web browsing business.

        I agree: heading tags shouldn't be used that way. That doesn't mean I don't understand why they often are. I'm more annoyed by the failure of browsers to properly parse span tags than I am by the failure of web designers to properly use HTML (or XHTML, as the case may be) and CSS when they try to compensate for the failures of web browsers.

        I eagerly await the day that the tools with the best functionality all properly support XHTML, in any case. I'd like to see CGI.pm and any other tool used for generating markup actually produce code that is difficult to structure badly by virtue of its parsing rules. Regardless of what it does support, however, until the viewing apparatus supports the full standard there will always be standards-noncompliant kludges used to get around the failings in the system as a whole.

        - apotheon
        CopyWrite Chad Perrin
      > The only option is for browsers to suddenly stop
      > working on invalid X?HTML, but the chances of that
      > happening are approximately zero.

      Suddenly? No. Most of the web is still a non-wellformed mixture of HTML3, HTML4, and imaginary tags made up by specific browsers. However, current browsers do choke on non-wellformed markup if it is served with a content-type of text/xml, and that's a first step. As things like XSLT and RDF start to catch on, sites that want to harness the value of those things will have to be redone in wellformed XML, and that's that. (They won't necessarily have to provide and validate against Schemata, but we have to start someplace.)

      Incidentally, if CGI.pm is now improved to the point of being capable of producing anything that remotely resembles XHTML, maybe I should have another look at it; I've been avoiding it because of two things, and one was the execrable state of its output. If that has been shaped up, maybe the other thing (the tendency to obfuscate the Perl code) has been improved too, since I looked at it (which has been a bit), and I should have a second look.

       --jonadab

        Using CGI to handle the logistics of server interaction and using it to produce your output are two very different things. I use it nearly exclusively for the former but almost never for the latter.

        Makeshifts last the longest.

Re: Re: use CGI or die;
by merlyn (Sage) on Jan 11, 2001 at 20:40 UTC
    > The module might as well be CGI::ParseForms and skip all the HTML building routines, for the ways it seems to be used in the wild. And frankly, given how much trouble I have with fonts that get too small, or pages that are completely unreadable in text-only mode (yes, I like to browse in Lynx sometimes just to get away from all the image rendering issues and time wasted waiting for them to download over the modem), I'd like to see us make stronger, more frequent recommendations to use CGI for building HTML and then to remember that using it is no guarantee of perfect HTML either.

    There's one very nice thing about CGI.pm that hasn't been yet pointed out: if you had been generating valid HTML all along using the shortcuts (as opposed to "print"-ing your own), then in the most recent releases of CGI.pm, you are now generating valid XHTML! Yes yes yes! Thank you Lincoln!

    -- Randal L. Schwartz, Perl hacker

Re: Re: use CGI or die;
by why (Acolyte) on Jan 11, 2001 at 18:26 UTC
    Try this:

    use CGI qw( glark yurp );
    my $q= CGI->new();
    print $q->h1( "This is not really HTML" );
    print glark( { flinge=>"worz", plutch=>"erff" } );
    print yurp( { huid=>"queez", urst=>"hmmph" } );
    print $q->font( { crypet=>"swoom", whalk=>"47" } );

    which produces
    <H1>This is not really HTML</H1> <GLARK FLINGE="worz" PLUTCH="erff"> <YURP URST="hmmph" HUID="queez"> <FONT WHALK="47" CRYPET="swoom">

      See! Stein is embracing and extending HTML! He's eeevil! ;-)

      (Score -1, Off-topic):
      Does anyone already have the address I_learned_to_read@hotmail.com ? It might be funny to have it.

      Or:

      They_still_use_BSD@hotmail.com :-)
Re: Re: use CGI or die;
by extremely (Priest) on Jan 11, 2001 at 20:28 UTC
    I hate hammers and screwdrivers. Well, not exactly hate them because they come with every toolbox it seems but I want to complain about how people use them. People are always using these tools to build things that are dangerous. The planks on the deck are loose, the shelves in the bookcase are wobbly, people try and open cans with them, etc. We should change their names to "nail-driver" and "threaded-metal-cylinder-turner" until we can fix these tools to alert the user when they are using them incorrectly or at least get the hammer to countersink, putty, and sand.

    No offense ichimunki, I'm in your camp on this, I just think that you shot at the wrong criminal. People write shitty HTML with any tool, CGI can't make things worse and frequently makes things better.

    That is a ++ to ichimunki in case I was equally unclear =)

    --
    $you = new YOU;
    honk() if $you->love(perl)