Re: Larry vs. Joel vs. Ovid
by mr_mischief (Monsignor) on Nov 22, 2001 at 01:17 UTC
When a program is liberal in what it accepts, that keeps it from breaking when someone else's program is liberal in what it emits.
When a program is strict in what it emits, it won't break another program that is strict in what it accepts.
A good example is any of the text-based protocols built on top of TCP, such as Telnet, SMTP, or HTTP. Each standard calls for a CR/LF line ending. Some servers or clients break if they don't get it. Others will break them by issuing just an LF or just a CR. If, however, a program accepts either CR/LF or a bare LF and always emits the proper CR/LF, then it will communicate with all properly implemented software and with many of the quick-and-dirty hacks that might have broken the software on the other end. This is similar to what Larry is talking about when he says programs should communicate this way -- with other programs.
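Here's a minimal sketch of that line-ending policy, in Python just for illustration. The function names are my own, not from any real protocol library. It works on a complete buffer rather than a live socket (a real server would also have to handle a CR at the end of one read and the LF at the start of the next), and it accepts a bare CR as well, which goes a little beyond the CR/LF-or-LF rule above:

```python
def split_lines(data: bytes) -> list[bytes]:
    """Liberal input: accept CR/LF, bare LF, or bare CR as a line ending."""
    # Collapse CR/LF first so it isn't counted as two line endings,
    # then treat any remaining bare CR as a line ending too.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    lines = normalized.split(b"\n")
    # A final terminator leaves an empty trailing element; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


def emit_lines(lines: list[bytes]) -> bytes:
    """Strict output: always terminate each line with CR/LF."""
    return b"".join(line + b"\r\n" for line in lines)


# A sloppy peer mixing all three endings still gets parsed,
# and what we send back is always proper CR/LF.
incoming = b"HELO example.com\r\nMAIL FROM:<a@example.com>\nQUIT\r"
print(emit_lines(split_lines(incoming)))
```

The asymmetry is the whole point: the parser has three ways to recognize a line ending, while the emitter has exactly one.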
Communication within a program is another story. There you might prefer strict contracts, because you control the whole structure and can ensure that all of its parts adhere to them. That's different from interacting with unknown software on another system, which might have been designed by a different team in a different country in a different decade.
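A rough sketch of that split, assuming a made-up SMTP-ish command format: the boundary function `parse_command` tolerates mixed case and stray whitespace from the outside world, while the hypothetical internal `Command` type enforces its invariants strictly and fails loudly if some part of our own program breaks them:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """Hypothetical internal type; the rest of the program relies on its invariants."""
    verb: str      # always upper-case letters, never empty
    argument: str  # empty string when there is no argument

    def __post_init__(self):
        # Strict internal contract: we control every caller, so a bad
        # value here is a bug in our own code and should fail loudly.
        if not self.verb or not self.verb.isalpha() or not self.verb.isupper():
            raise ValueError(f"malformed verb: {self.verb!r}")


def parse_command(line: str) -> Command:
    """Liberal boundary: tolerate mixed case and stray whitespace from peers."""
    verb, _, argument = line.strip().partition(" ")
    return Command(verb.upper(), argument.strip())


print(parse_command("  helo   example.com "))  # Command(verb='HELO', argument='example.com')
```

Note that the boundary is liberal, not lawless: an empty line still ends up raising `ValueError`, because there's no reasonable way to interpret it.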
How successful would Microsoft Excel be in the spreadsheet market if it could not import spreadsheets from what was once the de facto standard, Lotus 1-2-3? How useful would the Unix grep command be if it limited you to searching 80-character lines?
How about a cut command that only works with its default of tab-delimited fields instead of also allowing space- or comma-delimited fields or fixed character positions? Sure, cut is strict in how it interprets an input file according to its switches, but those switches let it accept more than one type of input file. Otherwise, cut would be pretty useless unless you had a different program for each type of input. What if it could only read disk files and not input redirected to it on STDIN? Yep, being liberal in what it accepts helps there too. Perl's regex engine, split(), and other tools make it easy to accept many, many different types of input in the same program. This is a good thing.
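To make the cut point concrete, here's a toy Python version. These are hypothetical helpers, not GNU cut's code, and they only approximate `cut -f`/`-d` and `cut -c`. Each one is strict about how it reads a line once told the format, but between them they accept either delimited fields or character positions, and `read_lines` takes either a named file or STDIN:

```python
import sys
from typing import Iterable, Iterator, Optional


def read_lines(path: Optional[str] = None) -> Iterator[str]:
    """Read from a named file, or from STDIN when no path (or "-") is given."""
    if path is None or path == "-":
        yield from sys.stdin
    else:
        with open(path, encoding="utf-8") as handle:
            yield from handle


def cut_fields(lines: Iterable[str], fields: Iterable[int], delimiter: str = "\t") -> Iterator[str]:
    """Roughly `cut -f LIST -d DELIM`: 1-based fields, tab by default."""
    wanted = sorted(set(fields))  # cut prints selected fields in input order
    for line in lines:
        text = line.rstrip("\n")
        if delimiter not in text:
            yield text  # like cut without -s: lines with no delimiter pass through
            continue
        parts = text.split(delimiter)
        yield delimiter.join(parts[i - 1] for i in wanted if 1 <= i <= len(parts))


def cut_chars(lines: Iterable[str], start: int, end: int) -> Iterator[str]:
    """Roughly `cut -c START-END`: 1-based, inclusive character positions."""
    for line in lines:
        yield line.rstrip("\n")[start - 1:end]


rows = ["alice,100,x\n", "bob,200,y\n"]
print(list(cut_fields(rows, [3, 1], ",")))  # ['alice,x', 'bob,y']
print(list(cut_chars(["abcdef\n"], 2, 4)))  # ['bcd']
```

Because every function takes any iterable of lines, something like `cut_fields(read_lines(), [1], ",")` reads from a pipe just as happily as from a file.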
There are editors that only allow 80-character lines and only 64 kilobytes of text in a file. They are generally considered useless. This is another place where liberal acceptance of input is a good idea.
There's a rule for applications programmers (which more and more people are accepting as a rule for systems programmers, too) that says you should always offer the user (whether that user is a person or an automated process) zero, one, or an infinite number of something. This is a form of being liberal in what you accept. If there's one of something because it's a special case (one driver bound to a port, for example, or one shared memory segment per process if you wanted to do something like that in your OS), then that's just the one. If there's more than one, then it's not a special case, and there should be a way to offer an infinite number of the same to the user. This fits well with the unlimited number of variables in modern languages, the unlimited file size an editor can handle, and so on. Of course, there are still some exceptions to this rule, such as practical limitations due to the size of an integer, and the fact that using arbitrary-precision math for something like pointer arithmetic or file positions hurts performance. Still, it's good not to hold a user to some arbitrary, artificial limit on input, objects, processes, or whatever, unless it's for performance or security reasons.
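A small illustration of zero-one-infinity, using a hypothetical mail-forwarding table: there's no `MAX_TARGETS` constant, so an alias can map to zero, one, or any number of addresses. The `add_u32` helper is also hypothetical and stands in for the integer-size exception: fixed-width arithmetic of the kind used for offsets wraps around, while Python's own integers don't:

```python
class ForwardingTable:
    """Hypothetical alias table: zero, one, or any number of targets per alias."""

    def __init__(self) -> None:
        self._targets: dict[str, list[str]] = {}

    def add(self, alias: str, target: str) -> None:
        # No MAX_TARGETS check: once there can be two targets, allow any number.
        self._targets.setdefault(alias, []).append(target)

    def targets(self, alias: str) -> list[str]:
        # An unknown alias simply has zero targets rather than being an error.
        return list(self._targets.get(alias, []))


def add_u32(a: int, b: int) -> int:
    """Fixed-width 32-bit addition: the integer-size exception to the rule."""
    return (a + b) & 0xFFFFFFFF


table = ForwardingTable()
for n in range(1000):
    table.add("staff", f"user{n}@example.com")
print(len(table.targets("staff")), table.targets("nobody"))  # 1000 []
print(add_u32(0xFFFFFFFF, 1), 0xFFFFFFFF + 1)  # 0 4294967296
```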
Many of the GNU command-line tools accept BSD-style or SVr4-style arguments. This is a good thing. Some of them provide output in either format, but only when given an explicit argument to do so. They emit either their own format or the format requested, and they are usually pretty strict once the choice is made. This is a good thing.
I'm in the middle of switching an ISP from Sendmail to Postfix. Postfix can use many of Sendmail's files and file formats, but it can also use Qmail's. This is a great thing for me. I'll be moving a few different POP servers, with usernames overlapping among the different boxes, onto one killer box, which QPopper can't handle. It's a good thing Teapop can handle the traditional Unix mail spool files even though Qmail's one-file-per-message system is now more widely accepted. It may keep me from having to convert several thousand users' email into another format. I'll also be able to use the system password file, MySQL, PostgreSQL, htaccess files, db files, or flat files for the mail system's user lists. That's a good thing, because I can't put overlapping names in the same system password file. I'll probably use htaccess files, one per domain. The flexibility of what Postfix and Teapop accept as input on the server side makes them great tools for this project.
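Accepting both spool formats is easy to sketch with Python's standard `mailbox` module. This is not how Teapop does it (Teapop is its own C program); it just shows one reader accepting either a traditional mbox spool file or a Maildir directory, the one-file-per-message format Qmail uses. The `cur/` and `new/` check is my own heuristic, and `open_spool` and `subjects` are hypothetical names:

```python
import mailbox
import os


def open_spool(path: str) -> mailbox.Mailbox:
    """Open either a Maildir directory or a traditional mbox spool file."""
    # Heuristic (an assumption): a Maildir has cur/ and new/ subdirectories.
    if all(os.path.isdir(os.path.join(path, sub)) for sub in ("cur", "new")):
        return mailbox.Maildir(path, create=False)
    # Otherwise treat it as a single-file mbox spool.
    return mailbox.mbox(path, create=False)


def subjects(path: str) -> list[str]:
    """List message subjects, whichever format the spool is in."""
    box = open_spool(path)
    try:
        return [str(msg["Subject"]) for msg in box]
    finally:
        box.close()
```

The calling code never needs to know which format a given user's mail is in, which is exactly what saves you from a mass conversion.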
I'm sure I could give examples of liberal _acceptance_ of input being a good thing all day long. It seems to me that your issue lies more in the realm of liberal _interpretation_ of that input. Even in HTML, there's a difference between a browser ignoring a tag it doesn't understand and trying to render HTML that's just plain wrong. The former allows for the expansion of the standard; the latter is nonstandard. The former, though, is still being liberal in what it accepts as input, as opposed to throwing an error saying something like 'invalid tag at line xxx'. Even allowing well-formed, valid HTML 3.0 and well-formed, valid HTML 4.01 Strict to be read and rendered by the same browser is being liberal in what the browser _accepts_. It still could complain if either page is badly formed. Being backwards-compatible is even a big part of some standards. C99 tries to break as little of C89 as possible. The C++ standards attempt to let most C code compile with a C++ compiler. Some ANSI C compilers have a K&R mode, a strict ANSI mode, and an ANSI mode that also allows extra library functions, so a programmer can use whatever feels most comfortable. That's not to say that syntactically broken K&R code should work in K&R mode, or that syntactically broken ANSI code should work in either of the ANSI modes.
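Here's a rough sketch of the accept/interpret distinction using Python's `html.parser`. The `KNOWN_TAGS` set, the `TolerantRenderer` class, and the `render` helper are all hypothetical. Unknown tags are skipped while their text is kept, but known tags that are misnested or left unclosed get reported instead of being silently guessed at:

```python
from html.parser import HTMLParser

# Hypothetical subset of tags this toy renderer understands; all need end tags.
KNOWN_TAGS = {"b", "i", "em", "strong"}


class TolerantRenderer(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.text: list[str] = []
        self.errors: list[str] = []
        self.open_tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Liberal acceptance: an unknown tag (a newer standard, a vendor
        # extension) is skipped, and its text content is still rendered.
        if tag in KNOWN_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in KNOWN_TAGS:
            return
        # Strict interpretation: a known tag closed out of order is an error,
        # reported instead of guessing what the author meant.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}> at line {self.getpos()[0]}")

    def handle_data(self, data):
        self.text.append(data)


def render(source: str) -> tuple[str, list[str]]:
    parser = TolerantRenderer()
    parser.feed(source)
    parser.close()
    errors = parser.errors + [f"unclosed <{tag}>" for tag in parser.open_tags]
    return "".join(parser.text), errors


print(render("<b>old <blink>new</blink> tag</b>"))  # ('old new tag', [])
print(render("<b><i>bad nesting</b></i>"))
```

The first page uses a tag this renderer has never heard of and renders fine; the second uses only tags it knows, but uses them wrongly, so it gets complaints.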
I think the main issue here is a broad interpretation of 'accept'. I don't presume to speak for Larry, but it's my understanding that he meant something less ambiguous than he is being taken to mean. After all, he's talking about being strict in what he emits. I think he intends a conservative connotation for 'accept'. That's why I distinguish in this node between 'accept' and 'interpret'. All in all, I think you'll find that several of the other nodes in this thread were written by people who read that quote from Larry the same way I do.