|Problems? Is your data what you think it is?|
An exploration of higher-level-language approaches to the "First all-zero row in an NxN matrix" problem (goto Considered Harmful)by davido (Archbishop)
|on Sep 24, 2011 at 19:28 UTC||Need Help??|
This is a long meditation, and much of it is wrapped in <readmore> tags so that it doesn't fill the Meditations section with too much text for those who are not interested in the topic. ....ok, here goes...
A few of you are probably aware that I enjoy CS history. The wisdom of some of the earlier innovators in the science of computing is often just as relevant today as it was 30 to 50 years ago. And often the insight encountered when reading old articles is surprising.
In 1968 the Dutch computer scientist Edsger Djikstra published a letter that was originally to be entitled "A Case Against the GO TO Statement", advocating the abolition of 'goto', and the strengthening of structured languages such that 'goto' becomes unnecessary. He posited that a language with a sufficiently expressive grammar wouldn't need goto, and that, in fact, the use of goto makes the disconnect between code-space and execution-time (or chronology) conceptually difficult for the mind to follow. The implication being that such code is harder to write, maintain, and even comprehend useful algorithms.
When the publication's editor, Niklaus Wirth (of Pascal and Modula-II fame) received the letter, prior to publishing it he changed its title to the imortally famous (within those circles) title, "Goto Statement Considered Harmful." Having read the article, as well as some of his other works, I would have to assess that the title, as bold as it is, doesn't rise to the level of disdain Dijkstra had for 'goto', while on the other hand, his original 1968 article seems to fall a little short in putting forward a set of arguments that could present an open and shut case.
But whatever the reason, a few years later (ok, 20 years later, in 1987) a writer-in to the publication "Communications of the ACM" (the same publication where Dijkstra's letter was originally published almost 20 years earlier) sent a letter entitled "'GOTO Considered Harmful' Considered Harmful" that made several assertions suggesting that the mindless abolition of all 'goto's was causing programmers to go to a lot of extra work, and costing millions of dollars in lost productivity. He used an example of a programming challenge that was made easier by use of Goto: Let X be an N x N matrix of 0's and 1's. Find the first row that contains all zeros. He published the following code (obviously not Perl) to make his point:
I apologize right now for any typos that I may have contributed to the above code; I copied it out of a PDF of a scan of the publication, and think I got it 'character for character' right. I like the "First all-zero row in an NxN matrix" problem because it's fairly simple to implement a number of ways, and because optimizing it to terminate as soon as a match is found almost always involves some sort of short circuit that could be construed by a tight-assed purist as using a disguised version of goto (more on that later).
Rubin went on to show in code that I won't post (but which you can read in the links I provided) that the non-goto versions were longer, required flags, and were more complicated to follow. He then put forth an example of those types of solutions, but reworked to eliminate the need for a flag; it took nine lines (in the same language, which appears to be Pascal).
In a series of followup letters, later that year in the same publication, under the title, "" 'GOTO Considered Harmful' Considered Harmful" Considered Harmful?", it was mentioned that Rubin's non-goto examples were made more complex by inadequacies in the language. Moore proposed a solution that required a non-existent feature in Pascal, but that simplified the problem further. Another followup point seemed to be that while 'goto' is useful sometimes, often its need is created by inadequacies in the language itself. It advocated forms of goto that had more limited semantics (eg., break, next, last, etc.). Several solutions were posted in Pascal and C.
Eventually Dijkstra responded with another letter, "On a Somewhat Disappointing Correspondence." In it, he swiftly mops up Rubin's examples by showing them to contain bugs, or successes by accident.
All this is a pretty good read if you find that sort of thing interesting (which I do). But the real purpose of this meditation is to give a couple Perl examples (and possibly solicit others in Perl or other high-level languages).
In the following code you will find four techniques, each wrapped in a subroutine. The first is recursive (just for the sake of being 'possible'). The second is a simple iterative approach (it most closely echos the non-goto versions of code presented in some of the articles I linked to). The thrid is the highest-level; it uses List::MoreUtils 'firstidx' and 'all' functions, and essentially accomplishes the task in a single line of code if you exclude the subroutine's "setup" and framework. This is probably the preferable approach from a standpoint of clarity.
Then the last approach is a "higher order" functional iterator technique, which manufactures an iterator function, and tests the iterator against a test function which was passed in to the gizmo by function reference along with a test param list. The goal here was a generic functional approach that could easily be made specific to the problem. It's obviously much more complex than the List::MoreUtils approach, and for no particularly good reason other than to demonstrate another technique.
All of the approaches are designed to be an optimized O(n), where by "optimized" I mean that each row test fails as early as possible, and as soon as the first matching row is found, the tests exit immediately. The recursive approach is going to suffer in terms of performance by the call-frame overhead. The functional iterator approach also will suffer from a performance standpoint since each iteration requires a function call to the iterator. But it is a solution that uses tools that are general enough to be applicable to other sorts of tests.
That leaves the simple iterative approach and the List::MoreUtils approach as the "realistic" techniques. Both are an optimized O(n) without extra call-stack overhead found in the other versions. My personal opinion is that the List::MoreUtils version is the highest-level, easiest to understand, and completely devoid of gotos or disguised "limited semantic gotos" (hidden internals could be written in whatever ugly method the author of those functions wants though).
While on the subject of "limited semantic gotos", I suppose that a purist would consider the use of 'last', 'next', multiple 'return' statements, and in other languages, 'break' to be a form of goto; particularly if a label used. If we look at that as "a bad thing", I don't know how we'll ever get past the use of some form of 'goto' in practical programming. On the other hand, if we look at these as tools that take the controversial 'goto' and turn it into less "harmful" short-circuits, it would seem that Perl (as well as many other modern languages) has come a long way since the 60's.
It would be interesting to me to see what other constructs or approaches people might employ in solving the "first all-zero row in an NxN matrix" problem. For one thing, there is probably a better recursive approach. And I'm sure someone could come up with a simpler "functional iterator" approach as well, that retains generic utility. Here is the code I came up with to present a few alternatives.
One regret I have with several of these examples is that by wrapping them in functions I had to change "last" to "return" several times. The result is the same, but I liked the way they worked out with "last" as inline code better than with "return" as subroutines. ...Not really important, but now that I've mentioned it you should be able to spot pretty quickly how the simple iterative approach, in particular, would be slightly different if written inline instead as of a subroutine.
Again, I'd love to see a few other creative techniques.
Point of clarification: I suggest that these are O(n) solutions, yet they all contain nested loops. How could this be? Let 'n' be the total size of the data set, so for NxN where "N" is 5, n (the size of the total data set) = 25. Nobody mentioned it, but I thought that point required refinement.