Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

In the mid-eighties, I was briefly involved in the development of an IBM internal language called PLS86. A descendant of PL-1 language that IBM started developing in the late 50s and early 60s, it was targeted at the x86 platform and one of its major design goals was to allow the production of "mathematically provably correct programs". The language was briefly considered for use in the development of IBM's then top-secret, and as-yet unnamed, OS/2. (SO inventive. Not!). Anyway, the language was dropped (for use in developing OS/2) not least because IBMs then partners in this development, MS, refused to use it and so C was used instead.

I was friends with one of the guys who did considerable work on the development of the "mathematical proof" bit of the language. I don't recall (probably because I didn't really understand much of it) all of the details, but there was a point in the development of the language, which went ahead anyway, where the provability requirement was quietly dropped.

The reason came down to the simple statement: It wasn't possible!

Some of the bits I remember that lie behind the conclusion are:

Even with it being a totally pre-compiled language, without the dynamic features of some other languages (eg. eval in perl, INTERPRET in REXX etc.), unless the program is entirely statically linked, there is no way to ensure that you will not get failures due to the unavailability or incorrect runtime linking of dynamically bound code segments.

In order to prove the program is correct, it becomes necessary to add additional code to verify the inputs to every part of an algorithm and more code to verify the output. However, there are many problems with this.

  • The first set are that the only effective ways of doing this for any possible set of inputs and outcomes are
    • to duplicate the entire algorithm using an alternative set of instructions.

      This is the method used by some life-critical projects. For example, the software used for airliner fly-by-wire systems is developed from the same specs by three different software teams in complete isolation (clean-room environments) and targeted at 3 different (types of) processors. There is then a fourth processor that supplies the same input to each of the three and compares their outputs.

      If one processor reaches a different conclusion to the other two, it's result is ignored. If it consistently produces a different result, then that processor is flagged as being broken. The idea is that by using three different types of processor, no two are likely to be affected in the same way by a hardware, micro-code or software bug (think Pentium floating point bug!).

      By having the teams that develop each set of software work in isolation, it is less likely that they will make the same (wrong) assumptions. So long as two of the three processors arrive at the same result at any given step of the process, it's quite likely that they are correct.

      Whilst this method works for the most part, it is hugely costly and so not cost effective for most types of software.

    • to have a lookup table that maps inputs to outputs.

      Even allowing for the fact that the storage required to maintain these tables would be huge for anything other than the most trivial of projects, there is still the possibility that the inputs can be corrupted before they reach the process. Memory (RAM or DISK) failure, sensor failure, human input error etc.

      Beyond even that, the resultants in the lookup tables would have to be generated. For anything beyond the trivial this would need to be done using software.

      Catch-22, how do you ensure that that program is correct?

  • The second set of problems are that the additional code added to verify algorithmic correctness are
    • themselves software--and so subject to bugs, and failures.

      Catch-22 again,

    • adds more code to the project thereby increasing complexity--Complexity and volume are the prime multipliers for bug rates.

      all the additional processing costs hugely in terms of both development and execution time.

      People have, and still do, attempt to address this by making the test/ verification code conditional. That is to say, they add the tests and verification code in such a way that it can be 'turned off' in the production environment. The problem with that is that removing the code has a measurable, but sometimes unpredictable effect on the overall algorithm. This is effectively the same as the Quantum Effects principle in Physics--you cannot measure something without effecting it.

That isn't the complete story, but it's all I can remember for now.

What this means is, that for any real-world, costed, project, it becomes necessary to define a level of reliability that is 'acceptable' and then design your development and testing criteria to achieve that.

Testing is good, but it is not a silver bullet to bugs. It is impossible to achieve 100% coverage of tests. The first major project I was involved in testing (the GUI component of OS/2), had over 700 APIs, with an average of 4 parameters per API and some with as many as 13! Many of those parameters are themselves (pointers to) structures which can consist of as many as 40+ discrete fields, and each field can (for example) be 32 individual booleans; or integer values with ranges of +- 2**31 or 0-4**32; or floats ranging from something like 1e-308 to 1e+308.

Do the math! Even without considering the effects of interdependencies between the APIs--you can't draw a line until you've obtained a presentation space, you can't obtain a presentation space until you've obtained a device context and so on--the numbers become astronomical. Considering testing everything simple isn't possible given the projected life span of the earth, never mind that of human developers:)

Given you have an experienced programmer developing yet-another-version of some project that s/he has done several of before, then they will probably be able to make some educated guesses about where the edge cases are and there by reduce the sets of tests to a manageable size. However, it was proved fairly comprehensively (to my satisfaction anyway), by some work done by another bit of IBM around the same time, that even experienced developers make bad guesses as to where the edge cases are when they move to projects that are even slightly different to those they have done before. In fact, that particular team showed that they make worse than random choices! And the more experienced they were (in similar but different project types), the worse their guesses were.

The upshot was that we ended up developing a random test case generator. The logic went like this. If you have a few computers generating random but valid sequences of API calls, along with other code to test the resultant program, then with you can use statistics--the number of programs generated -v- the number of them that failed--to determine the rate at which bugs are being found. By re-running every test cases generated, both good and bad, each time a new build was released, you get an effective regression test. You also get a statistical measure of the rate at which earlier bugs are being fixed, which of them are re-appearing etc. You can also break the statistics down by component, developer, day of the week etc. etc. This allows you to target your efforts to where they are of greatest benefit.

The effect of this was amazing and salutary. There had been many, many test cases written prior to the RTG being available. WIthin a month it was shown that all of the test case produced by programmers/ testers targeting their efforts according to their (or their superiors) best judgment, had covered less than 15% of the total APIs with 10% having been duplicated over and over, 5% of the possible parameter combinations, and found less than 1% of the possible bugs.

Don't ask me how they arrived at this last statistic, it was way to much voodoo for me, but I can tell you that within two months, they were beginning to predict the number of new bugs that would be found, old bugs that would be cured and old bugs that would re-surface in any given week (with twice daily builds) with amazing accuracy. It was right around this time that the project got moved from the UK to the US and I never heard much more about it.

You might find this article interesting Coming: Failsafe software. The only way the software industry is going to move out of the metaphorical iron- or maybe bronze-age, is when we start using computers to assist us in our work.

It's a strange thing. If you describe a problem in almost any field of endeavour to a programmer, he will nearly always suggest a way that he could improve or solve that problem using a computer--except when the field of endeavour is his own!


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.


In reply to Re: Software Design Resources by BrowserUk
in thread Software Design Resources by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (4)
As of 2021-06-20 00:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)












    Results (93 votes). Check out past polls.

    Notices?