in reply to
Re: Comparative satisfiability of regexps.
in thread Comparative satisfiability of regexps.
The reason I was guessing at is that they have a lot of data to validate based on a complex regex and were considering using another faster regex as a precheck to save on processing.
If this was the case then it's to prove that the precheck won't reject anything that the main check would allow.
Re^3: Comparative satisfiability of regexps. by BrowserUk (Pope) on Jan 20, 2005 at 09:50 UTC 
That's a pretty good application, though I have to say that I would probably opt for a statistical approach to verification.
Generate a bunch of random strings, pass them through the stricter test, and then pass it's output through the quicker. If the second results set equals the first, you're good to go.
With a good knowledge of what the regex are attempting to match, it should be relatively easy to generate random datasets that exercise most of the edgecases, or at least a statistically computable subset thereof.
And that was my point. Using the knowledge of what the regex are trying to match, and statistics, is probably much easier than try to crosscompare to regex, for which the comparison is only valid if you can first prove that at least one of the regex is itself "correct"which is an extremely hard task to start with.
Unless you can:
 use a definite dataset to prove one of them.
In which case you could use the same dataset to prove the other.
 You accept a statistical proof of one.
Which again, could be used to prove the other.
Proving the compatibility of two unprovable entities doesn't really buy you a lot unless the comparability test is very quick and cheapwhich doesn't appear to be the case here.
There is no point in measuring something, unless you are pretty sure that your ruler is accurate!
Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.
 [reply] 

All true, and wellsaid, except that the precise commodity desired in this case is proof of the compatibility of two unprovable entities. My code neither knows nor cares what the two regexps "mean", or whether they correctly represent the author's intent; it only cares that output which satisfies the first regexp can be affirmatively stated to satisfy the second regexp.
In our particular instance, we're attempting to verify the correctness of a metaapplication built of sequentially assembled functoids whose input and output contracts are specified as .xsd files. We don't want to know what any given functoid does, or what its parameters (or outputs) represent. We just want to verify that a given sequence of correct functoids is itself correct.
As a separate note, I'm really disinclined to accept Monte Carlo proofs in the hard sciences. I'll accept them in the soft sciences, but only because there aren't better alternatives. In programming in particular, there's almost always a formal proof of correctness available.
Thanks much,
Mickey.
 [reply] 

In programming in particular, there's almost always a formal proof of correctness available.
I have to counter that statement.
Formal proof of a computer program's correctness is not possible. and anyone suggesting to you that it is, is not a scientist, but a snakeoil salesman.
Formal proof of an algorithm is possible.
And you can certify a particular implementation on a particular hardware setupwith the use of comprehensive and rigorous testingprovided the domain and range of every variable can be determined and catagorised. But it had best be a very important, and probably very simple program.
Any other kind of "proof" is either statistical or snakeoil.
Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.
 [reply] 


As a separate note, I'm really disinclined to accept Monte Carlo proofs in the hard sciences.
I guess that means you don't trust public key encryption then, because RSA needs prime numbers to work with, but everybody just uses a variation of the Fermat test to check whether the keys generated are probably prime. But most people aren't overly concerned by this because you can set your threshold for "probably" to be, say, 100 times lower than the probablity that cosmic radiation will flip a bit in your memory causing an otherwise perfect algorthim to fail.
 All code is 100% tested and functional unless otherwise noted.
 [reply] 



