in reply to Benchmarks aren't everything

I agree with what you have said within its own context. It is more important to make the stuff correct than to make it faster. No problem here.

But your write up cannot be used against Tim Bray's, as you did nothing to compare Perl's regexp against Java's as the original author did. What you did was to compare Perl's with Perl's, the Perl one that was faster but not quite right with another Perl one that is slower but right.

What if the Java one is not only faster but also correct? This is the key question you need to answer. If both are right, faster is obviously better.

Replies are listed 'Best First'.
Re^2: Benchmarks aren't everything
by tilly (Archbishop) on Oct 24, 2005 at 18:29 UTC
    Given the history of regular expression implementations, it would be very surprising if Java actually did handle this correctly. Until Perl implemented a solution, the best approach that I'd heard of anyone using for this problem was to first do a DFA match to see if it matched at all, then an NFA match to figure out what the match was.

    I don't have a Java compiler handy. But, for instance, this was not handled in Perl 5.005_03, and as of the last time that I checked, was not handled in Ruby, pcre (which is widely used in other applications), or pp-ccre.

    If someone wishes to run a test Java program to see whether it has this optimization, I'd be interested to know the result.

Java does fail pathologically
by emcmanus (Initiate) on Nov 21, 2005 at 09:24 UTC
    I tried matching "^(\\s*foo\\s*)*$" against 39 "foo "s followed by a "fo" and java.util.regex.Pattern.matches does indeed go off into recursive space never to re-emerge.

    Personally I think it would be acceptable for Java to throw an exception after some bounded amount of effort in matching. Especially given that Perl's new algorithm can't catch all the pathological cases.