PerlMonks  

Re (tilly) 1: Avoiding regex backtracking

by tilly (Archbishop)
on Mar 05, 2002 at 14:00 UTC ([id://149367])


in reply to Avoiding regex backtracking

In this case there should be nothing to gain by trying to optimize. The RE engine is smart enough to find a much better optimization behind your back: it locates the fixed string using Boyer-Moore, then matches the RE around that position.
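
To make the fixed-string idea concrete, here is a minimal sketch of Horspool's simplification of Boyer-Moore (bad-character shifts only). It is not Perl's internal code, which is written in C; `horspool_index` is a hypothetical helper that mimics the return convention of the built-in `index`.

```perl
use strict;
use warnings;

# Hypothetical helper: Boyer-Moore-Horspool substring search.
# Returns the offset of $needle in $haystack, or -1 (like index()).
sub horspool_index {
    my ($haystack, $needle) = @_;
    my ($n, $m) = (length $haystack, length $needle);
    return 0  if $m == 0;
    return -1 if $m > $n;

    # Shift per character: distance from its last occurrence
    # (ignoring the final slot) to the end of the needle.
    my %shift;
    for my $i (0 .. $m - 2) {
        $shift{ substr($needle, $i, 1) } = $m - 1 - $i;
    }

    my $pos = 0;
    while ($pos <= $n - $m) {
        # Compare right to left.
        my $j = $m - 1;
        $j-- while $j >= 0
            && substr($haystack, $pos + $j, 1) eq substr($needle, $j, 1);
        return $pos if $j < 0;

        # Skip ahead based on the haystack character under the needle's last slot.
        my $c = substr($haystack, $pos + $m - 1, 1);
        $pos += exists $shift{$c} ? $shift{$c} : $m;
    }
    return -1;
}
```

Once a literal such as "yada" has been found this way, the full pattern only needs to be tried at offsets compatible with it rather than at every position. The compile-time output of `use re 'debug';` reports which literal Perl chose (as an "anchored" or "floating" substring).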

In general, while backtracking is worth knowing about, most of the time it is not a problem. The exceptions are cases like this one, where a failing match can backtrack through exponentially many ways of splitting the input:

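    use strict;
    use warnings;

    $| = 1;  # show each "Trying..." line before its match starts

    foreach my $i (1..40) {
        slow_match($i);
    }

    sub slow_match {
        my $count = shift || 20;
        my $str = ("yada " x $count) . "yad";   # trailing "yad" forces failure
        print "Trying to match $count iterations..";
        # Each space can belong to either \s*, so failing means trying every split
        die "Huh?" if $str =~ /^(\s*yada\s*)*$/;
        print "Done\n";
    }
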
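
As an aside that is not from the post itself: one way to rule out the exponential re-splitting in this particular pattern is to wrap each iteration in an atomic group `(?>...)` (an "independent subexpression", available since Perl 5.005). Once an iteration has matched, the engine will not go back and re-divide its whitespace, yet the set of strings matched stays the same. The names `$ambiguous`, `$atomic` and `matches_both` are hypothetical.

```perl
use strict;
use warnings;

# The pattern from the snippet: a space between two "yada"s can go to the
# trailing \s* of one iteration or the leading \s* of the next.
my $ambiguous = qr/^(\s*yada\s*)*$/;

# Same language, but each iteration is atomic: the greedy trailing \s*
# keeps all the whitespace it took, so there is only one way to split.
my $atomic = qr/^(?>\s*yada\s*)*$/;

# Hypothetical helper: run both patterns and report (1/0, 1/0).
sub matches_both {
    my ($str) = @_;
    return ($str =~ $ambiguous ? 1 : 0, $str =~ $atomic ? 1 : 0);
}
```

On a failing string the atomic version can only give back whole iterations one at a time, so the work stays roughly linear in the number of iterations.
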
As for the optimization you mention: for patterns that do not use "special features" (e.g. backreferences within the match, lookaheads, etc.), it is possible to execute any match in guaranteed polynomial time. Perl does not, however, do this...
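
A minimal sketch of how such a guarantee is obtained, using the textbook set-of-states (Thompson-style) NFA simulation rather than backtracking. This is not something Perl does; the automaton is hand-built for the single pattern `/^(\s*yada\s*)*$/`, and `%delta`, `%accepting` and `nfa_match` are hypothetical names.

```perl
use strict;
use warnings;

# Hand-built NFA for /^(\s*yada\s*)*$/. After "yada", whitespace may be
# trailing (state T) or the leading whitespace of the next iteration
# (state L): the same ambiguity that hurts the backtracking matcher.
my %delta = (
    'start' => { 'ws' => ['L'],      'y' => ['Y'] },
    'L'     => { 'ws' => ['L'],      'y' => ['Y'] },
    'Y'     => { 'a'  => ['YA'] },
    'YA'    => { 'd'  => ['YAD'] },
    'YAD'   => { 'a'  => ['T'] },
    'T'     => { 'ws' => ['T', 'L'], 'y' => ['Y'] },
);
my %accepting = ('start' => 1, 'T' => 1);

sub nfa_match {
    my ($str) = @_;
    my %current = ('start' => 1);          # set of active states
    for my $ch (split //, $str) {
        my $sym = $ch =~ /\s/ ? 'ws' : $ch;
        my %next;
        for my $state (keys %current) {
            my $targets = $delta{$state}{$sym} or next;
            $next{$_} = 1 for @$targets;   # the hash de-duplicates states
        }
        return 0 unless %next;             # no live states left: fail now
        %current = %next;
    }
    return (grep { $accepting{$_} } keys %current) ? 1 : 0;
}
```

Because duplicate states merge, each character costs at most one pass over six states, so the work grows linearly with the string length no matter how the whitespace could be split.
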

Re: Re (tilly) 1: Avoiding regex backtracking
by blakem (Monsignor) on Mar 13, 2002 at 02:52 UTC
    This snippet shows very different behavior in 5.6.1 and 5.00503. In the earlier version I see the exponential execution time, but in 5.6.1 it seems to do *much* better. Do you know of any improvements in perl that might account for this?

    -Blake

      Yes.

      In 5.6 the engine keeps track of how long "complex REs" are taking to match. If a match takes too long, the engine redoes it while keeping a history of positions it has already visited. This reduces the worst case from exponential to polynomial (unless, of course, you use backreferences).
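
      The principle of "keeping a history of visited positions" can be sketched with a hand-written backtracking matcher for the same pattern. This is a hypothetical illustration, not Perl's mechanism (which lives in the C regex engine and is more involved): `rest_matches` asks whether zero or more `(\s*yada\s*)` iterations followed by end-of-string can match from position `$p`, and an optional memo records positions already known to fail.

```perl
use strict;
use warnings;

# Hypothetical backtracking matcher for /^(\s*yada\s*)*$/ with explicit
# choice points. $memo (hashref or undef) caches failing positions;
# $calls (scalar ref) counts invocations so the two modes can be compared.
sub rest_matches {
    my ($s, $p, $memo, $calls) = @_;
    $$calls++;
    return 1 if $p == length $s;           # no more iterations, then $
    return 0 if $memo && $memo->{$p};      # already known to fail here

    # Leading \s*: try every length, longest first (greedy).
    my $lead_max = 0;
    $lead_max++ while $p + $lead_max < length($s)
                   && substr($s, $p + $lead_max, 1) =~ /\s/;
    for (my $l = $lead_max; $l >= 0; $l--) {
        my $q = $p + $l;
        next unless substr($s, $q, 4) eq 'yada';
        $q += 4;
        # Trailing \s*: again every length, longest first.
        my $trail_max = 0;
        $trail_max++ while $q + $trail_max < length($s)
                       && substr($s, $q + $trail_max, 1) =~ /\s/;
        for (my $t = $trail_max; $t >= 0; $t--) {
            return 1 if rest_matches($s, $q + $t, $memo, $calls);
        }
    }
    $memo->{$p} = 1 if $memo;              # remember this failure
    return 0;
}
```

      Without the memo, each extra "yada " roughly doubles the number of calls on a failing string; with it, each position is explored at most once. Keying the cache on position alone is valid here only because the continuation after every iteration is identical (more iterations, then `$`); a general engine must key on more state than that.
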

      There is, however, an optimization that I know of which would turn this into straight linear behaviour. I gave a basic sketch of the idea at Re (tilly) 1: Research ideas, but to the best of my knowledge nobody has ever implemented it in any real RE engine...
