PerlMonks  

Re: Edit distance between regular expressions

by hawtin (Prior)
on Sep 08, 2008 at 08:35 UTC (#709727)


in reply to Edit distance between regular expression

First of all, I think you actually want the distance from a regex to a fixed string, not between two regexes. That is possibly a (slightly) easier problem.

Secondly I suspect that the problem (at least for regex but maybe not for glob) is undecidable. Consider these matches:

/(ab|ac|ce).{3}(ge|ef|gh)/
aceabdef
ageabdxf

Clearly the regex matches the first string (i.e. a distance of 0), but for the second you could validly be asking "How many modifications to the regex are required to create a match?" or "How many inserts/deletes are needed on the string for the regex to match?". These are of course different questions.
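A quick way to check the match claim above is the sketch below. It is written in Python rather than Perl only for convenience; this particular pattern uses syntax both engines share, and the helper name `matches` is made up for illustration.

```python
import re

# The pattern from the post, unanchored as in a plain Perl m// match.
PATTERN = re.compile(r"(ab|ac|ce).{3}(ge|ef|gh)")

def matches(text):
    """Return True if the pattern matches anywhere in text (hypothetical helper)."""
    return PATTERN.search(text) is not None

first = matches("aceabdef")   # "ce" + "abd" + "ef", starting at offset 1
second = matches("ageabdxf")  # "ab" is the only usable prefix, and the string ends too soon
```

Under these assumptions `first` is true and `second` is false, which is the starting point for both of the questions posed above.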

If you think of the pattern space of all possible strings, then each string is a single location while a regex is an area. It is quite legitimate to ask for the distance between two points, and that is easy to calculate. It is also possible to ask whether a region encloses a point, that is, whether a string matches a regex. The question you are asking is what the distance is from a point to the nearest point in an area. Simple changes in the regex can have large effects on the area covered. The power of regex makes your question hard.
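For the "edit the string" reading of the question, the distance to the nearest point in the area can at least be brute-forced on small inputs. The following is only a sketch, not a practical algorithm: it does a breadth-first search over single-character inserts, deletes and substitutions, and it assumes a small candidate alphabet (here the literal characters that appear in the pattern). The function name, the `alphabet` parameter and the `max_edits` cutoff are all invented for this illustration.

```python
import re

def distance_to_regex(pattern, text, alphabet, max_edits=3):
    """Fewest single-character edits to text so that pattern matches, or None.

    Brute-force breadth-first search; cost grows quickly with max_edits.
    """
    regex = re.compile(pattern)
    frontier = {text}
    seen = {text}
    for edits in range(max_edits + 1):
        # Every string in the frontier is exactly `edits` edits from text.
        if any(regex.search(s) for s in frontier):
            return edits
        next_frontier = set()
        for s in frontier:
            for i in range(len(s) + 1):
                # Insert each alphabet character before position i.
                candidates = [s[:i] + c + s[i:] for c in alphabet]
                if i < len(s):
                    # Delete the character at position i.
                    candidates.append(s[:i] + s[i + 1:])
                    # Substitute the character at position i.
                    candidates += [s[:i] + c + s[i + 1:] for c in alphabet]
                for t in candidates:
                    if t not in seen:
                        seen.add(t)
                        next_frontier.add(t)
        frontier = next_frontier
    return None  # no match found within max_edits edits

PATTERN = r"(ab|ac|ce).{3}(ge|ef|gh)"
ALPHABET = "abcefgh"  # assumption: the literal characters used in the pattern

d_first = distance_to_regex(PATTERN, "aceabdef", ALPHABET)
d_second = distance_to_regex(PATTERN, "ageabdxf", ALPHABET)
```

With these inputs the search reports a distance of 0 for the first string and 2 for the second (for example, changing `g` to `c` and `x` to `e` turns it into the first string). This says nothing about the other reading, where the regex itself is edited.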


Re^2: Edit distance between regular expressions
by moritz (Cardinal) on Sep 08, 2008 at 08:42 UTC
    Secondly I suspect that the problem (at least for regex but maybe not for glob) is undecidable.

    I don't think so, at least in the computer science sense of "undecidable". For every Perl regex (that doesn't do evil stuff with code assertions) you can make it match any string with this transformation:

    m/($old_regex)?/

    (Matches the empty string and thus every possible string).

    That is an edit distance of 3 if we count character by character. So all you have to check is every possible regex at an edit distance of 1 or 2, which is still a great many regexes, but finitely many. (Assuming this kind of transformation is actually allowed.)

    I don't know if that leads to any kind of practical solution though, and I don't know if OP actually talks about regexes or globs.
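The wrapping trick in the reply above can be tried out directly. This is a small sketch in Python (the construct behaves the same way in Perl for this pattern); `make_optional` is a made-up helper name, and the sample strings are arbitrary.

```python
import re

def make_optional(old_regex):
    """Mirror m/($old_regex)?/ by adding "(", ")" and "?" around the pattern."""
    return "(" + old_regex + ")?"

OLD = r"(ab|ac|ce).{3}(ge|ef|gh)"
NEW = make_optional(OLD)

# The optional group can match the empty string, so an unanchored
# search succeeds on any input, including the empty string.
samples = ["aceabdef", "ageabdxf", "", "zzz"]
results = [re.search(NEW, s) is not None for s in samples]

# Number of characters added to the regex source.
added = len(NEW) - len(OLD)
```

Every entry in `results` is true and `added` is 3, which is the upper bound of 3 on the regex-side edit distance that the argument relies on.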
