generating regexes?
by mortis (Pilgrim) on Nov 19, 2001 at 20:13 UTC
This is a copy of an email that I sent to phl.pm. Blake Mills prompted me to post it here as well, because he felt that perlmonks would be interested in it.
I've been doing a bit of reading on machine learning. One of the things I've been toying with is the ability to generate a regex that matches a given example set of data. My particular examples would be things like phone numbers or zip codes: information that consists of single data elements.
I've looked on CPAN for any possible existing work, but haven't been able to find anything. Does anyone know of anything along the lines of what I'm describing? The Regexp package provides some common examples, but what I really want is a tool I can use to generate regexes for data in a generic, automated fashion.
I've tried writing some simplistic code, and it has some success with data that has a consistent format, though it creates some horrible-looking regexes for less consistent data, and fails completely for inconsistent data. I'm almost embarrassed to offer this up, but if you're interested, the code I wrote to try this out is available here.
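To make the idea concrete, here is a minimal sketch of one naive way to do this. It is not the code linked above, and it is written in Python only for brevity; the patterns it emits use syntax that Perl accepts as well. The names `char_class`, `tokenize`, `quantifier`, and `generate_regex` are made up for this illustration. Each example is collapsed into runs of character classes. If every example has the same sequence of classes, the run lengths are merged into `{min,max}` quantifiers; otherwise the sketch falls back to an alternation of one exact pattern per distinct shape, which is roughly where the ugly regexes for less consistent data come from.

```python
import re
import string
from itertools import groupby


def char_class(ch):
    """Map one character to a regex class; anything else becomes an escaped literal."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    if ch.isspace():
        return r"\s"
    return re.escape(ch)


def tokenize(example):
    """Collapse an example into runs: [(class, run_length), ...]."""
    return [(cls, len(list(run))) for cls, run in groupby(map(char_class, example))]


def quantifier(lo, hi):
    """Render a repeat count, omitting it when the run is exactly one character."""
    if lo == hi:
        return "" if lo == 1 else "{%d}" % lo
    return "{%d,%d}" % (lo, hi)


def generate_regex(examples):
    """Build an anchored regex from example strings (assumption: all examples are positive)."""
    if not examples:
        raise ValueError("need at least one example")
    token_lists = [tokenize(e) for e in examples]
    shapes = {tuple(cls for cls, _ in toks) for toks in token_lists}

    if len(shapes) == 1:
        # Consistent format: same class sequence everywhere, so merge the run lengths.
        parts = []
        for column in zip(*token_lists):
            lengths = [n for _, n in column]
            parts.append(column[0][0] + quantifier(min(lengths), max(lengths)))
        return "^" + "".join(parts) + "$"

    # Inconsistent format: one exact alternative per distinct example shape.
    alternatives = sorted(
        {"".join(cls + quantifier(n, n) for cls, n in toks) for toks in token_lists}
    )
    return "^(?:" + "|".join(alternatives) + ")$"


if __name__ == "__main__":
    print(generate_regex(["215-555-1234", "610-555-9876"]))
    print(generate_regex(["19104", "19104-1234"]))
```

Because the fallback only enumerates the shapes it has seen, it does not generalize at all for inconsistent data. Doing better would need some way to align examples of different shapes and mark parts as optional, which this sketch does not attempt.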
Any advice or pointers would be great.
Edited by footpad, ~Tue Nov 20 15:25:42 2001