Re: Reverse engineering regular expressions
by Joost (Canon) on Aug 01, 2005 at 01:15 UTC
|
Interesting problem, but I wonder why you would want to solve it.
If you "just" want something to help you explain a regex, Regex Coach is helpful.
If you want to support the Perl code extensions, (?{ ... }) and friends, you can't. Update: after some more reading, lookaheads are tricky too; you're probably restricted to "classical" regular expressions plus a few extensions.
Generating a "minimal" string that would match a regex (i.e. drop everything with a * modifier, drop everything after a |, take the first character of a character class, etc.) should not be too hard (famous last words...) as long as you get the parser right. I suggest you take an existing one like YAPE::Regex or Regexp::Parser.
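As a rough illustration of the "minimal string" idea, here is a sketch for a toy subset of regex syntax only: literals, ".", simple (non-negated) [...] classes, plain (...) groups, "|", and the * ? + quantifiers. The sub names (minimal_match, parse_alt, parse_seq, parse_atom) are made up for this example, and the hand-rolled parser is an assumption for illustration; it is not how YAPE::Regex or Regexp::Parser work.

```perl
use strict;
use warnings;

my (@tok, $i);    # characters of the pattern and a cursor, shared by the subs below

# Entry point (hypothetical name): returns one short string matching $re.
sub minimal_match {
    my ($re) = @_;
    @tok = split //, $re;
    $i   = 0;
    return parse_alt();
}

# Alternation: keep the first branch, parse and discard the others.
sub parse_alt {
    my $first = parse_seq();
    while ($i < @tok && $tok[$i] eq '|') { $i++; parse_seq(); }
    return $first;
}

# Sequence: concatenate atoms until "|", ")" or the end of the pattern.
sub parse_seq {
    my $s = '';
    while ($i < @tok && $tok[$i] ne '|' && $tok[$i] ne ')') {
        my $piece = parse_atom();
        if    ($i < @tok && $tok[$i] =~ /[*?]/) { $i++; $piece = ''; }  # optional: drop it
        elsif ($i < @tok && $tok[$i] eq '+')    { $i++; }               # one copy is enough
        $s .= $piece;
    }
    return $s;
}

# Atom: group, character class, dot, escaped literal, or plain literal.
sub parse_atom {
    my $c = $tok[$i++];
    if ($c eq '(') {
        my $inner = parse_alt();
        $i++;                                   # skip ")"
        return $inner;
    }
    if ($c eq '[') {
        my $first = $tok[$i];                   # first character of the class
        $i++ while $i < @tok && $tok[$i] ne ']';
        $i++;                                   # skip "]"
        return $first;
    }
    return 'a'        if $c eq '.';             # any non-newline character will do
    return $tok[$i++] if $c eq '\\';            # escaped literal such as \.
    return $c;
}

print minimal_match('(.*)test(.*)'), "\n";      # prints "test"
```

Anything outside that subset (negated classes, \d-style escapes, {n,m} counts, anchors, lookarounds) would need a real parser, which is the point of reusing an existing module.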
Oh, and while I was looking for those modules on CPAN, I also found Regexp::Genex - it seems to do what you want. :-)
Re: Reverse engineering regular expressions
by Enlil (Parson) on Aug 01, 2005 at 01:04 UTC
|
Re: Reverse engineering regular expressions
by planetscape (Chancellor) on Aug 01, 2005 at 07:55 UTC
|
The regular expression:
(?-imsx:(.*)test(.*))
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  test                     'test'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Re: Reverse engineering regular expressions
by GrandFather (Saint) on Aug 01, 2005 at 01:03 UTC
|
/$interpolated/
/(?(?{rand})this|that)/
Perl is Huffman encoded by design.
Re: Reverse engineering regular expressions
by davido (Cardinal) on Aug 01, 2005 at 04:28 UTC
|
Consider the following regular expression:
m/test./
In a single-byte (8-bit) environment (without UTF-8 or Unicode character sets), and treating the pattern as anchored at both ends, this will match exactly 255 different strings (it would be 256, but . won't match \n). For example, it will match:
- testa
- testb
- testc
- test1
- test2
- test$
- test%
- .......and so on....
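The count can be checked by brute force. This sketch assumes single-byte characters (0 to 255) and anchors the pattern with \A and \z so that only five-character strings are considered:

```perl
use strict;
use warnings;

# Try every single-byte character after "test" and test the anchored pattern.
my @matching = grep { ("test" . chr($_)) =~ /\Atest.\z/ } 0 .. 255;
my @missing  = grep { ("test" . chr($_)) !~ /\Atest.\z/ } 0 .. 255;

printf "%d strings match\n", scalar @matching;       # 255 strings match
printf "not matched: chr(%d)\n", $_ for @missing;    # not matched: chr(10)
```

The one character left out is chr(10), the newline, because . does not match \n unless the /s modifier is used.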
But now look at your regular expression. With no other criteria beyond m/(.*)test(.*)/ you have a hopelessly large number of possibilities. Minimally, it could match "test". But it could also match any string of any size (up to the capacity of your computer's memory and swapfile) as long as the sequence 'test' is found somewhere within the string. In fact, given a random string of random length, the likelihood of finding a match increases as the string grows, for in an infinite sequence of random characters there will exist an infinite number of embedded 'test' sequences.
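To make the "likelihood increases as the string grows" point concrete, here is a small exhaustive count. The three-letter alphabet {t, e, s} is an assumption chosen only to keep the enumeration tiny, and the sub name count_containing_test is made up for this example:

```perl
use strict;
use warnings;

# For a given length, build every string over {t, e, s} and count
# how many of them contain "test".
sub count_containing_test {
    my ($len) = @_;
    my @strings = ('');
    for (1 .. $len) {
        @strings = map { my $prefix = $_; map { $prefix . $_ } qw(t e s) } @strings;
    }
    my $hits = grep { /test/ } @strings;
    return ($hits, scalar @strings);
}

for my $len (4 .. 8) {
    my ($hits, $total) = count_containing_test($len);
    printf "length %d: %d of %d (%.4f)\n", $len, $hits, $total, $hits / $total;
}
```

At length 4 exactly one string out of 81 matches, at length 5 it is 6 out of 243, and the fraction keeps climbing from there, while the absolute number of matching strings grows without bound.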
The point is, there's no point to trying to generate every (or any) string that will match a given regular expression if that RE doesn't somehow limit the number of possibilities to be a manageable quantity.
Re: Reverse engineering regular expressions
by polypompholyx (Chaplain) on Aug 01, 2005 at 09:52 UTC
|
Dominus's Higher Order Perl has a chapter on generating strings from simple regex-like specifications, but the online version isn't quite online yet.
And, since everyone is thinking it, but no-one has said it yet: perl is a binary, Perl is a language, PERL is a typo...
Re: Reverse engineering regular expressions
by bsb (Priest) on Aug 01, 2005 at 12:47 UTC
|
perl -MRegexp::Genex=:all -le 'print for strings(qr/(.*)test(.*)/)'
3'e3''test
3'e3'test~
3'e3'test
3'e3testÿÌ
3'e3testÿ
3'e3test
3'etest
3'etest
3'etest
3'etest
3'testÌeÿ
3'testÌe
3'testÌe
3'testÌ
3'test
3test~
3test
test
|
But what has that gotten you? At best, a contrived minimal slice of an infinite set. Where, for example, is "asdf;laksjfd;alksjaewraefsdtest134qwefalskdfjaeraf;" (another minimal slice of infinity)?
|
I didn't have time to wait for that one. :)
You are right, of course. However, sometimes a slice of an
infinite set is enough, say for testing or debugging.
Also, regexes with only small character classes and
finite quantifiers do produce useful and complete results.
Dominus' book gives you
an infinite stream of the matching strings,
ordered by length. Still quite a wait in your case.
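For the finite case, the complete set can be produced with a plain cross product. The sketch below hand-expands the pattern /\A[ab]c[12]?\z/ into one list of choices per position (an assumption made for illustration; it is not how Regexp::Genex or the book's stream code works) and lists the results shortest first:

```perl
use strict;
use warnings;

# One list of choices per position; '' means "the optional part is absent".
my @positions = ([qw(a b)], ['c'], ['', 1, 2]);

# Cross product of all positions.
my @strings = ('');
for my $choices (@positions) {
    @strings = map { my $prefix = $_; map { $prefix . $_ } @$choices } @strings;
}

# Shortest first, then alphabetical within the same length.
@strings = sort { length($a) <=> length($b) or $a cmp $b } @strings;
print "$_\n" for @strings;    # ac bc ac1 ac2 bc1 bc2, one per line
```

Every one of the six strings matches the pattern and nothing else does, which is the sense in which the output is complete for bounded patterns.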
Re: Reverse engineering regular expressions
by onegative (Scribe) on Aug 01, 2005 at 17:48 UTC
|
I believe many have touched on the fact that infinite possibilities exist for any given situation unless you already have some notion of what you expected in the first place. I use the following tool to help with tricky regular expressions, not because I can't do them myself, but rather to help speed up the process. http://weitz.de/regex-coach/
|
#!/usr/bin/perl -w
use strict;
use Parse::RandGen::Regexp;
my $regexp = "/^STOR\s^\n{100}/smi";
my $r = Parse::RandGen::Regexp->new($regexp);
my $string = $r->pick(match=>1, captures=>{});
print("\$string: $string\n");
This throws the following error.
Unrecognized escape \s passed through at ./regexp2.pl line 6.
%Error: Parse::RandGen::Regexp has an element that is not a Regexp reference (ref="")! at /usr/lib/perl5/site_perl/5.8.6/Parse/RandGen/Regexp.pm line 36
Parse::RandGen::Regexp::_newDerived('Parse::RandGen::Regexp=HASH(0x9dcdd88)', 'HASH(0x9e5f138)') called at /usr/lib/perl5/site_perl/5.8.6/Parse/RandGen/Condition.pm line 81
Parse::RandGen::Condition::new('Parse::RandGen::Regexp', '/^STORs^\x{a}{100}/smi') called at ./regexp2.pl line 7
Now I need the string in regexp format to pass to the function. I could just put the string in qr//s but in my real program I need to read the regexps from a list so they will come in scalar format.
i.e. How do I convert:
"/^STOR\s^\n{100}/smi";
to
qr/^STOR\s^\n{100}/smi
I'm not sure how to do this conversion.
Thanks,
P.
|
use strict;
my $match='test';
my $regex_match=qr/$match/i;
my $test_value='This is a Test';
print 'It matches' if $test_value=~m/$regex_match/;
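If the strings read from the list still carry their delimiters and flags, as in the "/.../smi" form from the question, one possible approach is to split them apart and fold the flags into an inline (?flags:...) group before compiling. The helper name string_to_qr is made up for this sketch, and it assumes "/" delimiters and only the i, m, s and x flags:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a "/pattern/flags" string into a compiled regex.
sub string_to_qr {
    my ($spec) = @_;
    my ($pattern, $flags) = $spec =~ m{\A/(.*)/([imsx]*)\z}s
        or die "Not in /pattern/flags form: $spec\n";
    # Fold the flags into an inline modifier group, e.g. (?smi:...).
    my $source = $flags eq '' ? $pattern : '(?' . $flags . ':' . $pattern . ')';
    return qr/$source/;
}

# Single quotes keep the backslashes; in double quotes \s would become "s".
my $re = string_to_qr('/^stor\s+\w+/i');
print "It matches\n" if 'STOR file.txt' =~ $re;
```

Patterns read from a file arrive with their backslashes intact, so the quoting issue only affects literals written in the source, which is where the "Unrecognized escape \s passed through" warning comes from.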
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law