Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Generating regex strings with a regex

by bsb (Priest)
on Aug 27, 2003 at 08:47 UTC ( #286981=snippet: print w/ replies, xml ) Need Help??

Description: You want the list of strings which may match a regex.
/^ab?(c|d)$/ => qw( abc abd ac ad )
This isn't reusable code, just a trick I found interesting and worth exploring. See the discussion (and code!) at Regexp generating strings? for some alternatives.

The idea is to (manually) replace the atoms with code assertions, building up a hyotheical match, printing it and then failing in order to backtrack and try again.

Brad

Update: Regexp::Genex

The code for /^ab?(c|d)$/ is
perl -e '""=~/(?{a})(?{$^R.b})?((?{ $^R.c})|(?{ $^R.d}))(?{print "$^R\
+n"})(?=a)(?!a)/'
# output:
abc
abd
ac
ad
A more complex example follows, first some notes.

The above will produce a warning "Quantifier unexpected on zero-length expression..." under -w because of the (?{$^R.b})?. Suppress it with "no warnings 'regexp';"

The $^R variable contains the result of the last (?{}) assert and is localized for backtracking. (My first version had (?{ local $_ = $_ . "b" }) instead)

I tried the more complex /reg(ular\s+)?exp?(ression)?/ but ran into a few interesting, but so far surmountable problems.

Firstly, I haven't tried to tackle classes such as \s yet. I changed it to '_'. (YAPE::Regex has expanded classes)

I couldn't convince perl to repeatedly run zero-width assertions at the same position in the string. To see why search for "infinite" in perlre :). (If anyone knows how, let me know). To work around I made a string of 'a's and tacked an 'a' onto each quantified assertion. A happy side-effect is that the length of the input string bounds the quantifiers (somehow, but I ain't doing the math).

Also, the '*' quantifier is apparently optimized to not try to repeatedly match in the same position. I use {0,3} instead, as it isn't optimized. perlre again.

Greediness controls the search/display order.

I'm looking for a good impossible assertion to put at the end. Currently, (?!a)(?=a) since the obvious 'x' is also optimized. perl -Mre=debug tells why:
floating `x' at 0..2147483647 (checking floating)

I might be able to avoid the 'a's with m/.../g and resetting pos. I haven't investigated, but I doubt it.

#!/usr/bin/perl -w
#orig:  reg(ular\s+)?exp?(ression)?!*?
#mod:   ^reg(ular_+)?exp?(ression)?!{0,3}?
no warnings 'regexp'; # zero-length + quantifier

'aa' =~ /^
(?{'reg'})
(
    (?{ $^R.'ular'})
    (?: (?{ $^R."_" }) a )+
)?
(?{ $^R.'ex'})
(?{ $^R.'p'})?
((?{ $^R.'ression'}))?
(?: (?{ $^R."!" }) a ){0,3}?

(?{print "$^R\n"; })
(?=a)(?!a)  # fail
/x;
__END__
Output
$ ./genex.pl 
regular__expression
regular__exp
regular__exression
regular__ex
regular_expression
regular_expression!
regular_exp
regular_exp!
regular_exression
regular_exression!
regular_ex
regular_ex!
regexpression
regexpression!
regexpression!!
regexp
regexp!
regexp!!
regexression
regexression!
regexression!!
regex
regex!
regex!!
Comment on Generating regex strings with a regex
Select or Download Code
Replies are listed 'Best First'.
Re: Generating regex strings with a regex
by Aristotle (Chancellor) on Aug 28, 2003 at 10:13 UTC
    You can force a failure using (?!) - a lookahead assertion that the empty string doesn't match here, which can never occur.

    Makeshifts last the longest.

Back to Snippets Section

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: snippet [id://286981]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others surveying the Monastery: (8)
As of 2015-07-28 04:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (252 votes), past polls