Regex Testing

by Ovid (Cardinal)
on Sep 27, 2000

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

Currently, I build regexes in small chunks, adding functionality until they work. While this works fine for general programming, it's less efficient for regexes, because I can't easily step through them when they break. Sometimes I write some pretty hairy regexes. Two of the regexes I've recently had to create are below. Unfortunately, I can't just glance at them and understand what they do, despite the fact that I wrote them.
die "Bad number!\n" if $number !~ /(?:[\d]{1,6}\.[\d]{0,5})|(?:[\d]{0,5}\.[\d]{1,6})|(?:[\d]{1,7})/;
$count ++ if $stuff =~ /(?i:p)rods(?:\\s+(?:[^&]|&(?!gt;))+)?/;
Is there some way to watch the regex engine as it goes along? I'm thinking of something similar to Perl's debugger.

I actually use the /x modifier and comment them heavily, which helps. I've reduced them to one line above for effect.
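For readers who have not seen the /x style, here is a rough sketch of how the first regex might be laid out with it. It reads the third alternative as a plain non-capturing group of 1 to 7 digits, and the name $number_re and the sample inputs are made up for illustration. Like the one-liner, it is not anchored, so it only checks that a number-like run appears somewhere in the string.

```perl
use strict;
use warnings;

# The first regex, spread out with /x. Whitespace and comments inside the
# pattern are ignored; behaviour is intended to match the one-line form.
my $number_re = qr/
      (?: \d{1,6} \. \d{0,5} )   # 1 to 6 digits, a dot, up to 5 decimals
    | (?: \d{0,5} \. \d{1,6} )   # up to 5 digits, a dot, 1 to 6 decimals
    | (?: \d{1,7} )              # a bare run of 1 to 7 digits
/x;

# Hypothetical sample inputs, not from the original post.
for my $number ('123.45', '.5', '1234567', 'abc') {
    printf "%-8s %s\n", $number, $number =~ $number_re ? 'ok' : 'Bad number!';
}
```

Storing the pattern with qr// also makes it possible to test it on its own before dropping it into the die statement.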


Re: Regex Testing
by chromatic (Archbishop) on Sep 27, 2000
    You could use the re pragma:

    use re 'debug';

    It'll give you more information than you ever wanted, AND it's built-in. :)
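A minimal sketch of how that might look in practice. The sample string and the trimmed-down pattern are assumptions for illustration. The pragma is lexically scoped, so wrapping it in a block limits the noise to the one regex under suspicion; the trace goes to STDERR.

```perl
use strict;
use warnings;

my $stuff   = 'Prods foo';   # hypothetical test input
my $matched = 0;

{
    # Only regexes compiled inside this block are traced.
    use re 'debug';
    $matched = 1 if $stuff =~ /(?i:p)rods/;
}

print $matched ? "matched\n" : "no match\n";
```

Running it with `2> trace.txt` appended to the command line keeps the compile and match trace in a file that can be read at leisure.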

      For what it's worth, the perldebguts manpage includes the details on the debugger's compile- and run-time outputs. You'll want to read it.

      Cool! I have always been stumped by RE's that didn't work because of some stupid tyop. This pragma helped me find the problem in seconds!!! There is another flavor:

      use re 'debugcolor';

      It throws in some bolding and reverse video to enhance the display.

RE: Regex Testing
by Adam (Vicar) on Sep 27, 2000
RE: Regex Testing
by meonkeys (Chaplain) on Sep 27, 2000
    If you're on Linux, you can try RegExplorer: a realtime visual regex environment.
Re: Regex Testing
by princepawn (Parson) on Sep 27, 2000
    Well, for one thing, Damian Conway has written Text::Balanced to facilitate the matching of certain oft-matched patterns, i.e., bracketed/parenthesized text, quoted text, tagged (XML/HTML) text, delimiter-separated text, etc.
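As a small illustration of the bracketed-text case, here is a sketch using extract_bracketed from Text::Balanced. The sample string is invented; in list context the function returns the extracted substring first and the remainder of the input second.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical input: the text must begin (after optional whitespace)
# with the opening bracket to be extracted.
my $text = '(a (nested) list) and the rest';

# The second argument names the bracket types to balance.
my ($extracted, $remainder) = extract_bracketed($text, '()');

print "extracted: $extracted\n";
print "remainder: $remainder\n";
```

Handling the nesting this way avoids writing a hand-rolled regex for balanced parentheses, which is exactly the kind of pattern that gets hairy fast.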

    In addition, if you want to label parts of your regexes and then piece them together, you can use Damian's Parse::RecDescent. I like it very much for this purpose. Be sure to read up on the skip directive to understand the default, and how to configure, the way individual regexes are pieced together.

