pileofrogs has asked for the wisdom of the Perl Monks concerning the following question:

This is sort of an opinion poll.

What's the best way to debug a complex regex that's not working right?

Thanks!
-Pileofrogs

Replies are listed 'Best First'.
Re: Regex Debugger?
by davido (Cardinal) on Oct 12, 2006 at 21:14 UTC

    Don't forget about such tools as YAPE::Regex::Explain, where the synopsis in the POD provides this simple example:

    use YAPE::Regex::Explain;
    my $exp = YAPE::Regex::Explain->new($REx)->explain;

    In the above example, let $REx be your regular expression as a single-quoted string or a qr// object.

    If you print $exp, you'll get a nice table explaining each subexpression within the regular expression.

    That's probably one of the more powerful tools for me.

    Here is an example output from YAPE::Regex::Explain, when invoked as follows:

    perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new('\btest(?: more)\b$')->explain();"

    And the output...

    The regular expression:

    (?-imsx:\btest(?: more)\b$)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
    ----------------------------------------------------------------------
      test                     'test'
    ----------------------------------------------------------------------
      (?:                      group, but do not capture:
    ----------------------------------------------------------------------
         more                    ' more'
    ----------------------------------------------------------------------
      )                        end of grouping
    ----------------------------------------------------------------------
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    Perl also provides the use re qw/debug/; pragma, which can be used like this:

    #!/usr/bin/perl
    use re 'debug';
    m/ \b test regexp \b /x;

    When you execute that, you'll get a dump of the compiled regular expression, which is sometimes helpful in seeing what's going on (or going wrong). This is discussed in perldebug and perldebguts.


    Dave

Re: Regex Debugger?
by planetscape (Chancellor) on Oct 12, 2006 at 21:19 UTC
      With The Regex Coach (http://weitz.de/regex-coach/) you can step through your regex. The site also has versions for Linux, as well as an older version for Windows in case the .exe does not work.
Re: Regex Debugger?
by Hue-Bond (Priest) on Oct 12, 2006 at 20:53 UTC

    I usually resort to /x and/or commenting it and starting from scratch, beginning with something simple and adding more things until it breaks. And, of course, feed it all sorts of valid and invalid input to see how it behaves.

    Other options that come to mind are (?{ }) and use re 'debug' (if you understand its output).
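
    As a rough illustration of the (?{ }) idea, the sketch below drops a code block after each piece of a pattern and records how far the engine got. The pattern, the @trace variable and the trace_match helper are all made up for this example; only (?{ }) and pos() are Perl's own.

```perl
use strict;
use warnings;

# Package variable, so the embedded code blocks can reach it on older perls too.
our @trace;

# Each (?{ ... }) runs when the engine reaches that point in the pattern.
# Inside the block, pos() reports the current position in the target string.
my $re = qr/
    \b te     (?{ push @trace, "matched 'te', pos=" . pos() })
    st        (?{ push @trace, "matched 'st', pos=" . pos() })
    \s+ more  (?{ push @trace, "matched ' more', pos=" . pos() })
/x;

# Hypothetical helper: run the match and return (success flag, trace lines).
sub trace_match {
    my ($text) = @_;
    @trace = ();
    my $ok = $text =~ $re ? 1 : 0;
    return ($ok, @trace);
}

my ($ok, @steps) = trace_match("a test more");
print "matched: $ok\n";
print "  $_\n" for @steps;
```

    On input that does not match, the last trace line shows roughly where things went wrong. Be aware that the optimizer may reject a string before any block runs, and that backtracking can run a block more than once, so read the trace as a hint rather than an exact log.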

    --
    David Serrano

Re: Regex Debugger?
by Khen1950fx (Canon) on Oct 12, 2006 at 21:12 UTC
Re: Regex Debugger?
by Melly (Hermit) on Oct 12, 2006 at 21:54 UTC

    Break the regex down into several regexes, each working on the previous regex's results.

    As I learnt today, it may well be easier to use several regexes rather than trying to be clever with one...
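
    A minimal sketch of that staged approach, assuming a made-up input format of "name=value" pairs separated by semicolons (the format and the parse_pairs name are hypothetical): the first regex only splits, the second only looks at one chunk at a time.

```perl
use strict;
use warnings;

# Stage 1 splits the line into chunks; stage 2 parses a single chunk.
# Each regex stays small enough to check by eye.
sub parse_pairs {
    my ($line) = @_;
    my %pairs;
    for my $chunk (split /\s*;\s*/, $line) {
        next unless length $chunk;
        if ($chunk =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/) {
            $pairs{$1} = $2;
        }
        else {
            # A failing stage names the exact chunk it could not handle.
            warn "unparsable chunk: '$chunk'\n";
        }
    }
    return \%pairs;
}

my $pairs = parse_pairs("a=1; b = two ;c=3");
print "$_ => $pairs->{$_}\n" for sort keys %$pairs;
```

    When something breaks, the warning points at one chunk and one small regex instead of one large pattern.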

    Tom Melly, tom@tomandlu.co.uk
Re: Regex Debugger?
by Anonymous Monk on Oct 12, 2006 at 20:56 UTC
    My first reaction would be to use the /x flag and break the regex into multiple lines based on semantic units. Test each smaller unit separately (maybe saving them in a $scalar), and put them all back together again.

    That, or poke around Regexp::Common and see if someone's done it for you already. Unless you're debugging Regexp::Common, that is :)
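
    The /x-and-scalars approach described above might look something like this. The date format, the variable names and the unit_matches helper are assumptions chosen for illustration, not anything from the original question.

```perl
use strict;
use warnings;

# Hypothetical target: ISO-style dates such as 2006-10-12.
# Each semantic unit lives in its own scalar so it can be tested alone.
my $year  = qr/ \d{4} /x;
my $month = qr/ 0[1-9] | 1[0-2] /x;
my $day   = qr/ 0[1-9] | [12]\d | 3[01] /x;

# qr objects interpolate as self-contained groups, so the alternations
# inside $month and $day cannot leak into the surrounding pattern.
my $date = qr/
    \A
    ($year)  -     # four-digit year
    ($month) -     # 01 through 12
    ($day)         # 01 through 31
    \z
/x;

# Test one unit in isolation, anchored so partial matches do not count.
sub unit_matches {
    my ($unit, $text) = @_;
    return $text =~ /\A(?:$unit)\z/ ? 1 : 0;
}

print "month 12 accepted\n" if  unit_matches($month, '12');
print "month 13 rejected\n" if !unit_matches($month, '13');

if ('2006-10-12' =~ $date) {
    print "year=$1 month=$2 day=$3\n";
}
```

    Once every unit passes its own checks, a failure in the combined pattern can only come from the glue between the units.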

Re: Regex Debugger?
by Fletch (Chancellor) on Oct 13, 2006 at 13:51 UTC

    I like to use perl -de 0 to do regex development and debugging. Start it up and put your test data in $_, then you can just play with successive iterations of x /bl(ah|oop)|zorch?/ until you get something that works. For debugging, put the troublesome data in $_ and then try successively longer portions of your regex until you find the part that's not matching.
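
    The "successively longer portions" step can also be scripted. The sketch below is one possible way to do it, with a hypothetical first_failing_piece helper; it assumes you have already cut the regex into pieces that still compile when joined left to right (no group opened in one piece and closed in a later one).

```perl
use strict;
use warnings;

# Grow the pattern one piece at a time and report the index of the first
# piece whose addition makes the match fail. Returns -1 if the whole
# pattern matches.
sub first_failing_piece {
    my ($text, @pieces) = @_;
    my $so_far = '';
    for my $i (0 .. $#pieces) {
        $so_far .= $pieces[$i];
        return $i unless $text =~ /$so_far/;
    }
    return -1;
}

# Example data is made up: the third piece expects letters, the text has digits.
my @pieces = ('^\w+', '=', '[a-z]+', ';');
my $bad    = first_failing_piece('foo=42;', @pieces);
print $bad < 0 ? "whole pattern matches\n"
               : "match breaks at piece $bad: $pieces[$bad]\n";
```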

      As do I. I use perl -de 0 to test just about everything: I'll load modules, test them, test data structures, test certain lines of code, etc.
Re: Regex Debugger?
by maspalio (Scribe) on Oct 13, 2006 at 07:46 UTC
    Hi,

    I usually use /x to document any complex (or yet to be complexified) RE and use Regexp::Common for common base patterns in a bottom-up building scheme.

    Agreed, this is not debugging per se, but this kind of preemptive work actually saves a lot of subsequent (and sometimes painful) debugging.

    Cheers,

    Xavier
Re: Regex Debugger?
by trwww (Priest) on Oct 14, 2006 at 19:03 UTC

    Accomplishing the task is impossible without the tools others have mentioned, but the first tool I turn to is Komodo's Rx Toolkit. It is a GUI-based regex sandbox.

    You need a sample data set to paste in the input text area, but once you have that Komodo provides a graphical and organizational interpretation of your regular expression as you type it.

    It is not magic (I've often got it stuck in infinite loops, but even then it's better to see it there than in your actual program), but it does have some magical features to it.

    With it I've built regexes that a) I never would have been able to implement alone, or at least b) came out more robust and less buggy, in a fraction of the time.

    It can make an expert regex programmer exponentially more efficient, and changes the learning curve for regex beginners from a mountain hike to a boardwalk stroll.

    After rereading this it sounds like a commercial, but it really is that good.

    Way to go ActiveState.