PerlMonks  

Re^3: Tokenizing and qr// <=> /g interplay

by MarkusLaker (Beadle)
on Apr 23, 2005 at 16:34 UTC ( [id://450732] )


in reply to Re^2: Tokenizing and qr// <=> /g interplay
in thread Tokenizing and qr// <=> /g interplay

Another use for qr// is to break up unmanageably complex regular expressions into simpler, named, self-contained pieces. (There's a direct parallel here with subs, which do the same for 'ordinary' Perl code. In fact, you can consider a named regex to be just a function written with a funny-looking syntax: its input is a string and its output is either a Boolean value or one or more strings, depending on whether it captures anything.)
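To make the parallel with subs concrete, here is a minimal sketch that is separate from the module quoted further down. Every name in it ($rx_octet, $rx_ipv4, $rx_port, $rx_host_port, parse_host_port) is invented for illustration; the point is only that each qr// piece can be read and tested on its own before being combined.

```perl
use strict;
use warnings;

# Hypothetical building blocks, named for this sketch only.
my $rx_octet = qr/ (?: 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]?\d ) /x;
my $rx_ipv4  = qr/ $rx_octet (?: \. $rx_octet ){3} /x;
my $rx_port  = qr/ \d{1,5} /x;

# The composite pattern reads almost like a grammar rule.
my $rx_host_port = qr/ \A ( $rx_ipv4 ) : ( $rx_port ) \z /x;

# Returns (host, port) on success and an empty list on failure.
sub parse_host_port {
    my ($text) = @_;
    return $text =~ $rx_host_port ? ($1, $2) : ();
}

my @parts = parse_host_port('192.168.0.1:8080');
print "host=$parts[0] port=$parts[1]\n" if @parts;
```

Each interpolated qr// object keeps its own flags, so the pieces behave the same whether they are used alone or inside a larger pattern.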

Here's an example from a code-filtering assertions module (yes, another one) that's not yet tested thoroughly enough to submit to CPAN:

    # A set of regexen to match balanced text in round, square or
    # curly brackets:
    sub makerx();
    my $rxround    = qr/ \( (?: (?> [^()] + )    | (??{ makerx }) ) * \) /ox;
    my $rxsquare   = qr/ \[ (?: (?> [^\[\]] + )  | (??{ makerx }) ) * \] /ox;
    my $rxcurly    = qr/ \{ (?: (?> [^{}] + )    | (??{ makerx }) ) * \} /ox;
    my $rxbalanced = qr/ $rxround | $rxsquare | $rxcurly /ox;
    sub makerx() { $rxbalanced; }

    # A regex to match a term in an 'assert' statement:
    # balanced text in some kind of bracket, or any text other than a comma or semicolon:
    my $rxterm = qr/
        (?:
            $rxbalanced
        |
            (?> [^,;\(\{\x5B] +?    # \x5B is a synonym for '[', which confuses Kate's syntax-colouring :-(
            )
        |
            0                       # Special case for 0 -- why is this needed?
        ) +?
    /ox;

    # A regex to match one of the tokens that mark the end of an 'assert' statement:
    my $rxend = qr/ ; | } | \b (?: if | unless | while | until | for ) \b /x;

    # A regex to match an entire 'assert' statement and its arguments
    # and to collect the arguments at the same time.
    # Unfortunately, constructs like /($foo)+/ match all instances of $foo but only
    # capture the last one, and so we have to do devious things with embedded Perl
    # in order to both match and capture all arguments to the assertion in a single
    # regex.
    my ($group, @args, $end);
    my $rxassert = qr/
        (?{ $group = '', @args = () })  # Wipe our state so that, if the regex gives up
                                        # half-way through, the next attempt doesn't
                                        # inherit a lot of spurious tosh.
        (?>
            \b assert \b \s*            # Match the 'assert' keyword.
        )
        (?:
            (?>
                : \s* (\w+) \b \s*      # Look for ':SOMEGROUP'
                (?{ $group = $^N })     # and save it if found.
            )
        ) ?
        (?:
            ( $rxterm )                 # Look for an argument to the assertion,
            (?= \s* , )                 # ensure that it's followed by a comma before we save it,
            (?{ push @args, $^N })      # now save it,
            \s* , \s*                   # and then skip the comma that we already know to be there.
        ) *                             # There can be zero or more terms that are followed by commas.

        ( $rxterm )                     # Look for the final argument,
        (?= \s* $rxend )                # ensure that it's followed by a terminator before we save it,
        (?{ push @args, $^N })          # and save it.
        \s*
        ( $rxend )                      # Finally, save the terminator.
        (?{ $end = $^N })
    /sox;
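The capture trick is easier to see in isolation. What follows is a cut-down sketch, not part of the module: $rx_item, $rx_list, split_list and @items are hypothetical names, and the list syntax (word items separated by commas) is an assumption chosen to keep the example short. It uses a package variable rather than a lexical on the assumption that the code might run on a perl older than 5.18, where lexicals inside regex code blocks were unreliable.

```perl
use strict;
use warnings;

# The problem: a repeated capture group keeps only its last match.
if ('a, b, c' =~ / \A (?: (\w+) \s* ,? \s* )+ \z /x) {
    print "naive capture: $1\n";    # only the last repetition is kept
}

# Package variable (an assumption for portability; see the note above).
our @items;

# Hypothetical item pattern. The atomic group stops the engine from
# retrying shorter prefixes of a word after a failed lookahead.
my $rx_item = qr/ (?> \w+ ) /x;

my $rx_list = qr/
    \A
    (?{ @items = () })              # Reset state at the start of the attempt.
    (?:
        ( $rx_item )                # Capture one item,
        (?= \s* , )                 # check for the comma BEFORE recording it,
        (?{ push @items, $^N })     # record it ($^N is the most recently closed group),
        \s* , \s*                   # then consume the comma.
    ) *
    ( $rx_item )                    # The final item
    (?= \s* \z )                    # must be followed only by optional whitespace.
    (?{ push @items, $^N })
    \s* \z
/x;

# Returns every item on success and an empty list on failure.
sub split_list {
    my ($text) = @_;
    return $text =~ $rx_list ? @items : ();
}

print join('|', split_list('a, b, c')), "\n";
```

The lookahead before each push matters: a code block's side effects are not undone when the engine backtracks, so recording an item before confirming what follows it could leave the same item in the array twice.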
