PerlMonks  

Re: Look-Arounds in Regexes are Hard

by papidave (Monk)
on Jul 27, 2009 at 20:57 UTC (#783665)


in reply to Look-Arounds in Regexes are Hard

++moritz for caring about beginners.

I don't consider myself a newbie any more, but even so, look-wherever expressions make me a bit cross-eyed, and it got worse when I realized that the original question implied a fixed start and end to the pattern. When that happens, a line like

Functions <code>abc()</code> and <code>foo()</code>
should match the first code block, but not the second.

If we want to use lookahead to match <code> blocks that don't contain foo, we might use something like

m#<code>((?:(?!foo).)*)</code>#
but you'd be wise to insert a ton of comments to clarify things*. OTOH, I find it far more readable to use something like the following:
#!/usr/bin/perl -w
use strict;

LINE: while ( <DATA> ) {
    MATCH: while ( m#<code>(.*?)</code>#gi ) {
        print "$1\n" if ( $1 !~ /foo/ );
    }
}

__DATA__
This has no match at all, and is skipped.
This has foo, but is skipped with no code block.
<code>This is acceptable</code> so is <code></code> an embedded block, and the next.
<code></code>
<code>is foo rejected</code>
<code>foo</code>
we will skip <code> with no terminator.
and, we will also skip <code> with foo, but no terminator.
foo that precedes <code> just </code> simple text is accepted.
here is <code> another </code> example.
but <code> Perl </code> will accept foo outside the block.
The foo can be in advance, as every <code> hacker </code> knows.
Doubled matches like <code></code> foo <code></code> are evil.
This should <code> skip the second loop</code><code> foo </code>.
This <code> foo </code> should <code> skip the first loop </code>.

This gives us the desired multiple-block behavior while leaving the non-foo data (in $1) available for each instance, so we can do something more complicated with it if desired. In addition, the nested loop was, surprisingly, about 10% faster than the look-ahead code when I compared the two over 100,000 iterations using Perl v5.10.0.
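For anyone who wants to repeat the timing, here is a rough sketch using the core Benchmark module. It is not the original test harness, which was not posted: the sub names with_lookahead and with_nested_loop, the /g added to the look-ahead pattern so it can loop, and the choice of sample lines are all assumptions made for illustration. Numbers will vary with Perl version and input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample input: a few lines borrowed from the __DATA__ section above.
my @lines = (
    'This has foo, but is skipped with no code block.',
    'here is <code> another </code> example.',
    'This <code> foo </code> should <code> skip the first loop </code>.',
    'This should <code> skip the second loop</code><code> foo </code>.',
);

# Look-ahead version: one pattern does both the matching and the filtering.
# (/g is added here so the pattern can be applied repeatedly to one line.)
sub with_lookahead {
    my @found;
    for my $line (@_) {
        while ( $line =~ m#<code>((?:(?!foo).)*)</code>#g ) {
            push @found, $1;
        }
    }
    return @found;
}

# Nested-loop version: match every block, then reject those containing foo.
sub with_nested_loop {
    my @found;
    for my $line (@_) {
        while ( $line =~ m#<code>(.*?)</code>#gi ) {
            push @found, $1 if $1 !~ /foo/;
        }
    }
    return @found;
}

# 100,000 iterations, as in the post; prints a rate table for both versions.
cmpthese( 100_000, {
    lookahead   => sub { my @r = with_lookahead(@lines) },
    nested_loop => sub { my @r = with_nested_loop(@lines) },
} );
```

The sample lines were picked so that both subs return the same three blocks, which keeps the comparison like-for-like.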

*Unless you're in the UK; in that case, use a tonne of comments instead. :)
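Here is roughly what that ton (or tonne) of comments might look like, using the /x modifier to spread the look-ahead pattern over several lines. The pattern is the one from the post; the names $no_foo_block and foo_free_blocks are made up for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Commented (/x) form of the look-ahead pattern from the post.
my $no_foo_block = qr{
    <code>          # opening tag
    (               # capture the block body
        (?:         #   group one step without capturing it
            (?!foo) #   the text at this point must not start with foo ...
            .       #   ... and only then do we consume one character
        )*          #   repeat the step as many times as possible (greedy)
    )
    </code>         # closing tag
}x;

# Return the bodies of all foo-free blocks found on one line.
sub foo_free_blocks {
    my ($line) = @_;
    my @bodies;
    while ( $line =~ /$no_foo_block/g ) {
        push @bodies, $1;
    }
    return @bodies;
}

print "$_\n" for foo_free_blocks('Functions <code>abc()</code> and <code>foo()</code>');
```

One thing the comments make easier to spot: the star is greedy, so on a line with two foo-free blocks, such as <code>a</code> and <code>b</code>, the capture runs through to the last closing tag. Changing )* to )*? should make it stop at the first closing tag, as the nested loop does.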

Note: lost a slash in code cut/paste business - repaired.

