PerlMonks  

Re^4: Regexes: finding ALL matches (including overlap)

by tlm (Prior)
on Jun 04, 2005 at 13:45 UTC


in reply to Re^3: Regexes: finding ALL matches (including overlap)
in thread Regexes: finding ALL matches (including overlap)

    sub match_all_ways {
        my ($string, $regex) = @_;
        my $count;
        my $incr = qr/(?{$count++})/;
        $string =~ /(?:$regex)$incr(?!)/;
        return $count;
    }
    print match_all_ways("abcdef", qr/..*..*./); # 20
    print match_all_ways("abcdef", qr/..*..*./); # undef

It's because the qr// object is compiled just once and always refers to the first instance of $count. If you call this sub more than once, every call after the first returns undef.

I see what you mean about lexicals captured by code blocks in regexes not behaving as one would expect. I would have expected the second print to produce 40 instead of undef (i.e., I would have expected $count to behave like a C static variable, as is the case for "regular" closures). Is there any way to rationalize the actual behavior without diving too deeply into the Perl internals? (I ask because without some rationalization for such an odd behavior, there is little chance I will remember it.)
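
One reading of the "regular" closure comparison, offered as an illustration rather than as code from this thread: a named sub that closes over a lexical declared in an enclosing bare block keeps that variable alive between calls, much like a C static (this is the "persistent variables with closures" idiom from perlsub). The names count_calls and $calls are hypothetical.

```perl
use strict;
use warnings;

# The bare block runs once; count_calls captures that one $calls,
# so the counter persists across calls like a C static variable.
{
    my $calls = 0;
    sub count_calls { return ++$calls }
}

print count_calls(), "\n";   # 1
print count_calls(), "\n";   # 2
```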

the lowliest monk

Re^5: Regexes: finding ALL matches (including overlap) (its a bug)
by demerphq (Chancellor) on Jun 04, 2005 at 16:47 UTC

    Is there any way to rationalize the actual behavior without diving too deeply into the Perl internals?

    It's very simple: it's a bug (in an experimental feature). Basically, the way Perl's regex engine handles embedded code is subtly wrong in a number of ways. One consequence is that you should never use lexicals inside regexes that sit in a repeatable scope (such as the body of a loop or a subroutine). In a one-off it will probably work as you expect, but as soon as you put the code in a subroutine or something similar and call it twice, things don't work out properly. The simple workaround, as blokhead explained, is to use package-level variables and local.
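
A minimal sketch of that workaround applied to the match_all_ways example above (blokhead's exact code is not quoted here, so this is an assumption about its shape): $count becomes a package variable declared with our, and local gives each call a fresh value that is restored on return. Because the code block looks up the package variable by name instead of capturing one particular lexical, it no longer matters that $incr is compiled only once. As in the original, the always-failing (?!) forces the engine to backtrack through every way the pattern can match, and the code block counts each one.

```perl
use strict;
use warnings;

# Package variable, not a lexical: the code block below refers to
# it by name, so there is no stale captured variable.
our $count;

# Compiled once at file scope; harmless because $count is a package var.
my $incr = qr/(?{ $count++ })/;

sub match_all_ways {
    my ($string, $regex) = @_;
    local $count = 0;    # fresh counter for this call, restored on exit
    # (?!) always fails, so the engine backtracks through every way
    # $regex can match; $incr bumps the counter for each one.
    $string =~ /(?:$regex)$incr(?!)/;
    return $count;
}

print match_all_ways("abcdef", qr/..*..*./), "\n";    # 20
print match_all_ways("abcdef", qr/..*..*./), "\n";    # 20
```

The count is 20 because each match corresponds to choosing the three positions matched by the three `.` atoms, and there are C(6,3) = 20 such choices in "abcdef".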

    I believe dave_the_m intends to fix this one day. Until then, pay careful attention to the fact that embedded code is documented as provisional and experimental, which means you can't really cry too much when it breaks.

    ---
    $world=~s/war/peace/g
