PerlMonks  

(Ovid) Re: Regular Expression matching question

by Ovid (Cardinal)
on Oct 24, 2000 at 22:46 UTC ( [id://38170]=note )


in reply to Regular Expression matching question

You've found an issue that bites many programmers (including me). At first glance, you'd expect the star, being greedy, to slurp up everything. However, when you have an alternation, the regex takes the first alternative that lets the match succeed. Once the a* has consumed the "a", the (ab)* can successfully match nothing, so the regex is satisfied and the (b)* is never tried. If you reverse the (ab)* and (b)*, the star, being greedy, will match the "b". Try the following code:
$text = 'ab';
if ($text =~ /(a*)((?:ab)*|b*)/) {
    print "'$1', '$2' \n";
}
if ($text =~ /(a*)(b*|(?:ab)*)/) {
    print "'$1', '$2' \n";
}
The output is as follows:
'a', ''
'a', 'b'
Incidentally, Perl uses a traditional NFA engine for regex matching. If it used the POSIX-NFA engine or a DFA engine, your regex would work as you expect because those engines try to find the longest match that satisfies the regex. If you have experience with those engines, Perl may cause you some confusion.
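If you want something closer to longest-match behavior from Perl, here is a minimal sketch. It assumes you can list the alternatives separately; longest_alternative is a hypothetical helper I made up for illustration, not anything built into Perl, and it only emulates the "pick the longest alternative" part, not a full POSIX engine.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Perl: anchor each alternative at the
# start of $text and return the longest thing any of them matched.
sub longest_alternative {
    my ($text, @alternatives) = @_;
    my $best = '';
    for my $alt (@alternatives) {
        if ($text =~ /\A($alt)/ && length($1) > length($best)) {
            $best = $1;
        }
    }
    return $best;
}

my $rest = 'b';    # what is left of 'ab' after a* has taken the 'a'

# Perl's own alternation: the first alternative that succeeds wins,
# and (?:ab)* succeeds by matching nothing.
my ($first) = $rest =~ /\A((?:ab)*|b*)/;

# Emulated longest-alternative choice: b* wins because it matches more.
my $longest = longest_alternative($rest, '(?:ab)*', 'b*');

print "first: '$first', longest: '$longest'\n";
```

In practice it is usually simpler to order the alternatives yourself so the one you want tried first comes first.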

Cheers,
Ovid

P.S. I'm glad to see you have a sense of humor about the flak you took :)

Update: Oops. dchetlin is right. I was typing too fast and didn't consider the DFA issue.

Join the Perlmonks Setiathome Group or just go to the link and check out our stats.

Replies are listed 'Best First'.
(dchetlin: DFA) RE: (Ovid) Re: Regular Expression matching question
by dchetlin (Friar) on Oct 25, 2000 at 01:20 UTC
    • Incidentally, Perl uses a traditional NFA engine for regex matching. If it used the POSIX-NFA engine or a DFA engine, your regex would work as you expect because those engines try to find the longest match that satisfies the regex.

    Weeeeell, kinda. A DFA engine in general can't do backreferences, so he wouldn't have been able to test the $1 thing to begin with...

    You're right about the POSIX NFA, though. Of course, no one would use Perl RExen if they were POSIX NFAs, as they'd be too slow. But you knew that :-).

    -dlc
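To illustrate the backreference point in the reply above, here is a small sketch. The sub name is_doubled is made up for the example. It checks whether a string is some word repeated twice, which needs \1 to re-match whatever the first group captured; that "remember what I matched" step is the kind of thing a pure DFA generally has no way to do.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $s is some non-empty word written twice.
# (\w+) captures a word; \1 then requires exactly that same text again.
sub is_doubled {
    my ($s) = @_;
    return $s =~ /\A(\w+)\1\z/ ? 1 : 0;
}

for my $s ('abab', 'abba') {
    printf "%s: %s\n", $s, is_doubled($s) ? 'doubled' : 'not doubled';
}
```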
