"While" effect on regex?

by lightoverhead (Pilgrim)
on Feb 16, 2013 at 23:52 UTC ( #1019084=perlquestion )
lightoverhead has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have questions about how "while" affects the regular expression matching process. Below is the code:

use v5.14;
while(<>) {
    while( m/ \b (\w\S+) ( \s+ \1 ) + \b /xi    # if add modifier "g", it works fine.
         ) {
        say "dup word '$1' at paragraph $.";
    }
}

My questions are:

1. This code prints the first matched duplicate word endlessly. It seems that the second "while" evaluates the regex condition an unlimited number of times. Why does this happen? Does the matching only happen once?

2. But if I add the "g" modifier to the regex match, it reports each matched string once and then stops. Why does this happen?

3. What is the effect of the second "while"? When "while" is used with a regex pattern match as its condition, how does it behave? How many times does it evaluate the regex condition, with and without the "g" modifier?

Thank you.

Replies are listed 'Best First'.
Re: "While" effect on regex?
by roboticus (Chancellor) on Feb 17, 2013 at 00:01 UTC


    Without the 'g' modifier, the m/.../ operator simply reports whether the string matches or not. Unless you change the string, it will *always* match or *always* fail. If you read perlre, it'll refer you to perlretut. In there it says:

    Global matching

    The final two modifiers "//g" and "//c" concern multiple matches. The modifier "//g" stands for global matching and allows the matching operator to match within a string as many times as possible. In scalar context, successive invocations against a string will have "//g" jump from match to match, keeping track of position in the string as it goes along. You can get or set the position with the "pos()" function.

    The relevant bit is the sentence about scalar context. It's telling you that with the g modifier, the match remembers where it left off, and the next time you match against the same string with the g modifier, it will proceed from that point.

    Long story short: that's what the 'g' modifier does!
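
To make the position tracking concrete, here is a minimal sketch. The sample string and the `@seen` array are illustrative assumptions, not part of the original question. It records `pos()` after each successful `//g` match in scalar context.

```perl
use strict;
use warnings;
use 5.012;

my $text = 'aXaYaZ';   # hypothetical sample string
my @seen;

# In scalar context, each //g match resumes where the previous one ended.
while ($text =~ /(a[XYZ])/g) {
    push @seen, "$1 ends at " . pos($text);
}
say for @seen;

# Once a //g match fails, pos() is reset to undef, so a later
# match against the same string starts from the beginning again.
say defined pos($text) ? 'pos still set' : 'pos reset';
```

The loop runs three times and then stops, because the fourth attempt starts at the end of the string and fails.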


    When your only tool is a hammer, all problems look like your thumb.


      Your answer is very clear! Thanks a lot!

Re: "While" effect on regex?
by 7stud (Deacon) on Feb 17, 2013 at 01:51 UTC

    1. This code prints the first matched duplicate word endlessly. It seems that the second "while" evaluates the regex condition an unlimited number of times. Why does this happen? Does the matching only happen once?

    In scalar context, the match operator returns 1 on a match and a false value (the empty string) on no match. The conditional for a while loop is a boolean context, and a boolean context is considered a scalar context because a boolean context demands either a number or a string, which are both scalars. Anything else is converted to a number or string, and then the number or string is evaluated as being true or false.

    Any function (or operator) in your code is replaced by the function's return value. Without the g modifier, the match starts from the beginning of the string every time the conditional is evaluated, so if it finds a match once it finds the same match every time and returns 1. Your while loop therefore becomes:

    while (1) { ... }

    Also, because you have a capture group in your regex, the match operator sets $1. The end result is this loop:

    while (1) { say $1; }
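
The endless loop can be observed safely with a cap on the number of iterations. This is only a sketch: the input line, the `$limit` cap, and the `@reports` array are assumptions added for the demonstration. The pattern is the one from the question, without the g modifier.

```perl
use strict;
use warnings;
use 5.012;

my $line  = 'this this is is a test';   # hypothetical input line
my $limit = 3;                           # safety cap so the demo terminates
my @reports;

# No /g: every evaluation of the condition starts again at the
# beginning of $line, so the same first match is found each time.
while ($line =~ m/ \b (\w\S+) ( \s+ \1 ) + \b /xi) {
    push @reports, $1;
    last if @reports >= $limit;
}
say "@reports";   # prints "this this this"
```

Without the `last`, the loop would never reach the duplicated "is", because the match never moves past the first "this this".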

    The m// operator only looks for the first match and then quits:

    use strict;
    use warnings;
    use 5.012;

    if ('aXaYaZ' =~ /(a[XYZ])/ ) {
        say $1;
    }

    --output:--
    aX

    # List context, still without /g: only the first match's capture is returned.
    my @matches = 'aXaYaZ' =~ /(a[XYZ])/;
    say "@matches";

    --output:--
    aX

    The /g flag, which stands for 'global matching', changes that behavior.
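
As a small sketch of that change in list context (the variable names `@first` and `@all` are illustrative), the same sample string can be matched with and without the flag:

```perl
use strict;
use warnings;
use 5.012;

my $str = 'aXaYaZ';

# List context without /g: the captures of the first match only.
my @first = $str =~ /(a[XYZ])/;

# List context with /g: the captures of every match in the string.
my @all = $str =~ /(a[XYZ])/g;

say "@first";   # prints "aX"
say "@all";     # prints "aX aY aZ"
```

In list context /g returns all matches at once, while in scalar context (as in a while conditional) it hands them out one per evaluation.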

      Why is the second while loop needed?
      while(<>) { while( m/\b (\w\S+) ....
      and not just:
      while(<>) { if ( m/\b (\w\S+) ...
        The first while loop reads in the data to be matched. The second while loop goes through all the matches (if the g modifier is added).
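
A runnable sketch of the two loops working together, assuming a hard-coded list of lines in place of `<>` (the `@lines` data, the `$n` counter, and the `@found` array are made up for illustration):

```perl
use strict;
use warnings;
use 5.012;

# Hypothetical input lines standing in for <>.
my @lines = ('this this is is fine', 'no repeats here', 'go go go');
my @found;

my $n = 0;
for my $line (@lines) {
    $n++;
    # With /g the inner loop advances through the line and ends
    # once no further match exists.
    while ($line =~ m/ \b (\w\S+) ( \s+ \1 ) + \b /xig) {
        push @found, "dup word '$1' at line $n";
    }
}
say for @found;
```

With `if` in place of the inner `while`, only the first duplicate on each line would be reported, so the "is is" on the first line would be missed.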


        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics
