Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

What good is <CODE>\G</CODE> in a regular expression?

by faq_monk (Initiate)
on Oct 08, 1999 at 00:25 UTC ( #674=perlfaq nodetype: print w/ replies, xml ) Need Help??

Current Perl documentation can be found at perldoc.perl.org.

Here is our local, out-dated (pre-5.6) version:

The notation \G is used in a match or substitution in conjunction the /g modifier (and ignored if there's no /g) to anchor the regular expression to the point just past where the last match occurred, i.e. the pos() point.

For example, suppose you had a line of text quoted in standard mail and Usenet notation, (that is, with leading > characters), and you want change each leading > into a corresponding :. You could do so in this way:

     s/^(>+)/':' x length($1)/gem;

Or, using \G, the much simpler (and faster):

    s/\G>/:/g;

A more sophisticated use might involve a tokenizer. The following lex-like example is courtesy of Jeffrey Friedl. It did not work in 5.003 due to bugs in that release, but does work in 5.004 or better. (Note the use of /c, which prevents a failed match with /g from resetting the search position back to the beginning of the string.)

    while (<>) {
      chomp;
      PARSER: {
           m/ \G( \d+\b    )/gcx    && do { print "number: $1\n";  redo; };
           m/ \G( \w+      )/gcx    && do { print "word:   $1\n";  redo; };
           m/ \G( \s+      )/gcx    && do { print "space:  $1\n";  redo; };
           m/ \G( [^\w\d]+ )/gcx    && do { print "other:  $1\n";  redo; };
      }
    }

Of course, that could have been written as

    while (<>) {
      chomp;
      PARSER: {
           if ( /\G( \d+\b    )/gcx  {
                print "number: $1\n";
                redo PARSER;
           }
           if ( /\G( \w+      )/gcx  {
                print "word: $1\n";
                redo PARSER;
           }
           if ( /\G( \s+      )/gcx  {
                print "space: $1\n";
                redo PARSER;
           }
           if ( /\G( [^\w\d]+ )/gcx  {
                print "other: $1\n";
                redo PARSER;
           }
      }
    }

But then you lose the vertical alignment of the regular expressions.

Log In?
Username:
Password:

What's my password?
Create A New User
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (17)
As of 2015-07-02 17:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (44 votes), past polls