PerlMonks  

Lookahead assertion

by talexb (Chancellor)
on Oct 17, 2013 at 03:36 UTC (id://1058590)

talexb has asked for the wisdom of the Perl Monks concerning the following question:

I've been tinkering around with regular expressions, and in particular lookahead assertions. The examples on page 248 of the Camel (fourth edition) say that this code

"0123456789" =~ /(\d{3})/g;
returns only three strings, '012', '345' and '678', because after each match the regex engine resumes searching where that match ended. I tried this in the debugger and confirmed the behaviour:
main::(-e:1):   1
  DB<1> $foo = "123456790"
  DB<2> @w1 = ( $foo =~ /(\d{3})/g );
  DB<3> x @w1
0  123
1  456
2  790
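One way to see why only non-overlapping chunks come back is to run the same pattern in scalar context and watch pos(), which reports the offset where the next /g search will start. A minimal sketch, assuming a plain script with strict and warnings (the variable names are illustrative, not from the Camel):

```perl
use strict;
use warnings;

my $s = "0123456789";

# List context: /g returns every (non-overlapping) capture at once.
my @chunks = ($s =~ /(\d{3})/g);
print "@chunks\n";

# Scalar context: one match per iteration. pos() is where the
# previous match ended, so the next search resumes from there.
my @ends;
while ($s =~ /(\d{3})/g) {
    push @ends, pos($s);
    print "$1 ends at offset ", pos($s), "\n";
}
```

Run as a script, it should print 012 345 678 followed by end offsets 3, 6 and 9; the last character is left over because no three-digit run starts at offset 9.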

Now, I'm interested in a regex that also finds '123' and '234' -- I want to tell the regex engine to reset to where it found the last match, plus one character. The Camel appears to say that the next example does this, with the lookahead assertion as follows:

"0123456789" =~ /(?:(\d{3}))/g;
Brimming with energy and thrilled at my discovery, I tried that out in the debugger:
  DB<4> @w2 = ( $foo =~ /(?:(\d{3}))/g );
  DB<5> x @w2
0  123
1  456
2  790
Nope. Can someone suggest what I've done wrong?

Update: As noted below, this is a typo in the book, and it is recorded on the book's errata page. I'll be going through the rest of the errata and updating my copy of Programming Perl (4th edition) later today.

Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Lookahead assertion
by Athanasius (Archbishop) on Oct 17, 2013 at 03:48 UTC

    Look-ahead is (?= ), not (?: ), which is just a non-capturing group!

    13:46 >perl -wMstrict -MData::Dump -E "my $foo = '123456789'; my @w1 = ($foo =~ /(?=(\d{3}))/g); dd \@w1;"
    [123, 234, 345, 456, 567, 678, 789]
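    To make the difference concrete, here is a small side-by-side sketch (not from the original post) that runs the misprinted (?: ) pattern and the corrected (?= ) pattern against the same string:

```perl
use strict;
use warnings;

my $foo = '123456789';

# (?: ) only groups: the three digits are still consumed, so the
# next search starts after them, exactly like the plain capture.
my @grouped = ($foo =~ /(?:(\d{3}))/g);

# (?= ) looks ahead without consuming anything, so the capture
# inside it is tried at every starting position.
my @lookahead = ($foo =~ /(?=(\d{3}))/g);

print "(?: ) -> @grouped\n";
print "(?= ) -> @lookahead\n";
```

    The grouped version should give 123 456 789, the same as the plain capture, while the look-ahead version should give all seven overlapping windows.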

    Update: The use of (?: ) instead of (?= ) appears to be a misprint in the Camel Book.

    The following explanation from that same section (page 248) is worth noting (at least, it helped me to understand what is going on):

    When the engine sees that it should try again because of the /g, it steps one character past where last it tried.

    This explains why the final . in BrowserUk’s solution is not strictly necessary. But BrowserUk’s regex can be usefully generalised to step forward an arbitrary number of characters. For example, to capture 3 digits and then step forward 2 characters:

    17:56 >perl -wMstrict -MData::Dump -E "my $foo = '123456789012345'; my @w1 = ($foo =~ /(?=(\d{3})).{2}/g); dd \@w1;"
    [123, 345, 567, 789, 901, 123, 345]
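    The generalised pattern can be wrapped in a small helper. The sub name windows and its parameters are hypothetical, chosen for this sketch; it assumes a digits-only string and 1 <= step <= width, so that whenever the look-ahead succeeds there are always enough characters left for .{$step} to consume:

```perl
use strict;
use warnings;

# Hypothetical helper: capture $width digits, then advance $step
# characters before the next attempt. Assumes 1 <= $step <= $width
# and a string made only of digits.
sub windows {
    my ($str, $width, $step) = @_;
    # Called in list context, the /g match returns every capture.
    return $str =~ /(?=(\d{$width})).{$step}/g;
}

my @w1 = windows('123456789012345', 3, 2);
my @w2 = windows('123456789012345', 3, 3);
print "@w1\n";
print "@w2\n";
```

    With a step of 2 this should reproduce the list above; with a step equal to the width (3) it degenerates to the non-overlapping chunks 123 456 789 012 345.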

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      When the engine sees that it should try again because of the /g, it steps one character past where last it tried.

      This explains why the final . in BrowserUk’s solution is not strictly necessary

      Indeed. The final '.' is not necessary with Perl's regex engine.

      But many other regex libraries -- including some that claim to be Perl-compatible -- do not have this pragmatic deviation from the classical regular expression operation; hence it has become a habit with me to step forward explicitly.
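      The bump-along behaviour can be observed directly by recording pos() after each match, with and without the trailing '.'. This is an illustrative sketch, not BrowserUk's code; the short test string is arbitrary:

```perl
use strict;
use warnings;

my $s = '12345';

# Without the trailing '.', each match is zero-length, so pos() stays
# at the match start. Perl refuses a second zero-length match at the
# same offset, so the next attempt bumps along one character.
my (@bare, @bare_pos);
while ($s =~ /(?=(\d{3}))/g) {
    push @bare,     $1;
    push @bare_pos, pos($s);
}

# With the trailing '.', each match consumes one character, so pos()
# advances explicitly and no bump-along is needed.
my (@dot, @dot_pos);
while ($s =~ /(?=(\d{3}))./g) {
    push @dot,     $1;
    push @dot_pos, pos($s);
}

print "bare: @bare (pos @bare_pos)\n";
print "dot:  @dot (pos @dot_pos)\n";
```

      Both loops should capture 123 234 345; only the reported positions differ (0 1 2 for the zero-length matches, 1 2 3 when the '.' consumes a character). The rule against repeating a zero-length match at the same offset is described under "Repeated Patterns Matching a Zero-length Substring" in perlre.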


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Thanks for your reply. I checked, and the final dot makes no difference to the result:

          DB<6> @w3 = ( $foo =~ /(?=(\d{3}))/g );
          DB<7> x @w3
        0  123
        1  234
        2  345
        3  456
        4  567
        5  679
        6  790
          DB<8> @w3 = ( $foo =~ /(?=(\d{3}))./g );
          DB<9> !7
        x @w3
        0  123
        1  234
        2  345
        3  456
        4  567
        5  679
        6  790
        Good to know.

        Alex / talexb / Toronto


      Thanks for your response. I posted this problem last thing at night, then went to bed. When I woke up this morning, my mind returned to this problem, and I thought, "That's ridiculous -- could it be a misprint?"

        DB<6> @w3 = ( $foo =~ /(?=(\d{3}))/g );
        DB<7> x @w3
      0  123
      1  234
      2  345
      3  456
      4  567
      5  679
      6  790
      Yep. A ':' where it should have been a '='. Guess I'd better check what other errata there are.

      Alex / talexb / Toronto


Re: Lookahead assertion
by BrowserUk (Patriarch) on Oct 17, 2013 at 03:53 UTC

    Read as: "lookahead (and capture) 3 digits; then step forward 1 anything".

    $s = '1234567890';;
    print $1 while $s =~ m[(?=(\d{3})).]g;;
    123
    234
    345
    456
    567
    678
    789
    890
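    The one-liner above can also be written as a standalone strict script. This sketch additionally records $-[1], the offset at which capture group 1 started, to show where each overlapping window begins; the array names are illustrative:

```perl
use strict;
use warnings;

my $s = '1234567890';

# Look ahead (and capture) 3 digits; then step forward 1 anything.
# $-[1] holds the start offset of capture group 1 for the current match.
my (@windows, @starts);
while ($s =~ m[(?=(\d{3})).]g) {
    push @windows, $1;
    push @starts,  $-[1];
}

print "$starts[$_]: $windows[$_]\n" for 0 .. $#windows;
```

    This should print one window per line, from 0: 123 through 7: 890.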


Node Type: perlquestion [id://1058590]
Approved by Athanasius
Front-paged by Arunbear