PerlMonks  

[bugs?] perldoc perlre, \G and pos()

by LanX (Saint)
on Sep 29, 2009 at 12:50 UTC ( [id://798113]=perlquestion )

LanX has asked for the wisdom of the Perl Monks concerning the following question:

IMHO this code example in perlre#Assertions is wrong or am I missing something?

Note that the rule for zero-length matches is modified somewhat, in that contents to the left of \G is not counted when determining the length of the match. Thus the following will not match forever:

    $str = 'ABC';
    pos($str) = 1;
    while (/.\G/g) {
        print $&;
    }
It will print 'A' and then terminate, as it considers the match to be zero-width, and thus will not match at the same position twice in a row.

The while loop tries to match $_ not $str.
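A minimal sketch of that point, not taken from perlre: a match operator without `=~` always scans `$_`, whatever other variable had its `pos()` set. The contents of `$_` and the `(\w)` pattern are made up purely to tell the two targets apart.

```perl
use strict;
use warnings;

# Hypothetical values, chosen only to make the two targets distinguishable.
my $str = 'ABC';
local $_ = 'xyz';

my @from_topic;
while (/(\w)/g) {              # no =~ binding, so this scans $_
    push @from_topic, $1;
}

my @from_str;
while ($str =~ /(\w)/g) {      # explicit binding, so this scans $str
    push @from_str, $1;
}

print "topic: @from_topic\n";  # topic: x y z
print "str:   @from_str\n";    # str:   A B C
```

So in the perlre snippet, setting pos($str) has no influence on the loop at all, because the loop never looks at $str.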

So this works as it should:

    use warnings;
    $str = 'ABC'; pos($str) = 1;
    while ($str=~/.\G/g) {
        print $&;
    }

BUT what's really confusing me is that pos($str) is empty afterwards!

Now appending this code to the latter

    print pos($str);    #line6
    while ($str=~/.\G/g) {
        print $&;
    }

produces "Use of uninitialized value in print at ... line 6." and an endless loop!

OK ... now at the same time running this code:

    use warnings;
    $str = 'ABC'; print pos($str);
    # pos($str) = 1;
    while ($str=~/.\G/g) {
        print $&;
    }

produces

    Use of uninitialized value in print at /home/lanx/B/PL/PM/re_escG.pl line 2.
    Use of uninitialized value $& in print at /home/lanx/B/PL/PM/re_escG.pl line 5.
but no endless loop!

Sorry???

If pos() is uninitialized, why does it produce an endless loop in one case but not in the other?

Cheers Rolf

UPDATE: Yes, I know that:

Currently, the "\G" anchor is only fully supported when used to anchor to the start of the pattern. (perlretut)
... but pos() is defined to be the position after the last match, thus in our case with a final \G in the pattern it shouldn't change at all... IMHO there's no reason why this shouldn't work... and most probably this bug in pos is the source of the problem!
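For contrast, here is a small sketch of the fully supported form, with \G at the start of the pattern and /gc so that a failed alternative leaves pos() alone. The input string and the token names are invented for illustration; this is not code from perlretut.

```perl
use strict;
use warnings;

my $input = 'foo=42;bar=7';    # hypothetical input
my @tokens;

pos($input) = 0;
while (pos($input) < length $input) {
    # Each pattern is anchored with \G at the current position.
    # /c keeps pos() unchanged when an alternative fails.
    if    ($input =~ /\G([a-z]+)/gc) { push @tokens, "WORD($1)" }
    elsif ($input =~ /\G(\d+)/gc)    { push @tokens, "NUM($1)" }
    elsif ($input =~ /\G([=;])/gc)   { push @tokens, "PUNCT($1)" }
    else  { die "unexpected character at offset " . pos($input) }
}

print join(' ', @tokens), "\n";
# WORD(foo) PUNCT(=) NUM(42) PUNCT(;) WORD(bar) PUNCT(=) NUM(7)
```

Every successful match here consumes at least one character to the right of \G, so pos() always advances and the loop terminates.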

Replies are listed 'Best First'.
Re: [bugs?] perldoc perlre, \G and pos()
by ikegami (Patriarch) on Sep 29, 2009 at 14:42 UTC

    BUT what's really confusing me is that pos($str) is empty afterwards!

    pos is updated by every search. It is either advanced on success, or reset on a match failure (unless you use /c). If it didn't reset, a match somewhere in the program could affect an unrelated match elsewhere in the program.
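A short sketch of that rule, using an ordinary pattern rather than the `.\G` case from the question (the patterns `/B/` and `/Z/` are just illustrative): pos() advances on success, becomes undef after a failed /g match, and survives the failure when /c is added.

```perl
use strict;
use warnings;

my $str = 'ABC';

pos($str) = 1;
my $ok = $str =~ /B/g;           # succeeds, starting the search at offset 1
my $after_success = pos($str);   # 2: just past the matched 'B'

pos($str) = 1;
$ok = $str =~ /Z/g;              # fails in scalar context
my $after_plain = pos($str);     # undef: the failure reset the position

pos($str) = 1;
$ok = $str =~ /Z/gc;             # same failing match, but with /c
my $after_c = pos($str);         # still 1: /c keeps the position

printf "after success: %s\n", $after_success;
printf "without /c:    %s\n", defined $after_plain ? $after_plain : 'undef';
printf "with /c:       %s\n", defined $after_c     ? $after_c     : 'undef';
```

This explains why pos($str) is empty after the first loop in the question: the loop only ends once a match has failed, and that failure resets the position.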

    produces [...] and an endless loop!

    Position zero is the start of the string. It doesn't surprise me that it thinks it hasn't matched yet.

    It lets you do something silly like /.\G/ assuming you know what you are doing. Expect problems if you break that trust by trying to match the character before the start of the string and nothing else.

      If you look closely at the code I posted you will see that position zero (or "uninitialized") produces inconsistent results.

      It's not that I care so much which result it produces, as long as they are consistent!

      But depending on a magically hidden memory or state is for sure a profound error.

      Cheers Rolf

      UPDATE:

      Position zero is the start of the string. It doesn't surprise me that it thinks it hasn't matched yet.

      An endless loop could only be a result of infinitely repeating matches not of the opposite. And following the documentation I quoted, it should always return the length of the match after \G, which is clearly zero, so no need for an endless loop.

        Like I said, GIGO. You're trying to make it start and end at pos -1

        An endless loop could only be a result of infinitely repeating matches not of the opposite.

        I said it *thinks* it hasn't matched yet. A reasonable belief when pos == 0.

        It's not that I care so much which result it produces, as long as they are consistent!
        I'd much rather have bugs that produce inconsistent results than consistent ones. If a bug produces consistent results, there will be code that relies on it, and it will be (politically) harder to fix. If the results are inconsistent anyway, fixing the bug is very unlikely to break existing code.

Re: [bugs?] perldoc perlre, \G and pos() (and /c)
by LanX (Saint) on Sep 29, 2009 at 13:46 UTC
    Using the /c modifier (which keeps pos from being reset when a match fails, see perlretut#Global matching) avoids the endless loop, but has strange effects, too:

    This code

    use warnings;
    $\="\n";
    $str = 'ABC';
    print "-"x10;
    print pos($str);
    pos($str) = 1;
    while ($str=~/.\G/gc) {
        print $&;
    }
    print pos($str);
    #__DATA__
    # pos($str) = pos($str);
    while ($str=~/.\G/gc) {
        print $&;
    }
    print "-"x10;

    prints:

    ----------
    Use of uninitialized value in print at /home/lanx/B/PL/PM/re_escG.pl line 5.
    A
    1
    ----------

    but uncommenting the line pos($str) = pos($str); prints

    ----------
    Use of uninitialized value in print at /home/lanx/B/PL/PM/re_escG.pl line 5.
    A
    1
    A
    ----------

    weird!

    Cheers Rolf

Re: [bugs?] perldoc perlre, \G and pos() (Who cares?)
by Anonymous Monk on Sep 30, 2009 at 14:18 UTC
    This is a purely academic discussion, nobody ever needs this feature!
      nobody ever needs this feature!

      I do!

      I wrote a (heuristic-driven) Perlish syntax parser and transformer in Emacs Lisp, and though Perl as a language is incomparably friendlier than Lisps, I would not be even able of thinking about rewriting this tool in Perl:
      --- Ilya Zakharevich, published on Perl.com 09.2000

      also worth a look: Text Processing: Elisp vs Perl

      For all these features you need a reliable possibility to search backwards from the last match position, that's what this thread is about.

      Otherwise feel free to show me alternatives...

      Cheers Rolf
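One possible alternative, offered only as a sketch and not as something proposed in this thread: copy the part of the string to the left of pos() with substr and anchor a pattern at its end with \z. This avoids a non-leading \G entirely and leaves pos() on the original string untouched. The position and the patterns below are hypothetical.

```perl
use strict;
use warnings;

my $str = 'ABC';
pos($str) = 2;    # hypothetical position, just after 'B'

# Everything to the left of the current position, as a separate string.
my $left = substr($str, 0, pos($str));

# \z anchors at the end of $left, i.e. at the original pos($str).
my ($prev_char) = $left =~ /(.)\z/s;
my ($prev_two)  = $left =~ /(..)\z/s;

print "char before pos: $prev_char\n";        # B
print "two before pos:  $prev_two\n";         # AB
print "pos is still:    ", pos($str), "\n";   # 2
```

The cost is a copy of the prefix on every lookup, which may matter for long strings in a tight loop.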
