PerlMonks  

Re: ^x* vs x*$

by tilly (Archbishop)
on Aug 19, 2000 at 17:08 UTC


in reply to ^x* vs x*$

Congratulations!

You are exactly correct in your analysis, and correct to be unhappy with what Perl is doing. :-(

The first is a bug in 5.6.0. It doesn't happen in 5.005_03. It likely has been fixed by Hugo already. Anyone who wants to check that can follow my advice in Getting current versions of Perl and see if it is still there with more current patches.

The second likewise looks to me like a bug. It has been around longer though. (It appears in both 5.005_03 and 5.6.0.) You match the first time and pos() is set to the end of the string. The second time you go back, start from pos(), and find that you can match again at the end of the string. On the first match the engine needs to record that it has already matched at the end of the string, so that it does not match there again the next time.
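
To watch the sequence described above, here is a minimal sketch that steps through the /g matches one at a time in scalar context and records where each one starts and ends. The helper name match_spans is my own invention, not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: collect [start, end, text] for every /g match.
sub match_spans {
    my ($str, $re) = @_;
    my @spans;
    while ($str =~ /$re/g) {
        # $-[0] and $+[0] are the start and end offsets of the last match.
        push @spans, [ $-[0], $+[0], substr($str, $-[0], $+[0] - $-[0]) ];
    }
    return @spans;
}

for my $re (qr/^x*/, qr/x*$/) {
    print "Regexp >$re<\n";
    printf "  start=%d end=%d match=>%s<\n", @$_ for match_spans("x", $re);
}
```

On a perl that shows the second behavior, x*$ should report a non-empty match followed by an empty one that starts and ends at the end of the string.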

At this point you should run "perlbug" with your code, and toss in my observation that the first behaved differently in Perl 5.005_03. But first I would clean it up as follows:

&re_test("x", '^x*');
&re_test("x", 'x*$');

sub re_test {
    my $str     = shift;
    my $re_desc = shift;
    my $re      = qr/$re_desc/;

    # List context with /g returns every match in one go.
    my @matches   = ($str =~ /$re/g);
    my $num_match = @matches;

    # $`, $& and $' describe the last successful match.
    print "Test String\t>$str<\n",
          "Test Regexp\t>$re_desc<\n",
          "Prematch\t>$`<\n",
          "Match\t\t>$&<\n",
          "Postmatch\t>$'<\n",
          "Num Matches\t>$num_match<\n",
          "Match Array:\n",
          map {"\t\t>$_<\n"} @matches;
    print "\n\n";
}

Also Jeffrey Friedl (jfriedl@yahoo-inc.com) is in the process of rewriting his Mastering Regular Expressions book and has been tracking down all of the RE bugs he can. I would toss this at him. He likely will want to check whether equivalents of the second bug appear in other RE tools.

I would do this for you, but I think it is good to encourage people to get involved in the process. :-)

Replies are listed 'Best First'.
RE: Re: ^x* vs x*$
by tye (Sage) on Aug 19, 2000 at 22:02 UTC

    When this second item has come up before, it has been defended as being the correct behavior. The more general case is that when a regex can match a zero-width string, it is possible for multiple matches to end at the same point.

    Another example is:

    $str= "ababa"; $str =~ s/a*/x/g; print "$str\n"

    which produces

    xxbxxbxx

    This is because we start at position 0 and match "a", leaving us at position 1. At position 1 we match "", leaving us at position 2 (we've already started at position 1 so we don't start there again, even though our match ended at position 1). At pos 2 we match "a", at pos 3 we match "", etc.
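
    The walk just described can be traced directly. This is only a sketch: it records the start and end offset of each match of a* against "ababa" and then runs the same substitution on a copy. The variable names are mine.

    ```perl
    use strict;
    use warnings;

    my $str = "ababa";
    my @trace;
    while ($str =~ /a*/g) {
        # start-end offsets, then the matched text between > and <
        push @trace, sprintf("%d-%d >%s<", $-[0], $+[0], $&);
    }
    print "$_\n" for @trace;

    # Each traced match, empty or not, becomes one "x".
    (my $copy = $str) =~ s/a*/x/g;
    print "$copy\n";
    ```

    Every non-empty match of "a" is followed by an empty match at the position where it ended, which is where the doubled x's come from.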

    But this is a bit counterintuitive. In fact, sed doesn't have this "quirk". So it might be a good idea to disallow zero-width matches that start (and therefore end) at the point where the previous match ended.
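
    As a rough illustration of that rule (not a patch to the regex engine, just an emulation on top of it), the sketch below does a global substitution by hand and skips any empty match that sits exactly where the previous match ended. The name subst_skip_adjacent_empty is hypothetical, and the replacement is treated as a literal string.

    ```perl
    use strict;
    use warnings;

    # Hypothetical emulation of the proposed rule, for literal replacements only.
    sub subst_skip_adjacent_empty {
        my ($str, $re, $repl) = @_;
        my $out    = '';
        my $copied = 0;      # input offset already written to $out
        my $prev_end;        # where the last accepted match ended
        while ($str =~ /$re/g) {
            my ($start, $end) = ($-[0], $+[0]);
            # Skip a zero-width match glued to the end of the previous match.
            next if defined $prev_end && $start == $end && $start == $prev_end;
            $out .= substr($str, $copied, $start - $copied) . $repl;
            $copied   = $end;
            $prev_end = $end;
        }
        return $out . substr($str, $copied);
    }

    print subst_skip_adjacent_empty("ababa", qr/a*/, "x"), "\n";
    ```

    With the adjacent empty matches dropped, each run of a's is replaced once, so "ababa" comes out as xbxbx, which is what the rule proposed above would give.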

    But that raises the ugly spectre of backward compatibility... My current feeling is that "we" should "fix" this but provide a way to get the old behavior to ease the burden of backward compatibility (though no suitable syntax/feature for doing that springs to mind). I suspect a lack of to-its will cause the current behavior to remain until someone feels strongly enough about it to champion its cause.

            - tye (but my friends call me "Tye")
      Very interesting. I can believe that happened.

      Still looks to me like a bug.

      perl -e '$str = "Hello World\n"; $str =~ s/\r?\n?$/\n/g; print $str;'
      Where did the second return come from?

      At the least after matching $ you should not match a zero-width assertion at that point again. IMHO and all that.
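
      Until that changes, one way to sidestep the extra newline, assuming the goal is simply a single "\n" at the end of a one-line string, is to drop the /g so that only the leftmost match is replaced. A sketch, with a made-up helper name:

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: force exactly one "\n" at the end of a string.
      sub normalize_eol {
          my ($str) = @_;
          # No /g: only the leftmost match is replaced, so the empty
          # match at the very end of the string is never attempted.
          $str =~ s/\r?\n?$/\n/;
          return $str;
      }

      (my $with_g = "Hello World\n") =~ s/\r?\n?$/\n/g;
      my $without_g = normalize_eol("Hello World\n");

      printf "with /g:    %d newline(s)\n", ($with_g    =~ tr/\n//);
      printf "without /g: %d newline(s)\n", ($without_g =~ tr/\n//);
      ```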

      I will send that bug report in shortly.

Re: ^x* vs x*$
by Abigail-II (Bishop) on Sep 18, 2003 at 15:52 UTC
    The first is a bug in 5.6.0. It doesn't happen in 5.005_03. It likely has been fixed by Hugo already. Anyone who wants to check that can follow my advice in Getting current versions of Perl and see if it is still there with more current patches.

    We're three years later. 5.8.1-RC4 and bleadperl still have this bug.

    Abigail

      And finally, four years later, I got around to filing it with p5p.
        And seven years to fix. Here's the output from a recent bleadperl...
        [steve@sulu ~]$ /tmp/bleadperl/bin/perl5.10.0 rt_31914.pl
        Test String >x<
        Prematch ><
        Match >x<
        Postmatch ><
        Num Matches >1<
        Match Arrary:
        >x<
        ----------
        Test String >x<
        Prematch >x<
        Match ><
        Postmatch ><
        Num Matches >2<
        Match Arrary:
        >x<
        ><
