Work-around for variable length look-behind?

by pat_mc (Pilgrim)
on Oct 11, 2010 at 10:19 UTC (#864571, perlquestion)
pat_mc has asked for the wisdom of the Perl Monks concerning the following question:

Hi, Monks!

I have a regex problem today that I thought would be a quickie. I have a file with lines containing two strings separated by whitespace. I want to make a specific global replacement, say every 'b' in front of an 'a' shall become 'B', ONLY in the second string.
bbaaccbab sdcbalsbadcnw
should become
bbaaccbab sdcBalsBadcnw
Clearly s/b(?=a)/B/g won't work because it may also effect a replacement in the first string in the line.

Of course, I could capture the second string in a preceding match and then operate on it. What I want to achieve, however, is to express this in a single regex.

With variable length look-behind unsupported by the Perl version I use (5.8 something) I am at a loss.

Can you please help?

Cheers and thanks in advance for your advice -


Replies are listed 'Best First'.
Re: Work-around for variable length look-behind?
by moritz (Cardinal) on Oct 11, 2010 at 11:36 UTC
    If you use perl 5.10 or newer, you can use \K to exclude everything on its left from the substitution:

    1 while s/ \s [^b]* \K b (?=a) /B/x;
    #                   ^ only start the substitution here
    #             ^ go to the first b
    #          ^ only search second string

    (Update: added the "before a" criterion)
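A minimal sketch of the \K idea as a standalone script. The helper name fix_second_k is made up for illustration, and the pattern is a variation, not the one posted above: it uses a lazy \S*? instead of [^b]*. As far as I can tell, [^b]* stops at the first 'b', so a 'b' in the second string that is not followed by 'a' would block the match; the lazy skip avoids that. It assumes exactly one run of whitespace between the two strings.

```perl
use strict;
use warnings;
use 5.010;    # \K needs perl 5.10 or newer

# Hypothetical helper: upper-case each 'b' that precedes an 'a',
# but only to the right of the whitespace separator.
sub fix_second_k {
    my ($line) = @_;
    # \s anchors the match at the separator, \S*? lazily skips ahead
    # inside the second string, and \K keeps the skipped text out of
    # the replacement. Each pass of the loop fixes one 'b'.
    1 while $line =~ s/ \s \S*? \K b (?=a) /B/x;
    return $line;
}

say fix_second_k('bbaaccbab sdcbalsbadcnw');      # prints: bbaaccbab sdcBalsBadcnw
say fix_second_k('bbaaccbab sdbcbalsbadcbnw');    # prints: bbaaccbab sdbcBalsBadcbnw
```

The loop ends because every successful pass turns one lower-case 'b' into 'B', so there is eventually nothing left to match.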

    Perl 6 - links to (nearly) everything that is Perl 6.
Re: Work-around for variable length look-behind?
by Corion (Pope) on Oct 11, 2010 at 10:36 UTC

    I think the following works, at least for the limited input set. It relies on the RE engine matching the leftmost pattern first, and on your input data having a unique delimiter (the whitespace), so it doesn't fall back to the "verbatim" section of the RE:

    perl -wlpe "s/(^\w+ )|\G(b)(?=a)|\G(.)|/$1 || uc $2 || $3/ge"

    Update: I forgot about the "followed by 'a'" condition. Should work now, again.
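For anyone who wants to try the \G approach outside a one-liner, here is a rough sketch as a script. The helper name fix_second_g is hypothetical, and two things differ from the one-liner above: the first alternative is written as ^\S+\s+ rather than ^\w+ followed by a space, and the replacement uses defined checks to spell out which alternative matched. Like the original, it assumes the line starts with the first string and contains a whitespace separator; without one, the "verbatim" alternative never fires and the first string gets rewritten too.

```perl
use strict;
use warnings;

# Hypothetical helper built on the \G idea: copy the first string
# verbatim, then walk the rest of the line one character at a time.
sub fix_second_g {
    my ($line) = @_;
    $line =~ s{ (^\S+\s+)       # first string plus separator, kept as-is
              | \G (b) (?=a)    # a 'b' right before an 'a': upper-case it
              | \G (.)          # any other character: kept as-is
              }{ defined $1 ? $1 : defined $2 ? uc $2 : $3 }gex;
    return $line;
}

print fix_second_g('bbaaccbab sdcbalsbadcnw'), "\n";    # prints: bbaaccbab sdcBalsBadcnw
```

The \G anchors force each later match to start exactly where the previous one ended, which is what keeps the engine from skipping back into the first string.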

Re: Work-around for variable length look-behind?
by Utilitarian (Vicar) on Oct 11, 2010 at 10:33 UTC
    ~/$ perl -e '$string=" bbaaccbab sdcbalsbadcnw";while($string=~s/(\S+\ +s.+?)b(?=a)/$1B/g){}print "$string\n";'
    bbaaccbab sdcBalsBadcnw
    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
Re: Work-around for variable length look-behind?
by BrowserUk (Pope) on Oct 11, 2010 at 10:36 UTC


    $s = 'bbaaccbab sdcbalsbadcnw';;
    $s =~ s[(b)(?=[^\s]+$)][B]g;;
    bbaaccbab sdcBalsBadcnw

      Oh no ... how brilliant is this? You finally opened my eyes to the fact that none of the 'b's in the second word is followed by whitespace anymore ... it is pretty obvious ... but I simply did not hit upon it myself.

      This solves my problem.
      Thanks again!

      Cheers -

      Does that take care of the "in front of an 'a'" requirement? I like it; I'm just trying to figure out what it does.
        Does that take care of the "in front of an 'a'" requirement?

        Er, no. I missed that bit of the spec, but it is easily corrected:

        $s = 'bbaaccbab sdbcbalsbadcbnw';; ## with added bs
        ($t = $s) =~ s[(b)(?=a[^\s]+$)][B]g; print $t;;
        bbaaccbab sdbcBalsBadcbnw

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
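A small sketch of the look-ahead-to-end-of-line trick as a reusable sub. The name fix_second_la is made up, and the pattern is a slight variation on the one above: \S* instead of [^\s]+ after the 'a', so that a "ba" sitting at the very end of the line is also caught. It assumes the second string is the last thing on the line, with no trailing whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: a 'b' is upper-cased only when an 'a' follows
# and nothing but non-whitespace remains up to the end of the line,
# which can only be true inside the last string.
sub fix_second_la {
    my ($line) = @_;
    $line =~ s/b(?=a\S*$)/B/g;
    return $line;
}

print fix_second_la('bbaaccbab sdbcbalsbadcbnw'), "\n";    # prints: bbaaccbab sdbcBalsBadcbnw
print fix_second_la('bbaaccbab sdcba'), "\n";              # prints: bbaaccbab sdcBa
```

No variable-length look-behind is needed because the condition "we are in the second string" is rephrased as "no whitespace between here and the end", which a look-ahead can check on any perl 5.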
Re: Work-around for variable length look-behind?
by mjscott2702 (Pilgrim) on Oct 11, 2010 at 10:39 UTC
    Specific to your case, and avoiding any look-behind/ahead constructs in the regex, this seems to work?:


      ... this seems to work?

      Actually, no! Let's try it.

      knoppix@Microknoppix:~$ perl -E '
      > $_ = q{bbaaccbab sdcbalsbadcbnw};
      > s/(\s\w+)ba/\1Ba/g;
      > say;'
      bbaaccbab sdcbalsBadcbnw
      knoppix@Microknoppix:~$

      Notice how it has changed the second 'ba' to the right of the space rather than the first; that's because your \w+ is greedy so it matches as many characters as it can. Correcting that and running again.

      knoppix@Microknoppix:~$ perl -E '
      > $_ = q{bbaaccbab sdcbalsbadcbnw};
      > s/(\s\w+?)ba/\1Ba/g;
      > say;'
      bbaaccbab sdcBalsbadcbnw
      knoppix@Microknoppix:~$

      Now we have changed the first 'ba' but, hang on, the second 'ba' has not been changed. That's because after the first replacement the regex engine has consumed the string up to and including the first 'ba' and is positioned in front of the next character, the 'l'. When the next match is attempted you are looking for a space followed by word characters but there is no space there so the match fails. If we try to correct that by making the space optional then the match does work more than once but the 'ba' sequences to the left of the space also get changed.

      knoppix@Microknoppix:~$ perl -E '
      > $_ = q{bbaaccbab sdcbalsbadcbnw};
      > s/(\s?\w+?)ba/\1Ba/g;
      > say;'
      bBaaccBab sdcBalsBadcbnw
      knoppix@Microknoppix:~$

      It seems, perhaps, that this problem is a bit tricky to solve without using look-around assertions.
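One way to stay clear of look-around entirely, sketched under the assumption that a helper sub is acceptable: let an outer s///e pick out the last run of non-whitespace and hand it to an inner substitution. The names upcase_ba and fix_second_nested are invented for this example, and this is not quite the single plain regex the original question asked for.

```perl
use strict;
use warnings;

# Hypothetical inner step: rewrite every "ba" in a copy of the word.
sub upcase_ba {
    my ($word) = @_;
    $word =~ s/ba/Ba/g;
    return $word;
}

# Hypothetical outer step: (\S+)$ captures the last string on the line,
# and /e replaces it with whatever upcase_ba returns.
sub fix_second_nested {
    my ($line) = @_;
    $line =~ s/(\S+)$/upcase_ba($1)/e;
    return $line;
}

print fix_second_nested('bbaaccbab sdcbalsbadcbnw'), "\n";    # prints: bbaaccbab sdcBalsBadcbnw
```

Because the outer match consumes the whole second string in one go, the "engine has already moved past the space" problem described above does not arise.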



      I was thinking of something similar but wasn't totally sure:


Node Type: perlquestion [id://864571]
Approved by Corion
Front-paged by Old_Gray_Bear