PerlMonks  

Work-around for variable length look-behind?

by pat_mc (Pilgrim)
on Oct 11, 2010 at 10:19 UTC ( #864571=perlquestion )
pat_mc has asked for the wisdom of the Perl Monks concerning the following question:

Hi, Monks!

I have a regex problem today that I thought would be a quickie. I have a file with lines containing two strings separated by whitespace. I want to make a specific global replacement: say, every 'b' in front of an 'a' shall become 'B', but ONLY in the second string.
So
 bbaaccbab sdcbalsbadcnw
should become
bbaaccbab sdcBalsBadcnw
Clearly s/b(?=a)/B/g won't work because it may also effect a replacement in the first string in the line.

Of course, I could capture the second string in a preceding match and then operate on it. What I want to achieve, however, is to express this in a single regex.

With variable length look-behind unsupported by the Perl version I use (5.8 something) I am at a loss.

Can you please help?

Cheers and thanks in advance for your advice -

Pat
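For comparison, the two-step approach mentioned in the question (isolate the second string, then run the simple substitution on it) could look like the sketch below. The name `fix_second_word` is hypothetical, and the code assumes each line holds a first string, whitespace, and a second string; lines that do not fit that shape are returned unchanged. It uses only constructs available in perl 5.8.

```perl
use strict;
use warnings;

# Hypothetical helper: upper-case every 'b' that stands before an 'a',
# but only in the second whitespace-separated string of the line.
sub fix_second_word {
    my ($line) = @_;
    if ($line =~ /^(\s*\S+\s+)(\S+)(.*)$/s) {
        # Copy the captures first: the inner s/// would overwrite them.
        my ($head, $second, $tail) = ($1, $2, $3);
        $second =~ s/b(?=a)/B/g;    # safe now, the first string is out of reach
        return $head . $second . $tail;
    }
    return $line;                   # not a two-string line: leave it alone
}

print fix_second_word('bbaaccbab sdcbalsbadcnw'), "\n";   # bbaaccbab sdcBalsBadcnw
```

This is the baseline the replies below try to compress into a single substitution.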

Re: Work-around for variable length look-behind?
by Utilitarian (Vicar) on Oct 11, 2010 at 10:33 UTC
    ~/$ perl -e '$string=" bbaaccbab sdcbalsbadcnw";while($string=~s/(\S+\ +s.+?)b(?=a)/$1B/g){}print "$string\n";'
    bbaaccbab sdcBalsBadcnw
    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
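The one-liner above repeats the substitution until nothing is left to change. Note that its pattern contains a literal `s` after the spaces, so it appears to depend on the second string of the sample data starting with 's'. The sketch below is a variant of the same looping idea, not Utilitarian's exact code: it anchors at the start of the line instead, and the name `upcase_by_looping` is made up for illustration. It needs no look-behind and no `\K`, so it should run on perl 5.8.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the "repeat until no match" technique.
sub upcase_by_looping {
    my ($line) = @_;
    # Group 1 holds the optional indent, the first string, the separator and
    # the shortest stretch of the second string that reaches a 'b' before 'a'.
    # There is no /g: every pass restarts at the beginning of the line and
    # fixes one more 'b', until the pattern no longer matches.
    1 while $line =~ s/^(\s*\S+\s+\S*?)b(?=a)/${1}B/;
    return $line;
}

print upcase_by_looping('bbaaccbab sdcbalsbadcnw'), "\n";   # bbaaccbab sdcBalsBadcnw
```

The loop terminates because each successful pass removes one lower-case 'b' that precedes an 'a'.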
Re: Work-around for variable length look-behind?
by Corion (Pope) on Oct 11, 2010 at 10:36 UTC

    I think the following works, at least for the limited input set. It relies on the RE engine matching the leftmost pattern first, and on your input data having a unique delimiter (the whitespace), so it doesn't fall back to the "verbatim" section of the RE:

    perl -wlpe "s/(^\w+ )|\G(b)(?=a)|\G(.)|/$1 || uc $2 || $3/ge"

    Update: I forgot about the "followed by 'a'" condition. Should work now, again.
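Spread out with /x, the alternation reads roughly as in the sketch below. Two details differ from the one-liner and are my own adjustments: the trailing empty alternative is dropped, and the replacement tests the captures with `defined` instead of `||`, which should also avoid "uninitialized" warnings under -w. The function name is hypothetical. As in the original, a line that does not start with a word followed by a space is treated entirely as "second string".

```perl
use strict;
use warnings;

# Hypothetical helper: one s///ge pass that copies the first word verbatim
# and then walks the rest of the line one character at a time.
sub upcase_with_alternation {
    my ($line) = @_;
    $line =~ s{
          ( ^ \w+ [ ] )     # group 1: first word plus its space, kept as is
        | \G (b) (?=a)      # group 2: a 'b' that stands directly before an 'a'
        | \G (.)            # group 3: any other character, copied through
    }{
        defined($1) ? $1 : defined($2) ? uc($2) : $3
    }gex;
    return $line;
}

print upcase_with_alternation('bbaaccbab sdcbalsbadcnw'), "\n";   # bbaaccbab sdcBalsBadcnw
```

The first alternative can only match at the start of the line, so it swallows the first word before the other two alternatives ever see it.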

Re: Work-around for variable length look-behind?
by BrowserUk (Pope) on Oct 11, 2010 at 10:36 UTC

    Maybe?

    $s = 'bbaaccbab sdcbalsbadcnw';;
    $s =~ s[(b)(?=[^\s]+$)][B]g;;
    bbaaccbab sdcBalsBadcnw
      Oh no ... how brilliant is this? You finally opened my eyes to the fact that none of the 'b's in the second word is followed by whitespace anymore ... it is pretty obvious ... but I simply did not hit upon it myself.

      This solves my problem.
      Thanks again!

      Cheers -

      Pat
      Does that take care of the "in front of an 'a'" requirement? I like it, just trying to figure out what it does?
        Does that take care of the "in front of an 'a'" requirement?

        Er, no. I missed that bit of the spec, but it is easily corrected:

        $s = 'bbaaccbab sdbcbalsbadcbnw';; ## with added bs
        ($t = $s) =~ s[(b)(?=a[^\s]+$)][B]g; print $t;;
        bbaaccbab sdbcBalsBadcbnw

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
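The look-ahead works because only a position inside the last string on the line can be followed by nothing but non-whitespace up to the end. One detail worth knowing: `a[^\s]+$` demands at least one character after the 'a', so a "ba" at the very end of the line is left alone. The sketch below uses `\S*` instead to cover that case; that change and the function name are mine, not part of the post above. It assumes the line has no trailing whitespace (a trailing newline is fine) and uses only a plain look-ahead, so perl 5.8 should accept it.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the "look ahead to end of line" technique.
sub upcase_with_lookahead {
    my ($line) = @_;
    # Replace a 'b' only if what follows is: an 'a', then zero or more
    # non-whitespace characters, then the end of the line.
    $line =~ s/b(?=a\S*$)/B/g;
    return $line;
}

print upcase_with_lookahead('bbaaccbab sdbcbalsbadcbnw'), "\n";   # bbaaccbab sdbcBalsBadcbnw
print upcase_with_lookahead('bbaaccbab xxba'), "\n";              # bbaaccbab xxBa
```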
Re: Work-around for variable length look-behind?
by mjscott2702 (Pilgrim) on Oct 11, 2010 at 10:39 UTC
    Specific to your case, and avoiding any look-behind/ahead constructs in the regex, this seems to work?:

    $s =~ s/(\s\w+)ba/\1Ba/g;

      I was thinking of something similar but wasn't totally sure:

      s/\s+(.*)ba/$1B/g

      ... this seems to work?

      Actually, no! Let's try it.

      knoppix@Microknoppix:~$ perl -E '
      > $_ = q{bbaaccbab sdcbalsbadcbnw};
      > s/(\s\w+)ba/\1Ba/g;
      > say;'
      bbaaccbab sdcbalsBadcbnw
      knoppix@Microknoppix:~$

      Notice how it has changed the second 'ba' to the right of the space rather than the first; that's because your \w+ is greedy so it matches as many characters as it can. Correcting that and running again.

      knoppix@Microknoppix:~$ perl -E '
      > $_ = q{bbaaccbab sdcbalsbadcbnw};
      > s/(\s\w+?)ba/\1Ba/g;
      > say;'
      bbaaccbab sdcBalsbadcbnw
      knoppix@Microknoppix:~$

      Now we have changed the first 'ba' but, hang on, the second 'ba' has not been changed. That's because after the first replacement the regex engine has consumed the string up to and including the first 'ba' and is positioned in front of the next character, the 'l'. When the next match is attempted you are looking for a space followed by word characters but there is no space there so the match fails. If we try to correct that by making the space optional then the match does work more than once but the 'ba' sequences to the left of the space also get changed.

      knoppix@Microknoppix:~$ perl -E '
      > $_ = q{bbaaccbab sdcbalsbadcbnw};
      > s/(\s?\w+?)ba/\1Ba/g;
      > say;'
      bBaaccBab sdcBalsBadcbnw
      knoppix@Microknoppix:~$

      It seems, perhaps, that this problem is a bit tricky to solve without using look-around assertions.

      Cheers,

      JohnGG
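The point about the engine having "consumed" the string can be made visible with `pos`. The sketch below is only an illustration with a made-up function name: it runs the non-greedy pattern as a plain match under /g and records where each match ends, which is where the next attempt would have to start.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the end offset of every /g match.
sub match_ends {
    my ($s) = @_;
    my @ends;
    while ($s =~ /\s\w+?ba/g) {
        push @ends, pos($s);    # offset just past the matched text
    }
    return @ends;
}

my $s    = 'bbaaccbab sdcbalsbadcbnw';
my @ends = match_ends($s);
printf "%d match(es); next attempt starts at offset %d, in front of '%s'\n",
    scalar(@ends), $ends[0], substr($s, $ends[0], 1);
# 1 match(es); next attempt starts at offset 15, in front of 'l'
```

From offset 15 onwards there is no whitespace left, so `\s` can never match again and the second 'ba' is out of reach for that pattern.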

Re: Work-around for variable length look-behind?
by moritz (Cardinal) on Oct 11, 2010 at 11:36 UTC
    If you use perl 5.10 or newer, you can use \K to exclude everything on its left from the substitution:
    1 while s/ \s [^b]* \K b (?=a) /B/x;
    #                   ^ only start the substitution here
    #             ^ go to the first b
    #          ^ only search second string

    (Update: added the "before a" criterion)

    Perl 6 - links to (nearly) everything that is Perl 6.
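One caveat with `[^b]*`: it cannot step over a 'b' that is not followed by an 'a'. For a second string such as `sdbcbalsbadcbnw` (the test data with added 'b's used earlier in the thread) the pattern stops at the first 'b', fails the look-ahead and finds nothing to replace. The sketch below is a possible variant, not moritz's code: it uses a lazy `\S*?` in place of `[^b]*`. It still needs perl 5.10 or newer for `\K`, the function name is hypothetical, and it assumes the separator is the only whitespace on the line.

```perl
use strict;
use warnings;
use 5.010;    # \K needs perl 5.10 or newer

# Hypothetical helper illustrating \K with a lazy skip over the second string.
sub upcase_with_keep {
    my ($line) = @_;
    # \s    : the separator, so matching starts in the second string
    # \S*?  : skip as little of the second string as possible
    # \K    : keep everything matched so far out of the replacement
    # b(?=a): the 'b' to replace, which must stand before an 'a'
    1 while $line =~ s/ \s \S*? \K b (?=a) /B/x;
    return $line;
}

say upcase_with_keep('bbaaccbab sdbcbalsbadcbnw');   # bbaaccbab sdbcBalsBadcbnw
```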
