PerlMonks  

look-behind regex

by zejames (Hermit)
on Jul 24, 2002 at 05:27 UTC

zejames has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

Here is my problem: I want to replace the word 'foo' with another word, 'bar', in a line, but only if 'toto' does not appear on that line before the 'foo'.

Of course, I can do this:

    while (<DATA>) {
        s/foo/bar/ if ($_ !~ /toto.*?foo/);
        print;
    }
    __DATA__
    toto 4dsf4qsd foo
    mama 432fz foo

Output:

    toto 4dsf4qsd foo
    mama 432fz bar

But what I am looking for is a single regex, probably using a look-behind assertion. I have played with (?<!pattern), without success. As far as I understand, the problem is that there can be anything between 'toto' and 'foo'.

Any hints?

--
zejames

Replies are listed 'Best First'.
Re: look-behind regex
by Abigail-II (Bishop) on Jul 24, 2002 at 09:50 UTC
    While Tom Christiansen was working on the Cookbook, he and I were once musing about how to write a regex that matches strings that don't contain a certain pattern. We came up with:
    /^(?:(?!pattern).)*$/s;
    which is pretty straightforward if you realize how regular expressions are matched.

    For your problem, this leads to:

    s/^((?:(?!toto).)*?)foo/${1}bar/s;
    And if Japhy's new \K assertion gets accepted, you will be able to write it as:
    s/^((?:(?!toto).)*?)\Kfoo/bar/s;
    Abigail
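
A small sketch for trying Abigail's pattern on the sample lines from the question. The helper names replace_first_foo and replace_first_foo_k are made up for illustration, and the third input line is an extra test case that is not in the original post. The \K form needs Perl 5.10 or later, where \K became part of the language.

```perl
use strict;
use warnings;

# Replace the first 'foo' only when no 'toto' precedes it.
# (replace_first_foo is a hypothetical helper name.)
sub replace_first_foo {
    my ($line) = @_;
    # (?:(?!toto).)*? consumes one character at a time, and only
    # while 'toto' does not start at the current position.
    $line =~ s/^((?:(?!toto).)*?)foo/${1}bar/s;
    return $line;
}

# Same idea with \K (Perl 5.10+): everything matched before \K is
# left out of the text that gets replaced, so no capture is needed.
sub replace_first_foo_k {
    my ($line) = @_;
    $line =~ s/^(?:(?!toto).)*?\Kfoo/bar/s;
    return $line;
}

print replace_first_foo($_), "\n"
    for 'toto 4dsf4qsd foo', 'mama 432fz foo', 'foo toto foo';
```

This prints "toto 4dsf4qsd foo", "mama 432fz bar" and "bar toto foo". Note the last case: the first foo has no toto in front of it, so it is replaced even though a toto appears later in the line.
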
      The problem I have with /(?:(?!foo).)*/ is its slowness. Update: I'm wrong. Apparently, this method is usually the fastest.

      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

        In this particular case there's possibly a faster method though:

        s/^(?!(?>.*?toto.*?foo))(.*?)foo/${1}bar/s;

        If toto or foo can be found near the beginning, Abigail's pattern is the winner. This is, I guess, due to the overhead of the (?>) construct. But the farther away toto or foo is, the worse it gets for the step-by-step method (/(?:(?!foo).)*/), and the (?>) pattern comes out ahead. (This isn't just a theoretical win. Even with the overhead of (?>), that pattern might very well win in a real-life situation.)

        But looking at the big picture for a moment: in this case I find it hard to believe that any pattern can beat the plain and simple

        s/foo/bar/ unless /toto.*?foo/s;

        Even a silly construction like

        s/foo(?(?{index($`, 'toto') != -1})(?!))/bar/;

        is faster than the more generic solutions.

        Cheers,
        -Anomo
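
For anyone who wants to check these speed claims on their own machine, here is a rough sketch using the core Benchmark module. The input string $far and the labels tempered, atomic and two_step are assumptions made up for this example; the three patterns are the ones quoted in this thread. Timings depend on the perl version and on where toto and foo sit in the input, so no results are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input: 'toto' and 'foo' both sit far from the start.
my $far = ('x' x 1000) . 'toto' . ('y' x 1000) . 'foo';

# Each candidate works on a copy of its argument and returns the result.
my %candidates = (
    tempered => sub { my $s = shift; $s =~ s/^((?:(?!toto).)*?)foo/${1}bar/s; $s },
    atomic   => sub { my $s = shift; $s =~ s/^(?!(?>.*?toto.*?foo))(.*?)foo/${1}bar/s; $s },
    two_step => sub { my $s = shift; $s =~ s/foo/bar/ unless $s =~ /toto.*?foo/s; $s },
);

# Wrap each candidate so that it is timed against the same input.
my %timed;
for my $name (keys %candidates) {
    my $code = $candidates{$name};
    $timed{$name} = sub { $code->($far) };
}

# A negative count means "run each entry for at least that many CPU seconds".
cmpthese(-1, \%timed);
```

Changing $far so that toto or foo appears near the start is a quick way to test the other half of the claim.
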
Re: look-behind regex
by DamnDirtyApe (Curate) on Jul 24, 2002 at 05:46 UTC

    A major restriction of look-behind assertions is that the pattern looked for must be of a constant width. I'm not sure how you'd work that into a substitution regex. The solution you have here seems simple and ought to do what you need; is there a reason you need to reduce this to one regex?

    If you haven't seen it already, take a look at japhy's regexp book.
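
The constant-width restriction mentioned above can be seen in a short sketch. The helper name replace_fixed and the sample strings are assumptions for illustration. The exact error text for the rejected pattern differs between perl versions, so the code just prints whatever $@ contains.

```perl
use strict;
use warnings;

# A fixed-width look-behind compiles fine, but it only inspects the
# five characters directly in front of 'foo'.
# (replace_fixed is a hypothetical helper name.)
sub replace_fixed {
    my ($line) = @_;
    $line =~ s/(?<!toto )foo/bar/;
    return $line;
}

# An unbounded look-behind is rejected when the pattern is compiled.
# The pattern is kept in a string so the failure can be trapped by eval.
my $unbounded = '(?<!toto.*)foo';
my $compiled  = eval { qr/$unbounded/ };
print defined $compiled ? "compiled\n" : "rejected: $@";

print replace_fixed($_), "\n"
    for 'toto foo', 'mama foo', 'toto 4dsf4qsd foo';
```

The fixed-width version leaves "toto foo" alone and turns "mama foo" into "mama bar", but it also rewrites "toto 4dsf4qsd foo" to "toto 4dsf4qsd bar", because the toto is no longer directly in front of foo. That is exactly the "anything in between" problem from the question.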


    _______________
    D a m n D i r t y A p e
Re: look-behind regex
by Courage (Parson) on Jul 24, 2002 at 05:55 UTC
    The following regex, which doesn't use look-behind:
    s/^(.*?)foo/my $x=$1; $x.($x=~m[toto]?"foo":"bar")/e
    does the job, but does it meet your requirements?

    Courage, the Cowardly Dog

Re: look-behind regex
by I0 (Priest) on Jul 24, 2002 at 23:49 UTC
    s/(toto)|foo/${[$1,'bar']}[!$1]/
      Don't be afraid to use the e modifier:
      s/(toto)|foo/defined $1 ? $1 : 'bar'/e
      Cheers,
      -Anomo
        How about s/(toto)|foo/$1||'bar'/e
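
A quick sketch of why the alternation trick in this sub-thread works: without /g the substitution acts only on whichever of toto or foo comes first in the line. If that is toto, it is replaced by itself and nothing changes; if it is foo, it becomes bar. The helper name swap_unless_toto and the third sample line are assumptions for illustration.

```perl
use strict;
use warnings;

# Whichever of 'toto' or 'foo' appears first is the only thing matched.
# (swap_unless_toto is a hypothetical helper name.)
sub swap_unless_toto {
    my ($line) = @_;
    # $1 is defined only when the 'toto' branch matched; in that case
    # put it back unchanged, otherwise the match was 'foo'.
    $line =~ s/(toto)|foo/defined $1 ? $1 : 'bar'/e;
    return $line;
}

print swap_unless_toto($_), "\n"
    for 'toto 4dsf4qsd foo', 'mama 432fz foo', 'foo toto foo';
```

This prints "toto 4dsf4qsd foo", "mama 432fz bar" and "bar toto foo".
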
Re: look-behind regex
by flocto (Pilgrim) on Jul 24, 2002 at 10:22 UTC

    With your matching regex, you match toto and foo in a rather inelegant way. This is only needed if you have to ensure that toto and foo appear in this order. It is unnecessary if

    • that doesn't matter (in this case you could call it a bug, rather than a feature..)
    • you can say: "If there are toto and foo in one line, they are always in this order!"

    In either of these two cases, you can just use the following, which should be a lot faster (it probably doesn't make a difference with such short strings, but imagine the backtracking you get when trying to have this regex match a 1000-character string):

    while (<DATA>) {
        unless (m/toto/) {
            s/foo/bar/g;
        }
        print;
    }

    Once again: if you only have short strings like in your example, you can happily live with your solution, and if you have to ensure this order you probably have to. It's just unnecessary to match something that you don't need.

    Doing all this in just one regex may be possible, but that should rather be used to write Obfuscated Code.

    Regards,
    -octo
