PerlMonks  

Match a pattern only if it is not within another pattern

by punkish (Priest)
on Aug 16, 2005 at 19:26 UTC

punkish has asked for the wisdom of the Perl Monks concerning the following question:

Perhaps this is a simple problem, but I've spun my wheels for a couple of hours now --

I want to match a pattern in a string, but only if it is not within another pattern. For example:

    $str = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';

    # replace all foo(s) above with 123 except for the ones that are
    # surrounded by bar and qux
    # note: there are 5 foo(s) in $str above; only 3 should be replaced
    # so $str becomes
    # 'bl123 and barthisfoothatqux and barsofooquxhim and123som 123'

Update: posted clarifications

--

when small people start casting long shadows, it is time to go to bed

Re: Match a pattern only if it is not within another pattern
by BrowserUk (Patriarch) on Aug 16, 2005 at 19:37 UTC

    $str = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';
    $str =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]ge;
    print $str;

    bl123 and barthisfoothatqux and barsofooquxhim and123som 123

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" may be good enough for the now, and perfection may be unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      Why, I didn't even think of evaluating the replacement that way. Deconstructing:

      (bar.+?qux)|(foo)         # capture anything with 'bar' and 'qux' as
                                # bookends in $1, OR
                                # any other 'foo' in $2
      defined $2 ? '123' : $1   # if $2 is defined, replace it with 123;
                                # otherwise put $1 back into the string
      ]ge                       # /e: evaluate the replacement as code;
                                # /g: apply it globally

      Thanks. I am glad to see this was out of my league without your help.

      Update: BrowserUK, how on earth do you even begin to think this twisted? I can't fathom how to "practice" regexp matching other than answering questions from novices such as myself. I have been scanning Friedl's book, but I guess nothing substitutes for practice at ever increasing levels of complexity, much like a video game. Well, thanks for getting me over this particular hump for now.


        how on earth do you even begin to think this twisted?

        Trying to answer other people's questions is a very powerful technique for learning a subject more deeply yourself. In our normal lives, work (or play) tends to present us with a relatively static selection of problems to solve, and internal ("nope, too ugly") and external ("the in-house style guide") forces constrain our approaches to solving them. Dealing with someone else's problem, expressed in their own words and subject to their own constraints, can shake us from the shackles of habit upon our thoughts.

        Another way to leap out of that rut is to create artificial constraints of our own. The disciplines of writing obfuscations or playing perl golf are examples of such constraints, but they are easy to create - yes, I know I could do that with a regexp in a loop, but can I do it with just a regexp and no loop? Or in one regexp instead of two? Ok, now I've done that - ugly though it is - can I think of input text that would break it? Learning stuff from books has its place, but I have always felt that something you've discovered for yourself is worth twice as much. So experiment.

        I believe there is a very close relationship between the study of pattern (which is what regular expressions are all about) and the study of mathematics. A common mantra in mathematics is: so, you have this thing to prove, and you don't know how to prove it; so first, try proving something more specific - often that is easier, and maybe it'll give you a clue how to tackle the larger task. If that doesn't work (or even if it does), try proving something more general - paradoxically, sometimes that too turns out to be easier. I think BrowserUK's solution of matching more than you asked for is conceptually quite close to "proving something more general".

        Hugo
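Hugo's point about matching more than you asked for can be packaged as a small reusable helper. This is a minimal sketch, not code from the thread: the name `replace_outside` and its parameters are hypothetical. The protected pattern is tried first at each position and copied back through `$1`; only a bare target match gets the replacement.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): replace every match of $target
# with $replacement, except inside spans matched by $protect.
# The protected alternative is listed first, so at any position where a
# protected span starts it wins and is copied back verbatim via $1.
sub replace_outside {
    my ($str, $protect, $target, $replacement) = @_;
    $str =~ s{($protect)|$target}{defined $1 ? $1 : $replacement}ge;
    return $str;
}

my $str = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';
print replace_outside($str, qr/bar.+?qux/, qr/foo/, '123'), "\n";
# prints: bl123 and barthisfoothatqux and barsofooquxhim and123som 123
```

Because only `$1` is tested, capture groups inside `$protect` or `$target` do not disturb the logic. The outer group is always `$1`.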

        This is a good thing to try to remember; it can come up a lot.

        Caveat: doesn't work if you need to support nested bar/qux pairs, e.g. only replacing the first and last foo in: foo bar foo bar foo qux foo qux foo
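One way to honour nesting is to stop asking a single regex to do it and count instead. This is a sketch under stated assumptions, not code from the thread. It splits the string into `bar`/`qux`/`foo` tokens and tracks how many bar...qux regions are open. It replaces `foo` only at depth 0. The name `replace_outside_nested` is hypothetical, and an unmatched `qux` is simply ignored.

```perl
use strict;
use warnings;

# Sketch (not from the thread): track how many bar...qux regions are open
# and replace 'foo' only when none are.
# Assumption: an unmatched 'qux' is ignored rather than treated as an error.
sub replace_outside_nested {
    my ($str) = @_;
    my $depth = 0;
    my $out   = '';
    # a capturing group in split's pattern keeps the delimiters as tokens
    for my $tok (split /(bar|qux|foo)/, $str) {
        $depth++ if $tok eq 'bar';
        $depth-- if $tok eq 'qux' && $depth > 0;
        $out .= ($tok eq 'foo' && $depth == 0) ? '123' : $tok;
    }
    return $out;
}

print replace_outside_nested('foo bar foo bar foo qux foo qux foo'), "\n";
# prints: 123 bar foo bar foo qux foo qux 123
```

On the original test string it gives the same result as the one-regex version.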

      Should the .+? be a \w+? so as not to jump across words? With

          $str = 'bart is a fool qux';

      the 'foo' will not be replaced.

      Ivan Heffner
      Sr. Software Engineer, DAS Lead
      WhitePages.com, Inc.
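To see the effect of that suggestion, here is BrowserUk's substitution with `\w+?` in place of `.+?`. It is a sketch assuming the real delimiters and the text between them form a single `\w+` run, as in the thread's test string. The wrapper name `replace_foo_wordwise` is hypothetical.

```perl
use strict;
use warnings;

# BrowserUk's substitution with \w+? instead of .+?, so a protected
# bar...qux span cannot cross whitespace or punctuation.
# Assumption: a real protected span is one \w+ run, as in the test string.
sub replace_foo_wordwise {
    my ($str) = @_;
    $str =~ s[(bar\w+?qux)|(foo)][defined $2 ? '123' : $1]ge;
    return $str;
}

print replace_foo_wordwise('bart is a fool qux'), "\n";
# prints: bart is a 123l qux
```

On the original test string it gives the same result as the `.+?` version, because each protected span there is a single word.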

        I guess that depends upon whether the OP is actually using the terms 'foo', 'bar' and 'qux', or whether they are just placeholders for the purpose of his question?


      Nice, but it only works with bar then foo then qux, not qux then foo then bar. (The following passes the first test and fails the second.)
      use strict;
      use warnings;
      use Test::More qw(no_plan);

      my $str      = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';
      my $expected = 'bl123 and barthisfoothatqux and barsofooquxhim and123som 123';
      $str =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]xge;
      is($str, $expected);

      # switch qux and bar: the qux...bar span should be left untouched
      $str      = 'blfoo and quxthisfoothatbar and barsofooquxhim andfoosom foo';
      $expected = 'bl123 and quxthisfoothatbar and barsofooquxhim and123som 123';
      $str =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]xge;
      is($str, $expected);
      I'm trying to solve the more "general" problem with Parse::RecDescent, further on in the thread. I gave up before finding a solution, though.
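If both orders really should count as protected, the alternation extends naturally. This is a sketch, not code from the thread: it adds a second protected alternative, `qux.+?bar`, inside the same capture group. Whether this counts as a generalisation at all is exactly what BrowserUk questions below. The name `replace_foo_either_order` is hypothetical, and "bar foo bar" is still not protected.

```perl
use strict;
use warnings;

# Sketch (not from the thread): accept either order of the bookends.
# Both protected alternatives sit inside capture group 1, so $2 is defined
# only for a bare 'foo'. Note that "bar ... bar" is still NOT protected.
sub replace_foo_either_order {
    my ($str) = @_;
    $str =~ s[(bar.+?qux|qux.+?bar)|(foo)][defined $2 ? '123' : $1]ge;
    return $str;
}

print replace_foo_either_order('blfoo and quxthisfoothatbar and barsofooquxhim andfoosom foo'), "\n";
# prints: bl123 and quxthisfoothatbar and barsofooquxhim and123som 123
```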

        If you want to learn to solve the general problem, the book "Mastering Regular Expressions" is highly recommended. If you want a solution to the general problem, Regexp::Common::balanced does it already.

        # note, this matches "qux foo bar" and "bar foo qux", but not "bar foo bar"
        # see Regexp::Common::balanced documentation for details
        qr/$RE{balanced}{-begin => "qux|bar"}{-end => "bar|qux"}/

        -xdg

        Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

        That's "Working as designed".

        Would you expect to match ( stuff ) and ) stuff ( with the same regex? How would this be a generalisation?


Re: Match a pattern only if it is not within another pattern
by Codon (Friar) on Aug 16, 2005 at 19:42 UTC
    Update: You clarified which foos you wanted replaced while I was typing my response; then BrowserUK responded before I finished my testing and posting. His is most impressive.

Re: Match a pattern only if it is not within another pattern
by davidrw (Prior) on Aug 16, 2005 at 19:43 UTC
    While my attempts at a lookahead/lookbehind combination failed on the entire string, I was able to accomplish it by splitting it off first:
    # one of several failed attempts:
    # $str =~ s/(?<!bar)(.*?)foo(?!.*?qux)/$1.'123'/eg;

    $str = join '',    # glue back together
        # replace on the non-bar-foo-qux elements
        map { s/(?<!bar)(.*?)foo(?!.*?qux)/$1 . '123'/eg; $_ }
        # get elements that are either the bar-foo-qux form or not
        split /(bar.*?foo.*?qux)/, $str;
    Update: BrowserUK's approach is very much nicer than this one.
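The split idea above can be simplified so that no lookaround is needed. This is a sketch, not davidrw's code, and the name `replace_unprotected` is hypothetical. Because the split pattern has exactly one capture group, the protected spans always land at odd indices and the unprotected text at even indices. A plain `s/foo/123/g` on the even elements is then enough.

```perl
use strict;
use warnings;

# Sketch (not davidrw's code): with one capture group in the split pattern,
# protected bar...qux spans land at odd indices and the rest at even ones.
# split keeps a leading empty field when the string starts with a protected
# span, so the parity still holds in that case.
sub replace_unprotected {
    my ($str) = @_;
    my @parts = split /(bar.+?qux)/, $str;
    for my $i (grep { $_ % 2 == 0 } 0 .. $#parts) {
        $parts[$i] =~ s/foo/123/g;
    }
    return join '', @parts;
}

print replace_unprotected('blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo'), "\n";
# prints: bl123 and barthisfoothatqux and barsofooquxhim and123som 123
```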
Re: Match a pattern only if it is not within another pattern
by pbeckingham (Parson) on Aug 16, 2005 at 19:48 UTC

    Use negative look-behind and negative lookahead.

    $str =~ s/((?<!bar)\S*)foo(\S*(?!qux))/${1}123${2}/gx;
    Update: Yup - it's broke. Nothing to see here, move along.



    pbeckingham - typist, perishable vertebrate.
      That gave me
      bl123 and barthis123thatqux and barso123quxhim and123som 123


      Update:
      However this seems to do what is intended, at least with the test string:
      $str =~ s/((?<!bar)\S*)foo((?!\S*qux))/${1}123${2}/gx;
      gives
      bl123 and barthisfoothatqux and barsofooquxhim and123som 123
      sorry, this doesn't work...
      $str =~ s/((?<!bar)\S*)foo(\S*(?!qux))/${1}123${2}/gx; print "$str\n";

      prints

      bl123 and barthis123thatqux and barso123quxhim and123som 123
Re: Match a pattern only if it is not within another pattern
by tphyahoo (Vicar) on Aug 17, 2005 at 09:18 UTC
    Here's my aborted attempt at getting this to work with Parse::RecDescent. Currently it doesn't work, but maybe someone can supply the missing ingredients. I've got to go back to work ;)

    The RecDescent FAQ may be of help to anyone who wants to tinker with this.

    #patternInAnotherPattern.pl
    use strict;
    use warnings;
    use Test::More qw(no_plan);
    use Parse::RecDescent;

    my $str      = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';
    my $expected = 'bl123 and barthisfoothatqux and barsofooquxhim and123som 123';

    my $parse = Parse::RecDescent->new(q(
        document: chunk(s) /\Z/ { $return = join ('', @{$item[1]}) }
        chunk: /./ #just a placeholder, to get the grammar to return something, anything!
        filler_chunk: /(?!(foo|bar|qux).)*/ #inch ahead
        combi_chunk: boundary_chunk foo_chunk boundary_chunk
        boundary_chunk: /(bar|qux)((?!foo).)*/ #inch ahead
        foo_chunk: /foo/
        bar_chunk: /bar/
        qux_chunk: /qux/
    ));

    my $res = $parse->document($str);
    is($res, $expected);
Re: Match a pattern only if it is not within another pattern
by tphyahoo (Vicar) on Aug 17, 2005 at 08:29 UTC
    I believe the general case for this problem would be relatively trivial to do with lookahead and lookbehind, *if* variable-length negative lookbehind were supported, which neither Perl 5's regex engine nor PCRE currently supports. However, like so many other things, this is supposed to be fixed in Perl 6. It would be nice to have a Parse::RecDescent solution for this, since it is the closest thing Perl 5 has to Perl 6 rules.
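Later Perl versions offer a way around the lookbehind limitation without a parser. Perl 5.10 added backtracking control verbs, which postdate this thread. Below is a sketch using the `(*SKIP)(*FAIL)` idiom documented in perlre; the wrapper name `replace_foo_skip` is hypothetical. The pattern matches a protected `bar...qux` span, then forces that attempt to fail. The scan resumes after the span, so only a bare `foo` can ever succeed.

```perl
use strict;
use warnings;

# Requires Perl 5.10 or later (backtracking control verbs, see perlre).
# Matching bar.+?qux and then hitting (*SKIP)(*FAIL) makes the attempt fail
# and restarts the scan after the skipped span, so only a bare 'foo'
# outside a bar...qux region is ever replaced.
sub replace_foo_skip {
    my ($str) = @_;
    $str =~ s/bar.+?qux(*SKIP)(*FAIL)|foo/123/g;
    return $str;
}

print replace_foo_skip('blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo'), "\n";
# prints: bl123 and barthisfoothatqux and barsofooquxhim and123som 123
```

If a `bar` has no later `qux`, the first alternative fails before reaching `(*SKIP)`, and the `foo` alternative is tried as usual.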
Re: Match a pattern only if it is not within another pattern
by punkish (Priest) on Aug 17, 2005 at 18:35 UTC
    Thanks to all the talented monks who have contributed their wisdom on this problem. I learned a lot from this discussion. I want to explain where this problem came from:

    As far as I understand, I can use regexp to match certain patterns in text. I can also match text that is NOT a certain pattern.

    Now, it seems that lookahead and lookbehind assertions, negative and positive, give me more power over such matches; however, my toolkit is still very pre-natal in that department.

    I want to match certain patterns, but if and only if certain other conditions are met. Here is an example. Suppose I am writing a wiki formatting module.

    # I want to match all /italics text/ and replace
    # it with <i>italics text</i>
    # except, I don't want to match the text in
    # [http://somewhere.com/foo/bar.html|Somewhere Else]
    # in other words, I don't want to end up with
    # [http:<i></i><i>somewhere.com</i><i>foo</i>bar.html...

    The above is just a practical application of the general problem I was facing. So, to summarize: 'foo', 'bar' and 'qux' were general placeholders, as BrowserUk correctly guessed, and to some extent "surrounded by 'bar' and 'qux'" was interchangeable with "surrounded by 'qux' and 'bar'". The more general statement would be:

    How to match something if and only if a certain other condition is met (or is not met, which, actually, is no different from 'a condition is met'!).

    I need to learn a lot about conditional matching.
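BrowserUk's technique carries over directly to the wiki example above. This is a sketch, and the link syntax `[...]` and italics syntax `/.../` are assumptions read off that example, not a full wiki grammar. The name `italicize` is hypothetical. Bracketed links are matched first and copied back unchanged. Only `/text/` outside them becomes `<i>text</i>`.

```perl
use strict;
use warnings;

# Sketch: protect [ ... ] links (group 1) and italicize /text/ (group 2)
# everywhere else. Assumption: links never contain ']' and italic runs
# never contain '/' or a newline.
sub italicize {
    my ($text) = @_;
    $text =~ s{(\[[^\]]*\])|/([^/\n]+)/}{defined $2 ? "<i>$2</i>" : $1}ge;
    return $text;
}

print italicize('See [http://somewhere.com/foo/bar.html|Somewhere Else] for /more/ info.'), "\n";
# prints: See [http://somewhere.com/foo/bar.html|Somewhere Else] for <i>more</i> info.
```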


Node Type: perlquestion [id://484219]
Approved by ChrisR
Front-paged by pbeckingham