Re: Match a pattern only if it is not within another pattern
by BrowserUk (Patriarch) on Aug 16, 2005 at 19:37 UTC
$str = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';
$str =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]ge;
print $str;
bl123 and barthisfoothatqux and barsofooquxhim and123som 123
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Why, I didn't even think of using evaluation. Deconstructing:
(bar.+?qux)|(foo)        # capture anything with 'bar' and 'qux' as
                         # bookends in $1 OR
                         # any other 'foo' in $2
defined $2 ? '123' : $1  # if $2 is defined, replace it with 123;
                         # otherwise put $1 back into
                         # the string
]ge                      # evaluate the replacement as code, globally
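For anyone who wants to run the deconstruction, here is a minimal sketch of the same substitution written with /x so each alternative carries its own comment. The variable name $out is only an illustration; the pattern and test string are the ones from the post above.

```perl
use strict;
use warnings;

my $str = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';

# Copy the string, then substitute on the copy.
(my $out = $str) =~ s{
    (bar.+?qux)   # group 1: a whole bar...qux span, matched so it can be put back
  |
    (foo)         # group 2: a foo that is not inside such a span
}{
    defined $2 ? '123' : $1
}gex;

print "$out\n";  # bl123 and barthisfoothatqux and barsofooquxhim and123som 123
```

The point of the trick is that the regex engine consumes each bar...qux span as one match, so a 'foo' inside it is never seen by the second alternative.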
Thanks. I can see this was out of my league without your help.
Update: BrowserUK, how on earth do you even begin to think this twisted? I can't fathom how to "practice" regexp matching other than answering questions from novices such as myself. I have been scanning Friedl's book, but I guess nothing substitutes for practice at ever increasing levels of complexity, much like a video game. Well, thanks for getting me over this particular hump for now.
--
when small people start casting long shadows, it is time to go to bed
how on earth do you even begin to think this twisted?
Trying to answer other people's questions is a very powerful technique for learning a subject more deeply yourself. In our normal lives, work (or play) tends to present us with a relatively static selection of problems to solve, and internal ("nope, too ugly") and external ("the in-house style guide") forces constrain our approaches to solving them. Dealing with someone else's problem, expressed in their own words and subject to their own constraints, can shake us from the shackles of habit upon our thoughts.
Another way to leap out of that rut is to create artificial constraints of our own. The disciplines of writing obfuscations or playing perl golf are examples of such constraints, but they are easy to create - yes, I know I could do that with a regexp in a loop, but can I do it with just a regexp and no loop? Or in one regexp instead of two? Ok, now I've done that - ugly though it is - can I think of input text that would break it? Learning stuff from books has its place, but I have always felt that something you've discovered for yourself is worth twice as much. So experiment.
I believe there is a very close relationship between the study of pattern (which is what regular expressions are all about) and the study of mathematics. A common mantra in mathematics is: so, you have this thing to prove, and you don't know how to prove it; so first, try proving something more specific - often that is easier, and maybe it'll give you a clue how to tackle the larger task. If that doesn't work (or even if it does), try proving something more general - paradoxically, sometimes that too turns out to be easier. I think BrowserUK's solution of matching more than you asked for is conceptually quite close to "proving something more general".
Hugo
Should the .+ be a \w+ so as to not jump words?
$str = 'bart is a fool qux';
will not replace 'foo'.
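A quick sketch of the difference, assuming the only change is swapping .+? for \w+? (the variable names are made up for the comparison):

```perl
use strict;
use warnings;

my $input = 'bart is a fool qux';

# Original pattern: .+? may cross spaces, so the whole string counts as
# one bar...qux span and the 'foo' in 'fool' is left alone.
(my $dot = $input) =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]ge;

# Suggested variant: \w+? cannot cross a space, so a protected span
# has to sit inside a single word.
(my $word = $input) =~ s[(bar\w+?qux)|(foo)][defined $2 ? '123' : $1]ge;

print "$dot\n";   # bart is a fool qux
print "$word\n";  # bart is a 123l qux
```

Whether crossing word boundaries is wanted depends on what 'bar' and 'qux' stand for in the real data, so treat \w+? as one possible reading of the requirement.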
Ivan Heffner
Sr. Software Engineer, DAS Lead
WhitePages.com, Inc.
Nice, but it only works with bar then foo then qux, not qux then foo then bar. (Following passes first test, fails second test.)
use strict;
use warnings;
use Test::More qw(no_plan);
my $str = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';
my $expected = 'bl123 and barthisfoothatqux and barsofooquxhim and123som 123';
$str =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]xge;
is($str,$expected);
#switch qux and bar
$str = 'blfoo and quxthisfoothatbar and barsofooquxhim andfoosom foo';
$expected = 'bl123 and quxthisfoothatbar and barsofooquxhim and123som 123';
$str =~ s[(bar.+?qux)|(foo)][defined $2 ? '123' : $1]xge;
is($str,$expected);
I'm trying to solve the more "general" problem with Parse::RecDescent, further on in the thread. I gave up before finding a solution though.
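Short of a full parser, one possible sketch is to list both orders in the first alternative. This is an assumption about what "either order" should mean. It does not handle nesting or a span such as "bar foo bar", and the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = 'blfoo and quxthisfoothatbar and barsofooquxhim andfoosom foo';

(my $fixed = $text) =~ s{
    (bar.+?qux | qux.+?bar)   # group 1: a protected span in either order
  |
    (foo)                     # group 2: a foo outside any protected span
}{ defined $2 ? '123' : $1 }gex;

print "$fixed\n";  # bl123 and quxthisfoothatbar and barsofooquxhim and123som 123
```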
If you want to learn to solve the general problem, the book "Mastering Regular Expressions" is highly recommended. If you want a solution to the general problem, Regexp::Common::balanced does it already.
# note, this matches "qux foo bar" and "bar foo qux", but not "bar foo bar"
# see Regexp::Common::balanced documentation for details
qr/$RE{balanced}{-begin => "qux|bar"}{-end => "bar|qux"}/
-xdg
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
Re: Match a pattern only if it is not within another pattern
by Codon (Friar) on Aug 16, 2005 at 19:42 UTC
Re: Match a pattern only if it is not within another pattern
by davidrw (Prior) on Aug 16, 2005 at 19:43 UTC
While my attempts at a look-ahead/look-behind combination failed on the entire string, I was able to accomplish it by splitting the string first:
# $str =~ s/(?<!bar)(.*?)foo(?!.*?qux)/$1.'123'/eg; # one of several failed attempts
$str =
  join '', # glue back together
  map { s/(?<!bar)(.*?)foo(?!.*?qux)/$1 .'123'/eg; $_ } # replace on the non-bar-foo-qux elements.
  split /(bar.*?foo.*?qux)/, $str; # get elements that are either the bar-foo-qux form or not.
Update: BrowserUK's approach is very much nicer than this one..
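For comparison, a pared-down sketch of the same split idea that drops the lookarounds altogether. It relies on split returning captured separators in the odd-numbered positions; the names @parts and $joined are illustrative only.

```perl
use strict;
use warnings;

my $line = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';

# The capturing group makes split return the bar...qux spans as well,
# so the fields alternate: outside text, protected span, outside text, ...
my @parts = split /(bar.+?qux)/, $line;

for my $i (0 .. $#parts) {
    next if $i % 2;             # odd indexes hold the protected spans
    $parts[$i] =~ s/foo/123/g;  # plain replacement in the outside text
}

my $joined = join '', @parts;
print "$joined\n";  # bl123 and barthisfoothatqux and barsofooquxhim and123som 123
```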
Re: Match a pattern only if it is not within another pattern
by pbeckingham (Parson) on Aug 16, 2005 at 19:48 UTC
$str =~ s/((?<!bar)\S*)foo(\S*(?!qux))/${1}123${2}/gx;
Update: Yup - it's broke. Nothing to see here, move along.
pbeckingham - typist, perishable vertebrate.
bl123 and barthis123thatqux and barso123quxhim and123som 123
Update:
However this seems to do what is intended, at least with the test string:
$str =~ s/((?<!bar)\S*)foo((?!\S*qux))/${1}123${2}/gx;
gives
bl123 and barthisfoothatqux and barsofooquxhim and123som 123
sorry, this doesn't work...
$str =~ s/((?<!bar)\S*)foo(\S*(?!qux))/${1}123${2}/gx;
print "$str\n";
prints
bl123 and barthis123thatqux and barso123quxhim and123som 123
--
when small people start casting long shadows, it is time to go to bed
Re: Match a pattern only if it is not within another pattern
by tphyahoo (Vicar) on Aug 17, 2005 at 09:18 UTC
#patternInAnotherPattern.pl
use strict;
use warnings;
use Test::More qw(no_plan);
use Parse::RecDescent;
my $str = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';
my $expected = 'bl123 and barthisfoothatqux and barsofooquxhim and123som 123';
my $parse = Parse::RecDescent->new(q(
document: chunk(s) /\Z/ { $return = join ('', @{$item[1]}) }
chunk:
/./ #just a placeholder, to get the grammar to return something, anything!
filler_chunk:
/(?!(foo|bar|qux).)*/ #inch ahead
combi_chunk:
boundary_chunk foo_chunk boundary_chunk
boundary_chunk:
/(bar|qux)((?!foo).)*/ #inch ahead
foo_chunk:
/foo/
bar_chunk:
/bar/
qux_chunk:
/qux/
));
my $res = $parse->document($str);
is($res,$expected);
Re: Match a pattern only if it is not within another pattern
by tphyahoo (Vicar) on Aug 17, 2005 at 08:29 UTC
I believe the general case for this problem would be relatively trivial to do with lookahead and lookbehind, *if* variable-length negative lookbehind were supported, which Perl 5's regex engine currently does not do. However, like so many other things, this is supposed to be fixed in Perl 6. It would be nice to have a Parse::RecDescent solution for this, since this is the closest thing Perl 5 has to Perl 6 rules.
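As an aside that is not from this thread: Perl 5.10 and later provide backtracking control verbs, and the (*SKIP)(*FAIL) pair is one way to get a similar effect without any lookbehind. A minimal sketch, assuming Perl 5.10 or newer and the original test string:

```perl
use strict;
use warnings;
use 5.010;  # backtracking control verbs need Perl 5.10 or later

my $s = 'blfoo and barthisfoothatqux and barsofooquxhim andfoosom foo';

# First alternative: match a bar...qux span, then force a failure while
# telling the engine to resume after the span. Second alternative: any
# foo that is still reachable gets replaced.
(my $r = $s) =~ s/bar.+?qux(*SKIP)(*FAIL)|foo/123/g;

say $r;  # bl123 and barthisfoothatqux and barsofooquxhim and123som 123
```

This swaps in a different technique from the lookbehind discussed above, so it is an alternative to that idea and not an implementation of it.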
Re: Match a pattern only if it is not within another pattern
by punkish (Priest) on Aug 17, 2005 at 18:35 UTC
Thanks to all the talented monks who have contributed their wisdom on this problem. I learned a lot from this discussion. I want to explain where this problem stemmed from:
As far as I understand, I can use regexp to match certain patterns in text. I can also match text that is NOT a certain pattern.
Now, it seems that positive and negative lookahead and lookbehind give me more power over such matches; however, my toolkit is still very pre-natal in that department.
I want to match certain patterns, but if and only if certain other conditions are met. Here is an example. Suppose I am writing a wiki formatting module.
# I want to match all /italics text/ and replace
# it with <i>italics text</i>
# except, I don't want to match the text in
# [http://somewhere.com/foo/bar.html|Somewhere Else]
# in other words, I don't want to end up with
# [http:<i></i><i>somewhere.com</i><i>foo</i>bar.html...
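A rough sketch of how the alternation trick from earlier in the thread might carry over to this wiki case. The sample text and the rule that a link is anything between square brackets are assumptions made for illustration, not a complete wiki grammar.

```perl
use strict;
use warnings;

# Hypothetical sample input, built from the URL in the example above.
my $wiki = 'See /this text/ and [http://somewhere.com/foo/bar.html|Somewhere Else] for /more/.';

(my $html = $wiki) =~ s{
    ( \[ [^\]]* \] )    # group 1: a whole bracketed link, put back untouched
  |
    / ( [^/]+ ) /       # group 2: text between two slashes, outside any link
}{ defined $1 ? $1 : "<i>$2</i>" }gex;

print "$html\n";
# See <i>this text</i> and [http://somewhere.com/foo/bar.html|Somewhere Else] for <i>more</i>.
```

Because the bracketed link is consumed as a single match, the slashes inside the URL are never offered to the italics alternative.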
The above is just a practical application of the general problem that I was facing. So, to summarize, 'foo' 'bar' 'qux' were general placeholders, as BrowserUk correctly guessed, and to some extent, "surrounded by" 'bar' and 'qux' was interchangeable with surrounded by 'qux' and 'bar'. The more general statement would be --
How to match something if and only if a certain other condition is met (or is not met, which is really the same thing as a condition being met).
I need to learn a lot about conditional matching.
--
when small people start casting long shadows, it is time to go to bed