regexp help -- word boundaries

water has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: regexp help -- word boundaries by hv (Prior) on Jul 09, 2005 at 10:51 UTC
Sure, here's a start:
`$n =~ s/(^| )$a( |$)/$1$b$2/i;`
You can avoid the $1..$2 in the substitution if you can avoid capturing the spaces, which you can do by matching with lookahead/lookbehind:
`$n =~ s/(^|(?<= ))$a((?= )|$)/$b/i;`
Or invert the sense of the lookarounds: "not after a non-space" rather than "after a space, or after nothing":
`$n =~ s/(?<![^ ])$a(?![^ ])/$b/i;`
Note that the first version is adequate for most purposes, but may trip you up if you want to do a global substitution for all copies of `$a`:
`"FOO foo" =~ s/(^| )foo( |$)/${1}bar$2/ig`
... will catch the 'FOO' but not the 'foo', because having used the space as part of the first match the regexp engine will start trying to perform a second match starting at the 'f', where neither `^` nor a space can match. The other two versions do not suffer from that problem, since the lookarounds don't affect where a future match will start.
Hugo
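A minimal sketch of the difference hv describes between consuming the spaces and only asserting them. The helper names `replace_capturing` and `replace_lookaround` are made up for illustration, `$from` and `$to` stand in for the thread's `$a` and `$b`, and `\Q...\E` is an addition (not in hv's post) so the search text is taken literally.

```perl
use strict;
use warnings;

# Hypothetical helper: hv's first version, which captures the separators.
sub replace_capturing {
    my ($text, $from, $to) = @_;
    # The trailing space is consumed by the match, so the next copy
    # no longer has a space (or ^) available in front of it.
    $text =~ s/(^| )\Q$from\E( |$)/$1$to$2/ig;
    return $text;
}

# Hypothetical helper: hv's third version, using negative lookarounds.
sub replace_lookaround {
    my ($text, $from, $to) = @_;
    # Zero-width assertions test the neighbours without consuming them.
    $text =~ s/(?<![^ ])\Q$from\E(?![^ ])/$to/ig;
    return $text;
}

print replace_capturing('FOO foo', 'foo', 'bar'), "\n";   # prints "bar foo"
print replace_lookaround('FOO foo', 'foo', 'bar'), "\n";  # prints "bar bar"
```

With a single occurrence per string the two helpers should agree; the difference only shows up when copies are separated by exactly one space.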
Re: regexp help -- word boundaries by TedPride (Priest) on Jul 09, 2005 at 12:09 UTC
I believe what you're trying to do is replace words without also replacing longer words that contain them. Or to put it another way, you want each word bracketed by word boundaries. Thankfully for you, there's an easy way to specify this: `\b`
`my $x = 'a'; my $y = 'b'; while (<DATA>) { chomp; s/\b$x\b/$y/g; print "$_\n"; } __DATA__ a a a a`
Incidentally, use of $a and $b is not recommended, since `sort` uses them as its comparison variables.
Re^2: regexp help -- word boundaries by japhy (Canon) on Jul 11, 2005 at 12:07 UTC
But you can't be sure that $x starts and ends with an alphanumeric character, in which case `\b$x\b` wouldn't match when $x is surrounded by spaces.
Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
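To make that objection concrete, here is a small sketch. The search text `$5` and the sample sentence are invented for illustration, and `\Q...\E` is added so the `$` is matched literally rather than treated as an anchor.

```perl
use strict;
use warnings;

# Hypothetical inputs: a search string that begins with a non-word character.
my $x    = '$5';
my $text = 'pay $5 now';

# \b needs a word character on one side and a non-word character on the other.
# Between the space and the "$" both sides are non-word, so \b fails there.
my $boundary_hit = ($text =~ /\b\Q$x\E\b/) ? 1 : 0;

# Whitespace lookarounds only ask whether a non-space neighbour exists.
my $space_hit = ($text =~ /(?<!\S)\Q$x\E(?!\S)/) ? 1 : 0;

print "\\b version matched: $boundary_hit\n";         # prints 0
print "lookaround version matched: $space_hit\n";    # prints 1
```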
Re: regexp help -- word boundaries by holli (Abbot) on Jul 09, 2005 at 10:51 UTC
maybe `$n =~ s/^( ?)$a( ?)$/$1$b$2/i;`
holli, /regexed monk/
Re^2: regexp help -- word boundaries by jhourcle (Prior) on Jul 09, 2005 at 14:11 UTC
You and sh1tn both made a fairly common mistake on this type of problem: of the possibilities, you made one item required and one item optional. So, instead of matching words in the middle of a longer string of words, it'll now only match single-word strings, optionally padded with spaces.
TedPride had the method that I'd typically use, but it'll actually match much more than what water originally asked for (it'll match if $a is next to punctuation, which may result in substituting out part of a hyphenated compound word).
To match exactly what was asked for, I'd have used the same logic as hv's first response (I've never been comfortable with look(ahead|behind) assertions). If I needed to do multiple replacements, I'd get around the problem he mentions with that method by using \G:
`$n =~ s/(^|\G| )$a($| )/$1$b$2/ig;`
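A rough sketch of both points above: the anchored pattern only fires when the whole string is the word, and the `\G` variant is meant to let a second match start where the first one ended. The helper names are hypothetical and `\Q...\E` is an addition. One caveat worth checking on your own Perl: the perlop documentation cautions that `\G` is only fully supported at the very start of a pattern, so the second helper is printed here without a claimed result.

```perl
use strict;
use warnings;

# Hypothetical helper: holli's pattern. ^ and $ are mandatory, only the
# padding spaces are optional, so the word must be the entire string.
sub replace_anchored {
    my ($text, $from, $to) = @_;
    $text =~ s/^( ?)\Q$from\E( ?)$/$1$to$2/i;
    return $text;
}

# Hypothetical helper: jhourcle's pattern. \G is offered as a third way
# to "be at a separator": the spot where the previous match finished.
sub replace_with_g_anchor {
    my ($text, $from, $to) = @_;
    $text =~ s/(^|\G| )\Q$from\E($| )/$1$to$2/ig;
    return $text;
}

print replace_anchored('foo', 'foo', 'bar'), "\n";        # prints "bar"
print replace_anchored('a foo b', 'foo', 'bar'), "\n";    # prints "a foo b" (no match)
print replace_with_g_anchor('FOO foo', 'foo', 'bar'), "\n";
```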
Re: regexp help -- word boundaries by northwind (Hermit) on Jul 09, 2005 at 11:02 UTC
Not the prettiest, but this should work:
`$n =~ s/(?:^|([ ]))$a(?:([ ])|$)/($1 ? $1 : "") . $b . ($2 ? $2 : "")/ie;`
Warning: Code is untested; note the use of "should" vs. "will".
Re: regexp help -- word boundaries by sh1tn (Priest) on Jul 09, 2005 at 11:56 UTC
`$n =~ s/^(\s)$a(\s)$/$1$b$2/i;`
Re: regexp help -- word boundaries by japhy (Canon) on Jul 11, 2005 at 12:11 UTC
You want $a to be preceded by either a space or nothing at all (beginning of line), and followed by either a space or nothing at all (end of line). Preceded by a space or nothing is the same as not preceded by something that's not a space. Translated into regex, that's `(?<!\S)`. Similarly, followed by a space or nothing is the same as not followed by something that's not a space. That is `(?!\S)`. Thus:
`$n =~ s/(?<!\S)$a(?!\S)/$b/ig`
will work in the cases you have requested. You might be interested in using `\b` instead (for word boundaries), as it will work in cases where $a is surrounded by, for instance, punctuation, but it will not work in all cases if $a begins or ends with non-alphanumberscore characters.
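As a quick check of the whitespace-lookaround pattern, the sketch below wraps it in a hypothetical helper, `replace_token`, and runs it over a few invented inputs. `$from` and `$to` play the roles of `$a` and `$b`, and `\Q...\E` is an addition so the search text is matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper around the (?<!\S) ... (?!\S) pattern.
sub replace_token {
    my ($text, $from, $to) = @_;
    # Match only when no non-whitespace character touches either side.
    $text =~ s/(?<!\S)\Q$from\E(?!\S)/$to/ig;
    return $text;
}

# Invented sample inputs: alone, repeated, mid-string, embedded, punctuated.
for my $case ('foo', 'foo foo foo', 'a Foo b', 'foobar', 'foo, bar') {
    printf "%-12s => %s\n", $case, replace_token($case, 'foo', 'baz');
}
# prints:
# foo          => baz
# foo foo foo  => baz baz baz
# a Foo b      => a baz b
# foobar       => foobar
# foo, bar     => foo, bar
```

The last case is where `\b` would behave differently: the comma counts as a non-space neighbour here, so the word is left alone.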

