PerlMonks  

regexp help -- word boundaries

by water (Deacon)
on Jul 09, 2005 at 10:29 UTC

water has asked for the wisdom of the Perl Monks concerning the following question:

Looking to improve my regexp skills -- could this set of 4 regexps be combined into one somehow?
$n =~ s/^$a$/$b/i;
$n =~ s/ $a / $b /i;
$n =~ s/ $a$/ $b/i;
$n =~ s/^$a /$b /i;
Thanks for any advice!

Replies are listed 'Best First'.
Re: regexp help -- word boundaries
by hv (Prior) on Jul 09, 2005 at 10:51 UTC

    Sure, here's a start:

    $n =~ s/(^| )$a( |$)/$1$b$2/i;

    You can avoid the $1..$2 in the substitution if you can avoid capturing the spaces, which you can do by matching with lookahead/lookbehind:

    $n =~ s/(^|(?<= ))$a((?= )|$)/$b/i;

    Or invert the sense of the lookarounds: "not after a non-space" rather than "after a space, or after nothing":

    $n =~ s/(?<![^ ])$a(?![^ ])/$b/i;

    Note that the first version is adequate for most purposes, but may trip you up if you want to do a global substitution for all copies of $a:

    (my $s = "FOO foo") =~ s/(^| )foo( |$)/${1}bar$2/ig
    .. will catch the 'FOO' but not the 'foo', because having used the space as part of the first match the regexp engine will start trying to perform a second match starting at the 'f'.

    The other two versions do not suffer from that problem, since the lookarounds don't affect where a future match will start.
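
    A minimal sketch of that difference, using the literal word foo and replacement bar from the example above. The variable names $captured and $lookaround are only illustrative and are not part of the original post:

```perl
use strict;
use warnings;

# Capturing version: each match consumes the spaces around the word,
# so the space after "FOO" is no longer available to the second "foo".
my $captured = "FOO foo";
$captured =~ s/(^| )foo( |$)/${1}bar$2/ig;

# Lookaround version: the neighbouring characters are tested but not consumed.
my $lookaround = "FOO foo";
$lookaround =~ s/(?<![^ ])foo(?![^ ])/bar/ig;

print "$captured\n";    # bar foo
print "$lookaround\n";  # bar bar
```

    With /g, only the lookaround form replaces both words.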

    Hugo

Re: regexp help -- word boundaries
by TedPride (Priest) on Jul 09, 2005 at 12:09 UTC
    I believe what you're trying to do is replace words without also replacing words which contain the original words. Or to put it another way, you want each word bracketed by word boundaries.

    Thankfully for you, there's an easy way to specify this: \b

    my $x = 'a';
    my $y = 'b';
    while (<DATA>) {
        chomp;
        s/\b$x\b/$y/g;
        print "$_\n";
    }
    __DATA__
    a a a a
    Incidentally, use of $a and $b is not recommended, since these are used in sorts.
      But you can't be sure that $x starts and ends with an alphanumeric character, in which case \b$x\b wouldn't match when $x is surrounded by spaces.
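
      A small sketch of that caveat, assuming a hypothetical search term 'c++' (not from the original question) that ends in non-word characters. The \Q...\E quoting is added here only so the plus signs are matched literally:

```perl
use strict;
use warnings;

my $x = 'c++';    # hypothetical term ending in non-word characters
my $y = 'perl';

# \b needs a word character on exactly one side. Between '+' and ' '
# there is none, so the trailing \b fails and nothing is replaced.
my $with_b = "I like c++ a lot";
$with_b =~ s/\b\Q$x\E\b/$y/g;

# Testing for "no non-space neighbour" does not depend on word characters.
my $with_space = "I like c++ a lot";
$with_space =~ s/(?<!\S)\Q$x\E(?!\S)/$y/g;

print "$with_b\n";      # I like c++ a lot
print "$with_space\n";  # I like perl a lot
```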

      Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
      How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: regexp help -- word boundaries
by holli (Abbot) on Jul 09, 2005 at 10:51 UTC
    maybe
    $n =~ s/^( ?)$a( ?)$/$1$b$2/i;


    holli, /regexed monk/

      You and sh1tn both made a fairly common mistake on this type of problem -- of the possibilities, you made one item required, and one item optional. So, instead of matching words in the middle of a longer string of words, it'll now only match single word strings, optionally padded with spaces.

      TedPride had the method that I'd typically use, but it'll actually match much more than what water originally asked for. (it'll match if $a is near punctuation, which may result in substituting out part of a hyphenated compound word)
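
      As a rough illustration of that difference, here is a sketch with hypothetical stand-ins $from and $to for $a and $b, and a made-up input string:

```perl
use strict;
use warnings;

my ($from, $to) = ('foo', 'bar');    # hypothetical stand-ins for $a and $b
my $text = "foo-bar (foo) foo";      # made-up input

# \b treats '-', '(' and ')' as word boundaries, so every "foo" is replaced.
(my $boundary = $text) =~ s/\b$from\b/$to/gi;

# The space-or-anchor form only replaces the final, space-delimited "foo".
(my $spaces = $text) =~ s/(^| )$from( |$)/$1$to$2/gi;

print "$boundary\n";  # bar-bar (bar) bar
print "$spaces\n";    # foo-bar (foo) bar
```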

      To match exactly what was asked for, I'd have used the same logic as hv's first response (I've never been comfortable with look(ahead|behind) assertions). If I needed to do multiple replacements, I'd get around the problem he mentioned by using \G :

      $n =~ s/(^|\G| )$a($| )/$1$b$2/ig;
Re: regexp help -- word boundaries
by northwind (Hermit) on Jul 09, 2005 at 11:02 UTC

    Not the prettiest, but this should work:

    $n =~ s/(?:^|([ ]))$a(?:([ ])|$)/($1 ? $1 : "") . $b . ($2 ? $2 : "")/ie;
    Warning: Code is untested; note the use of "should" vs. "will".

Re: regexp help -- word boundaries
by sh1tn (Priest) on Jul 09, 2005 at 11:56 UTC
Re: regexp help -- word boundaries
by japhy (Canon) on Jul 11, 2005 at 12:11 UTC
    You want $a to be preceded by either a space or nothing at all (beginning of line), and followed by either a space or nothing at all (end of line). Preceded by a space or nothing is the same as not preceded by something that's not a space. Translated into regex, that's (?<!\S). Similarly, followed by a space or nothing is the same as not followed by something that's not a space. That is (?!\S). Thus: $n =~ s/(?<!\S)$a(?!\S)/$b/ig will work in the cases you have requested. You might be interested in using \b instead (for word boundaries), as it will work in cases where $a is surrounded by, for instance, punctuation, but it will not work in all cases if $a begins or ends with non-alphanumberscore characters.
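
    A minimal sketch that checks this pattern against the four cases from the original question. The helper name replace_word and the sample words cat and dog are assumptions made for illustration, not part of the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: replace $from with $to wherever $from is delimited
# by whitespace or by the start/end of the string.
sub replace_word {
    my ($text, $from, $to) = @_;
    $text =~ s/(?<!\S)$from(?!\S)/$to/ig;
    return $text;
}

# The first four inputs mirror the four original regexps; the last one
# contains only occurrences that should be left alone.
for my $input ('cat', 'a cat b', 'a cat', 'cat b', 'concat cat, cats') {
    my $output = replace_word($input, 'cat', 'dog');
    print "'$input' -> '$output'\n";
}
```

    The last input is left unchanged, because each "cat" in it touches a letter or a comma.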

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
