http://www.perlmonks.org?node_id=812583



That is sometimes useful, but it isn't needed here: a lookahead is enough.

Run this:

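use strict;
use warnings;

my $sentence = 'kinase inhibitor SET6 activates p16(INK4A) in cell-wall.';
my @phrases  = ('kinase i', 'inhibitor', 'tor SET6', 'SET6', 'p16(INK4A)', 'cell');

# quotemeta escapes regex metacharacters such as the parentheses in p16(INK4A)
my $phrases_re = join '|', map { quotemeta } @phrases;

# A phrase must start at the beginning or after a space and be followed by a
# space or the end; the lookahead checks that space without consuming it.
$sentence =~ s/(^| )($phrases_re)(?= |$)/$1#$2#/g;
print $sentence, "\n";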

You get the output

kinase #inhibitor# #SET6# activates #p16(INK4A)# in cell-wall.
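The `map { quotemeta } @phrases` step matters because `p16(INK4A)` contains parentheses. Here is a small check (the variable names are only for illustration) of what the pattern means with and without escaping:

```perl
use strict;
use warnings;

my $raw    = 'p16(INK4A)';
my $quoted = quotemeta $raw;    # backslashes every non-word character

# Unescaped, the parentheses form a capture group, so the pattern means "p16INK4A".
my $raw_hits_text    = 'p16(INK4A)' =~ /^$raw$/    ? 1 : 0;
my $raw_hits_plain   = 'p16INK4A'   =~ /^$raw$/    ? 1 : 0;
my $quoted_hits_text = 'p16(INK4A)' =~ /^$quoted$/ ? 1 : 0;

print "quoted pattern: $quoted\n";
print "raw: $raw_hits_text/$raw_hits_plain  quoted: $quoted_hits_text\n";
```

Spaces get escaped as well (`kinase\ i`), which is harmless: without the `/x` modifier, `\ ` simply matches a space.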

Update: Just as a curiosity, there are also ways to do this without lookaheads or lookbehinds. Replace the substitution statement above with either

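# Run it twice: a match consumes the following space, so a phrase directly
# after a tagged one is only caught on the second pass.
$sentence =~ s/(^| )($phrases_re)( |$)/$1#$2#$3/g for 0, 1;

or

use 5.010;
given ($sentence) {    # given is experimental in later perls; for ($sentence) { ... } also aliases $_
    s/ /  /g;                                # double every space
    s/(^| )($phrases_re)( |$)/$1#$2#$3/g;    # so consuming one per match is harmless
    s/  / /g;                                # then collapse them back
}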
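To see why the consuming `( |$)` version needs a second pass or doubled spaces, here is a sketch (the sub name `tag_once` is made up for it) that runs a single pass. The space after `inhibitor` is eaten by the first match, so `(^| )` has nothing left to anchor `SET6` on; the lookahead version avoids this because `(?= |$)` only checks for the space.

```perl
use strict;
use warnings;

my @phrases    = ('kinase i', 'inhibitor', 'tor SET6', 'SET6', 'p16(INK4A)', 'cell');
my $phrases_re = join '|', map { quotemeta } @phrases;
my $sentence   = 'kinase inhibitor SET6 activates p16(INK4A) in cell-wall.';

# One pass with a consuming ( |$): the space after a tagged phrase is used up,
# so the next phrase has no (^| ) left to anchor on.
sub tag_once {
    my ($s) = @_;
    $s =~ s/(^| )($phrases_re)( |$)/$1#$2#$3/g;
    return $s;
}

my $once  = tag_once($sentence);    # SET6 is skipped
my $twice = tag_once($once);        # the second pass catches it

# Doubling the spaces gives every boundary a spare space, so one pass is enough.
(my $doubled = $sentence) =~ s/ /  /g;
$doubled =~ s/(^| )($phrases_re)( |$)/$1#$2#$3/g;
$doubled =~ s/  / /g;

print "$once\n$twice\n$doubled\n";
```

One side effect of doubling: a phrase that itself contains a space, such as `'tor SET6'`, can no longer match, because its single space now faces two in the text. For these sample phrases that makes no difference.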

Update: One more alternative is below.

my %phrase;
$phrase{$_}++ for @phrases;
my @sentence = split /( +)/, $sentence;    # the capturing group keeps the spaces
for (@sentence) { $phrase{$_} and $_ = "#" . $_ . "#"; }
$sentence = join "", @sentence;
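Because `split /( +)/` hands the hash one space-delimited token at a time, this variant can only tag single-word phrases; a multi-word entry never equals a token. The sketch below uses a hypothetical phrase list containing `'kinase inhibitor'` (not one of the original phrases) to compare it with the lookahead regex:

```perl
use strict;
use warnings;

my $sentence = 'kinase inhibitor SET6 activates p16(INK4A) in cell-wall.';
my @phrases  = ('kinase inhibitor', 'SET6');    # hypothetical list with a multi-word phrase

# Token-lookup variant: split keeps the separators because of the capturing group.
my %phrase;
$phrase{$_}++ for @phrases;
my @tokens = split /( +)/, $sentence;
for (@tokens) { $_ = "#$_#" if $phrase{$_} }
my $by_token = join '', @tokens;

# Regex variant with a lookahead, for comparison.
my $phrases_re = join '|', map { quotemeta } @phrases;
(my $by_regex = $sentence) =~ s/(^| )($phrases_re)(?= |$)/$1#$2#/g;

print "$by_token\n$by_regex\n";
```

The token version leaves `kinase inhibitor` untagged, while the regex version tags it, since the regex scans the whole sentence rather than individual words.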

Update: Oh, let's not forget this one either.

# (?<![^ ]) means "not preceded by a non-space", i.e. at the start or after a space
$sentence =~ s/(?<![^ ])($phrases_re)(?= |$)/#$1#/g;
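If you want to check that all the variants agree, here is a small harness (the variant names are made up for this sketch) that wraps each substitution in a sub and runs it on the sample sentence. The `given` block is written as a `for ($s)` topicalizer here to avoid the experimental warning on newer perls.

```perl
use strict;
use warnings;

my @phrases    = ('kinase i', 'inhibitor', 'tor SET6', 'SET6', 'p16(INK4A)', 'cell');
my $phrases_re = join '|', map { quotemeta } @phrases;
my %phrase;
$phrase{$_} = 1 for @phrases;

# Each sub takes a sentence and returns a tagged copy.
my %variant = (
    lookahead => sub {
        my ($s) = @_;
        $s =~ s/(^| )($phrases_re)(?= |$)/$1#$2#/g;
        return $s;
    },
    two_passes => sub {
        my ($s) = @_;
        $s =~ s/(^| )($phrases_re)( |$)/$1#$2#$3/g for 0, 1;
        return $s;
    },
    doubled_spaces => sub {
        my ($s) = @_;
        for ($s) {    # aliases $_ to $s, like given ($s)
            s/ /  /g;
            s/(^| )($phrases_re)( |$)/$1#$2#$3/g;
            s/  / /g;
        }
        return $s;
    },
    split_lookup => sub {
        my ($s) = @_;
        my @tokens = split /( +)/, $s;
        for (@tokens) { $_ = "#$_#" if $phrase{$_} }
        return join '', @tokens;
    },
    lookbehind => sub {
        my ($s) = @_;
        $s =~ s/(?<![^ ])($phrases_re)(?= |$)/#$1#/g;
        return $s;
    },
);

my $sentence = 'kinase inhibitor SET6 activates p16(INK4A) in cell-wall.';
my %result;
$result{$_} = $variant{$_}->($sentence) for keys %variant;
printf "%-15s %s\n", $_, $result{$_} for sort keys %result;
```

All five print the same line for this sentence; they differ only in edge cases such as multi-word phrases, as the split and doubled-space variants show.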