http://www.perlmonks.org?node_id=812562


in reply to Re: phrase match
in thread phrase match

That has two problems:

1) Because you don't capture the space (or start/end of line) on either side of the phrase, the replacement throws it away, so the result will be missing some spaces:

kinase inhibitor#SET6#activates#p16(INK4A)#in cell-wall.
This can be fixed by using something like:
$sentence =~ s/(^| )($phrases_re)( |$)/$1\#$2\#$3/g;
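To see the difference side by side, here is a small sketch. The sentence, the phrase list and the non-capturing pattern are all assumptions on my part, picked only because they reproduce the output quoted above; the real ones from the thread may differ.

```perl
use strict;
use warnings;

# Hypothetical inputs, chosen to reproduce the output quoted above.
my $sentence   = 'kinase inhibitor SET6 activates p16(INK4A) in cell-wall.';
my $phrases_re = join '|', map { quotemeta($_) } ('SET6', 'p16(INK4A)');

# Assumed shape of the problematic substitution: the delimiters are
# matched but not captured, so the replacement discards them.
(my $lossy = $sentence) =~ s/(?:^| )($phrases_re)(?: |$)/\#$1\#/g;

# Capturing the delimiters and putting them back keeps the spaces.
(my $fixed = $sentence) =~ s/(^| )($phrases_re)( |$)/$1\#$2\#$3/g;

print "$lossy\n";   # kinase inhibitor#SET6#activates#p16(INK4A)#in cell-wall.
print "$fixed\n";   # kinase inhibitor #SET6# activates #p16(INK4A)# in cell-wall.
```

The quotemeta call is there because p16(INK4A) contains parentheses, which would otherwise be read as a regex group.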

2) Because the spaces are part of the match, it can't match phrases that are consecutive in the source string. For example, if you add 'activates' to the list of phrases, it won't be found, because the space preceding it has already been consumed by the match for SET6. Solving this probably involves some simple lookahead/lookbehind assertions that check for the spaces without actually consuming them, but I've never been good at those, so I don't have the actual regex for it.
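For what it's worth, a lookaround version could look something like the sketch below. The phrase list and sentence are hypothetical, and I've swapped the literal space for \S-based assertions, so any whitespace (not only a space) counts as a boundary; adjust if that is not what you want.

```perl
use strict;
use warnings;

# Hypothetical phrase list and sentence, for illustration only.
my @phrases  = ('SET6', 'activates', 'p16(INK4A)');
my $sentence = 'kinase inhibitor SET6 activates p16(INK4A) in cell-wall.';

# Longest phrases first, so a longer phrase wins over a shorter one that
# is its prefix; quotemeta escapes metacharacters such as the parentheses.
my $phrases_re = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @phrases;

# (?<!\S) succeeds at the start of the string or right after whitespace;
# (?!\S) succeeds at the end of the string or right before whitespace.
# Neither assertion consumes anything, so the space between two adjacent
# phrases is still available to the next match.
$sentence =~ s/(?<!\S)($phrases_re)(?!\S)/\#$1\#/g;

print "$sentence\n";
# kinase inhibitor #SET6# #activates# #p16(INK4A)# in cell-wall.
```

Since the spaces are never part of the match, there is nothing to capture and put back, which takes care of problem 1 as well.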