<?xml version="1.0" encoding="windows-1252"?>
<node id="841348" title="Re^3: Problems searching and highlighting proximity words in a text" created="2010-05-24 05:20:25" updated="2010-05-24 05:20:25">
<type id="11">
note</type>
<author id="600591">
Krambambuli</author>
<data>
<field name="doctext">
If you run your code with perl -Dr (this requires a perl built with -DDEBUGGING), you'll see what I now see too:&lt;br&gt;&lt;br&gt;

the regexp engine works and works and works...&lt;br&gt;&lt;br&gt;

However, I cannot yet see exactly what the solution is. At first sight, the regexp is simply extremely inefficient, because of all the backtracking, when it does _not_ find what it is looking for.&lt;br&gt;&lt;br&gt;

&lt;i&gt;Update.&lt;/i&gt;&lt;br&gt;&lt;br&gt; 

A work-around that avoids the heavy backtracking when the wanted terms do not occur in the wanted order might look like this: a cheap pre-check first, and the expensive regexp only if the pre-check succeeds.

&lt;code&gt;

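    # Cheap pre-check: only run the expensive regexp if both terms occur in this order.
    # /s lets .* cross newlines, in case $content spans several lines.
    if ($content =~ /$par2.*$par1/is) {
        if ($content =~ /\b($par2)(\W+(?:\w*\W*){1,$distance})?($par1)\b/i){
            warn "IF 2";
            # Copy the captures into fresh names instead of shadowing $par1 and $par2.
            my ($first, $between, $last) = ($1, defined $2 ? $2 : '', $3);
            $content =~ s/\Q$first$between$last\E/&lt;$tag$class&gt; $first&lt;\/$tag&gt;$between&lt;$tag$class&gt; $last&lt;\/$tag&gt;/gi;
        }
    }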

&lt;/code&gt;

That works for me, but I guess there are nicer solutions too.&lt;br&gt;&lt;br&gt; 

&lt;i&gt;Update 2.&lt;/i&gt; It looks like a regexp such as 

&lt;code&gt;
    if ($content =~ /\b($par1)(\W+(\w+\W+){0,$distance})($par2)\b/i) {
&lt;/code&gt;

works OK and also avoids the excessive backtracking on unsuccessful lookups. However, the new inner group (\w+\W+) now captures into $3, so the second term moves to $4 and you have to use $4 instead of $3.&lt;br&gt;&lt;br&gt;
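
Alternatively, making the inner group non-capturing with (?:...) should leave the capture numbering alone, so the second term stays in $3. Here is a minimal sketch of that idea. The sub name highlight_near and the bracket markers are made up for illustration, and the terms are quoted with \Q...\E on the assumption that they are plain words rather than regexps:&lt;br&gt;&lt;br&gt;

&lt;code&gt;
use strict;
use warnings;

# Hypothetical helper, not from the original thread: wraps both terms in
# markers when $term2 follows $term1 with at most $distance words between.
sub highlight_near {
    my ($content, $term1, $term2, $distance, $open, $close) = @_;

    # \Q...\E quotes the terms, so they are matched literally.
    # (?:...) keeps the repeated word group from capturing,
    # so the second term stays in $3.
    my $re = qr/\b(\Q$term1\E)(\W+(?:\w+\W+){0,$distance})(\Q$term2\E)\b/i;

    $content =~ s/$re/$open$1$close$2$open$3$close/g;
    return $content;
}

print highlight_near('The quick brown fox jumps', 'quick', 'fox', 2, '[', ']'), "\n";
# prints: The [quick] brown [fox] jumps
&lt;/code&gt;

If the terms are too far apart, the string comes back unchanged. The markers are passed in as arguments so that the real tag and class can be substituted for the brackets.&lt;br&gt;&lt;br&gt;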

</field>
<field name="root_node">
841106</field>
<field name="parent_node">
841341</field>
</data>
</node>
