PerlMonks  

Re^4: Problems searching and highlighting proximity words in a text

by jrc (Initiate)
on May 24, 2010 at 11:30 UTC (#841370)


in reply to Re^3: Problems searching and highlighting proximity words in a text
in thread Problems searching and highlighting proximity words in a text

Thanks for your solutions. They work in that example and in the others I have tried. The $4 does not seem to be necessary: at least in my case the match returns only three captures. Here is example code that works with your suggestions:

#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(locale_h);

my $old_locale = setlocale(LC_CTYPE);
setlocale(LC_CTYPE, 'ca_ES.iso885915@euro');
use locale;

my @expressions;
my @contents;

# Test texts: the two words in both orders, with a growing number of words between them.
my $content = qq{ Abbott test1 test2 salud };
push (@contents, $content);
$content = qq{ salud test1 test2 Abbott };
push (@contents, $content);
$content = qq{ Abbott test1 test2 test2 test2 test2 test2 test2 test2 test2 salud };
push (@contents, $content);
$content = qq{ salud test1 test2 test2 test2 test2 test2 test2 test2 test2 Abbott };
push (@contents, $content);
$content = qq{ salud test1 test2 test2 test2 test2 test2 test2 test2 test2 test2 Abbott };
push (@contents, $content);
$content = qq{ Abbott test1 test2 test2 test2 test2 test2 test2 test2 test2 test2 salud };
push (@contents, $content);
$content = qq{ salud test1 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 Abbott };
push (@contents, $content);
$content = qq{ salud test1 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 Abbott };
push (@contents, $content);
$content = qq{ salud test1 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 test2 Abbott };
push (@contents, $content);
$content = qq{ salud Abbott test1 test2 salud };
push (@contents, $content);

# Each word is a character-class pattern that also accepts accented vowels (ISO-8859-15).
my $par1 = '[a\xe0\xe1\xe4\xe2A\xc1\xc0\xc4\xc2]bb[o\xf2\xf3\xf6\xf4O\xd3\xd2\xd6\xd4]tt';
my $par2 = 's[a\xe0\xe1\xe4\xe2A\xc1\xc0\xc4\xc2]l[u\xf9\xfa\xfc\xfbU\xda\xd9\xdc\xdb]d';

# Expression format: "word1 word2::max_distance_in_words"
my $expression = "$par1 $par2\:\:20";
push (@expressions, $expression);

warn "PART 1";
foreach my $cont (@contents) {
    warn "CONTENT $cont";
    foreach my $exp (@expressions) {
        my $tag   = 'span';
        my $class = "lighligth";
        next if ($exp !~ /::/);
        my ($words, $distance) = split("::", $exp);
        my ($par1, $par2) = split(' ', $words);
        # warn "Pars $par1 - $par2 - $distance";

        # First order: $par1 ... $par2
        if ($cont =~ /$par1.*$par2/i) {
            if ($cont =~ /\b($par1)(\W+(?:\w+\W+){0,$distance})($par2)\b/i) {
            # if ($cont =~ m/\b($par1)(\W+(?:\w*\W*){1,$distance})?($par2)\b/i){
                # Only three captures are set; $4 is not needed.
                my ($first, $gap, $second) = ($1, $2, $3);
                warn "FIND 1 Par1: $first Par2: $gap Part3: $second";
                $cont =~ s/\Q$first$gap$second\E/<$tag class="$class">$first<\/$tag>$gap<$tag class="$class">$second<\/$tag>/gi;
            }
        }
        warn "STEP";

        # Second order: $par2 ... $par1
        if ($cont =~ /$par2.*$par1/i) {
            if ($cont =~ /\b($par2)(\W+(?:\w+\W+){0,$distance})($par1)\b/i) {
                my ($first, $gap, $second) = ($1, $2, $3);
                warn "FIND 2 Par1: $first Par2: $gap Part3: $second";
                $cont =~ s/\Q$first$gap$second\E/<$tag class="$class">$first<\/$tag>$gap<$tag class="$class">$second<\/$tag>/gi;
            }
        }
    }
    warn "END";
    warn "\n\n";
}
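The two passes above (one per word order) can also be folded into a single substitution, which has the side benefit of never re-scanning text that already contains inserted tags. What follows is only a sketch under assumptions: the sub name `highlight_near`, its parameter order, and the default class name `highlight` are made up for illustration and are not from the thread, and the two word patterns are assumed to contain no capture groups of their own.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): wrap both words in a tag when
# they occur within $distance words of each other, in either order.
# Assumes $word_a and $word_b contain no capturing parentheses.
sub highlight_near {
    my ($text, $word_a, $word_b, $distance, $tag, $class) = @_;
    $tag   = 'span'      unless defined $tag;
    $class = 'highlight' unless defined $class;    # assumed class name

    # Separator plus up to $distance intervening words (non-capturing).
    my $gap   = qr/\W+(?:\w+\W+){0,$distance}/;
    my $open  = qq{<$tag class="$class">};
    my $close = "</$tag>";

    # One alternation covers both orders; groups 1-3 or 4-6 are set.
    $text =~ s{\b(?:($word_a)($gap)($word_b)|($word_b)($gap)($word_a))\b}{
        my ($first, $between, $second) = defined $1 ? ($1, $2, $3) : ($4, $5, $6);
        "$open$first$close$between$open$second$close";
    }gie;

    return $text;
}

print highlight_near('Abbott test1 test2 salud', 'abbott', 'salud', 20), "\n";
print highlight_near('salud test1 test2 Abbott', 'abbott', 'salud', 20), "\n";
```

Because matches found with /g do not overlap, a text such as "salud Abbott test1 test2 salud" gets only one pair highlighted per stretch; whether that is acceptable depends on the use case.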


Re^5: Problems searching and highlighting proximity words in a text
by Krambambuli (Deacon) on May 24, 2010 at 12:05 UTC
    The $4 seems not to be necessary,

    Indeed, as long as you use

    (?:\w+\W+){0,$distance}

    instead of the expression I've used,

    (\w+\W+){0,$distance}

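    there will be no extra capture, so the second word lands in $3 rather than $4. I haven't done any benchmarking, but the non-capturing group (?:...) is probably a bit faster anyway, since Perl does not have to save the captured text.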
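A quick way to see the difference is to run both patterns against the same text in list context and count what comes back. This is a small sketch, not code from the thread: the sample text, the plain words `abbott` and `salud`, and the helper name `captures` are chosen only for illustration.

```perl
use strict;
use warnings;

# Illustrative sample text and distance (assumptions, not from the thread).
my $text     = 'Abbott test1 test2 salud';
my $distance = 20;

# Same pattern twice: the inner repeated group is capturing in the first
# version and non-capturing in the second.
my $capturing     = qr/\b(abbott)(\W+(\w+\W+){0,$distance})(salud)\b/i;
my $non_capturing = qr/\b(abbott)(\W+(?:\w+\W+){0,$distance})(salud)\b/i;

# In list context a match returns the contents of all capture groups.
sub captures {
    my ($string, $regex) = @_;
    my @groups = $string =~ $regex;
    return @groups;
}

for my $case (['capturing', $capturing], ['non-capturing', $non_capturing]) {
    my ($label, $regex) = @$case;
    my @groups = captures($text, $regex);
    printf "%-14s %d groups: %s\n", $label, scalar @groups,
        join ' ', map { "[$_]" } @groups;
}
```

With the capturing version the inner group takes slot $3 (it holds the last repeated word), which pushes the second search word to $4; with (?:...) the second word is in $3.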
