PerlMonks  

regexp matching around a word

by kidd (Curate)
on May 14, 2005 at 02:19 UTC (#456959)
kidd has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm seeking your wisdom once again.

I have this string:

    my $text = "This is a test for the word blue more things blue and then some very large text with a lot of things blue and done";

In this string there are several instances of the word "blue". What I'm trying to do is fetch the text around each "blue". So I have this regexp:

    my $WANT = "blue";
    my @results = $text =~ m/(?:\w+\s+){0,5}
                             (?:$WANT)
                             (?:\s+\w+){0,5}
                            /xgi;

Basically, it finds from 0 to 5 words before the word "blue" and the same number after it. This gives me the following results:

    a test for the word blue more things blue and then
    with a lot of things blue and done

But then I realized that in the first result the word "blue" appears again among the 5 words after the matched "blue".

So I thought that maybe I could change the regexp so that, when "blue" shows up again among the words after the match, it also grabs the 5 words after that second "blue".

To give an example, my desired result would be:

    a test for the word blue more things blue and then some very large
    with a lot of things blue and done

What I tried without success was this:

    my $WANT = "blue";
    my @results = $text =~ m/(?:\w+\s+){0,5}
                             (?:$WANT)
                             (?:\s+\w+){0,5}
                             (?: \s+(?:$WANT) (?:\s+\w+){0,5} )?
                            /xgi;
I hope you can understand my problem.

Thanks in advance for any help.

If you're a Spanish-speaking programmer, go to my site: Perl en Español

Re: regexp matching around a word
by Zaxo (Archbishop) on May 14, 2005 at 02:57 UTC

    Most of your regex-fu is fine. The last, unsuccessful, one fails because it still matches any five words before looking for $WANT again. You could try a conditional second match to find the target string in the tail of the first match.
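One way to read that suggestion, sketched as a single pattern rather than a literal second match: after the first target, allow a chain of "up to four other words, then the target again", and only then take the trailing context, excluding the target word from it. The `\b` anchors, the `\Q...\E` quoting, and the negative lookahead are assumptions added here; they are not in the original regexp.

```perl
use strict;
use warnings;

my $text = "This is a test for the word blue more things blue and then "
         . "some very large text with a lot of things blue and done";
my $WANT = "blue";

my @results = $text =~ m/
    (?:\w+\s+){0,5}                       # up to 5 words of leading context
    \b\Q$WANT\E\b                         # the first target word
    (?:                                   # chain: the target shows up again
        (?:\s+(?!\Q$WANT\E\b)\w+){0,4}    #   after 0 to 4 other words,
        \s+\Q$WANT\E\b                    #   so consume up to and including it
    )*
    (?:\s+(?!\Q$WANT\E\b)\w+){0,5}        # up to 5 trailing non-target words
/xgi;

print "$_\n" for @results;
```

With the sample text this should give the two lines asked for in the question. Because the trailing group refuses to swallow the target word, the chain gets a chance to pick it up instead.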

    Another approach would be to split the text and grep for indexes of $WANT. Then produce slices of eleven elements around the positions you find and join with space.

        my $text = "This is a test for the word blue more things blue and then some very large text with a lot of things blue and done";
        my @words = do { local $_ = $text; split; };
        my $WANT = qr/\bblue\b/i;
        my @wants = grep { $words[$_] =~ $WANT } 0 .. $#words;
        for (@wants) {
            my $first = ($_ - 5 < 0) ? 0 : $_ - 5;
            my $last = ($_ + 5 > $#words) ? $#words : $_ + 5;
            print join( ' ', @words[$first .. $last]), $/;
        }
        __END__
        a test for the word blue more things blue and then
        the word blue more things blue and then some very large
        with a lot of things blue and done
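The slice approach prints one window per hit, so the sample text yields three lines rather than the two the question asks for. A possible variation, sketched here under the assumption that a hit falling inside the previous window should extend that window instead of starting a new one (`$SPAN` and `@ranges` are names made up for this sketch):

```perl
use strict;
use warnings;

my $text = "This is a test for the word blue more things blue and then "
         . "some very large text with a lot of things blue and done";
my @words = split ' ', $text;
my $WANT  = qr/\bblue\b/i;
my $SPAN  = 5;    # words of context on each side

my @ranges;       # [first, last] index pairs, merged as we go
for my $i (grep { $words[$_] =~ $WANT } 0 .. $#words) {
    my $first = $i - $SPAN < 0       ? 0       : $i - $SPAN;
    my $last  = $i + $SPAN > $#words ? $#words : $i + $SPAN;
    if (@ranges && $i <= $ranges[-1][1]) {
        # This hit lies inside the previous window: stretch that window.
        $ranges[-1][1] = $last;
    }
    else {
        push @ranges, [$first, $last];
    }
}

my @results = map { join ' ', @words[ $_->[0] .. $_->[1] ] } @ranges;
print "$_\n" for @results;
```

Windows are merged only when the hit itself falls in the previous window. If just the leading context overlaps, the two windows stay separate, which seems to be what the question wants.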

    After Compline,
    Zaxo

Re: regexp matching around a word
by saintmike (Vicar) on May 14, 2005 at 03:02 UTC
    While you might find a solution by just sticking to regexes for the problem shown, it's typically less painful to handle the parse states in Perl code. Not the world's sexiest code, but it works:
        my $text = "This is a test for the word blue more things blue and then some very large text with a lot of things blue and done";
        my $words_to_go = 0;
        my @keeper = ();
        while($text =~ /(\w+)/g) {
            if($1 eq "blue") {
                if(! $words_to_go) {
                    print "$_ " for @keeper;
                }
                print "$1 ";
                $words_to_go = 5;
            } elsif($words_to_go) {
                $words_to_go--;
                print "$1 ";
            } else {
                push @keeper, $1;
                shift @keeper if @keeper > 5;
            }
        }
Re: regexp matching around a word
by dragonchild (Archbishop) on May 14, 2005 at 03:40 UTC
    Don't use a regex; use split().
    my @words = split /\s+$WANT\s+/, $text;

    Now you have the text on either side of each "blue". Use my @other_words = split ' ', $foo; on each piece (here $foo) to get at the individual words.

    A regex is overkill and very unmaintainable. Keep it simple!
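A minimal sketch of how the two splits might be combined, assuming five words of context on each side. `@chunks`, `@before`, and `@after` are names invented here, and `\Q...\E` plus `/i` are additions to the split pattern above:

```perl
use strict;
use warnings;

my $text = "This is a test for the word blue more things blue and then "
         . "some very large text with a lot of things blue and done";
my $WANT = "blue";

# Pieces of text lying between occurrences of the target word.
my @chunks = split /\s+\Q$WANT\E\s+/i, $text;

my @results;
for my $k (0 .. $#chunks - 1) {
    my @before = split ' ', $chunks[$k];
    my @after  = split ' ', $chunks[$k + 1];
    # Keep at most the last 5 words before and the first 5 words after.
    splice @before, 0, @before - 5 if @before > 5;
    splice @after, 5               if @after  > 5;
    push @results, join ' ', @before, $WANT, @after;
}

print "$_\n" for @results;
```

Each piece ends at the next "blue", so the context here never reaches across a neighbouring occurrence: the first line stops at "more things" rather than running on to "and then". The pattern also needs whitespace on both sides of the word, so an occurrence at the very start or end of the string would not be split on.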


    • In general, if you think something isn't in Perl, try it out, because it usually is. :-)
    • "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"
Re: regexp matching around a word
by sh1tn (Priest) on May 14, 2005 at 09:53 UTC
    push @results, [$1,$2] while $text =~ /((?:\w+\s+){0,5})blue((?:\s+\w+){0,5})/g;
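For reference, a small sketch of how that one-liner might be run and its pairs read back. The loop and the `$before`/`$after` names are added here for illustration:

```perl
use strict;
use warnings;

my $text = "This is a test for the word blue more things blue and then "
         . "some very large text with a lot of things blue and done";

my @results;
push @results, [$1, $2]
    while $text =~ /((?:\w+\s+){0,5})blue((?:\s+\w+){0,5})/g;

for my $r (@results) {
    my ($before, $after) = @$r;
    # The captures keep the whitespace next to the target word.
    print "before: '$before' after: '$after'\n";
}
```

On the sample text this collects two pairs, and the second "blue" ends up inside the first pair's trailing capture, much as with the regexp in the question. The pattern has no `\b` or `/i`, so it is case-sensitive and would also match "blue" inside a longer word.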

