PerlMonks  

regexp matching around a word

by kidd (Curate) on May 14, 2005 at 02:19 UTC (#456959=perlquestion)
kidd has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm seeking your wisdom once again.

I have this string:

my $text = "This is a test for the word blue more things blue and then some very large text with a lot of things blue and done";
In this string there are several instances of the word "blue". What I'm trying to do is fetch the text around each occurrence of "blue". So I have this regexp:
my $WANT = "blue";
my @results = $text =~ m/(?:\w+\s+){0,5} (?:$WANT) (?:\s+\w+){0,5} /xgi;
Basically, it matches from 0 to 5 words before the word "blue" and from 0 to 5 words after it. This gives me the following results:
a test for the word blue more things blue and then
with a lot of things blue and done
But then I realized that in the first result another "blue" appears among the 5 words after the matched "blue".

So I thought that maybe I could change the regexp so that, when another "blue" appears among the words after the matched "blue", it also fetches the 5 words after that second "blue".

To give an example, my desired result would be:

a test for the word blue more things blue and then some very large
with a lot of things blue and done
What I tried without success was this:
my $WANT = "blue";
my @results = $text =~ m/(?:\w+\s+){0,5} (?:$WANT) (?:\s+\w+){0,5} (?: \s+(?:$WANT) (?:\s+\w+){0,5} )? /xgi;
I hope you can understand my problem.

Thanks in advance for any help.

If you're a Spanish-speaking programmer, visit my site: Perl en Español

Replies are listed 'Best First'.
Re: regexp matching around a word
by dragonchild (Archbishop) on May 14, 2005 at 03:40 UTC
    Don't use a regex; use split().
    my @words = split /\s+$WANT\s+/, $text;

    Now you have the text on either side of each "blue". Use my @other_words = split ' ', $foo; (where $foo is one of those pieces) to get at the individual words.

    A regex is overkill and very unmaintainable. Keep it simple!


    • In general, if you think something isn't in Perl, try it out, because it usually is. :-)
    • "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"
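A minimal sketch of how the split() suggestion above might be fleshed out, offered as an illustration rather than as code from the reply. The names @chunks, @before and @after are hypothetical, and the \Q...\E quoting and /i flag are additions. Note the limits of this approach: the context on each side stops at the neighbouring "blue", and a target at the very start or end of the string (or two targets separated by a single space) is not split on.

```perl
use strict;
use warnings;

my $text = "This is a test for the word blue more things blue and then "
         . "some very large text with a lot of things blue and done";
my $WANT = "blue";

# Pieces of text lying between occurrences of the target word.
my @chunks = split /\s+\Q$WANT\E\s+/i, $text;

my @results;
for my $i (0 .. $#chunks - 1) {
    my @before = split ' ', $chunks[$i];        # words left of this target
    my @after  = split ' ', $chunks[$i + 1];    # words right of this target
    splice @before, 0, @before - 5 if @before > 5;   # keep only the last five
    splice @after, 5 if @after > 5;                  # keep only the first five
    # $WANT is re-inserted as typed, not as it was spelled in the text.
    push @results, join ' ', @before, $WANT, @after;
}

print "$_\n" for @results;
```

For the sample string this yields three contexts, the first of which ends at "more things" because the next "blue" cuts the chunk short.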
Re: regexp matching around a word
by Zaxo (Archbishop) on May 14, 2005 at 02:57 UTC

    Most of your regex-fu is fine. The last, unsuccessful, one fails because it still matches any five words before looking for $WANT again. You could try a conditional second match to find the target string in the tail of the first match.
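One possible way around that, staying with a single regex, is sketched below. It is an illustration under stated assumptions, not code from the thread: after the target word, the pattern first absorbs any further target that lies within the next five words, and only then takes the trailing context. The \b word boundaries and \Q...\E quoting are additions that the original pattern does not have.

```perl
use strict;
use warnings;

my $text = "This is a test for the word blue more things blue and then "
         . "some very large text with a lot of things blue and done";
my $WANT = "blue";

# No capturing groups, so list context returns each whole match.
my @results = $text =~ m/
    (?:\w+\s+){0,5}                      # up to five words of leading context
    \b\Q$WANT\E\b                        # the target word
    (?:                                  # extend while another target is near:
        (?:\s+(?!\Q$WANT\E\b)\w+){0,4}   #   up to four non-target words
        \s+\Q$WANT\E\b                   #   then the target again
    )*
    (?:\s+\w+){0,5}                      # up to five words of trailing context
/xgi;

print "$_\n" for @results;
```

Because the repeated group is greedy, it is tried before the trailing context, so the final five words should never contain the target.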

    Another approach would be to split the text and grep for indexes of $WANT. Then produce slices of eleven elements around the positions you find and join with space.

    my $text = "This is a test for the word blue more things blue and then some very large text with a lot of things blue and done";
    my @words = do { local $_ = $text; split; };
    my $WANT = qr/\bblue\b/i;
    my @wants = grep { $words[$_] =~ $WANT } 0 .. $#words;
    for (@wants) {
        my $first = ($_ - 5 < 0) ? 0 : $_ - 5;
        my $last = ($_ + 5 > $#words) ? $#words : $_ + 5;
        print join( ' ', @words[$first .. $last]), $/;
    }
    __END__
    a test for the word blue more things blue and then
    the word blue more things blue and then some very large
    with a lot of things blue and done

    After Compline,
    Zaxo
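The index-and-slice approach above prints one window per occurrence, so nearby occurrences produce overlapping lines. A minimal sketch of one way to merge them into the output the question asks for follows. It is an assumption-laden variation, not Zaxo's code: @ranges and @lines are hypothetical names, and a hit that falls inside the previous window simply extends that window.

```perl
use strict;
use warnings;

my $text = "This is a test for the word blue more things blue and then "
         . "some very large text with a lot of things blue and done";

my @words = split ' ', $text;
my @hits  = grep { $words[$_] =~ /\bblue\b/i } 0 .. $#words;

my @ranges;    # list of [first_index, last_index] windows
for my $i (@hits) {
    my $first = $i - 5 < 0       ? 0       : $i - 5;
    my $last  = $i + 5 > $#words ? $#words : $i + 5;
    if (@ranges && $i <= $ranges[-1][1]) {
        # This hit lies inside the previous window: extend that window.
        $ranges[-1][1] = $last;
    }
    else {
        # Start a new window, without re-using words already printed.
        $first = $ranges[-1][1] + 1 if @ranges && $first <= $ranges[-1][1];
        push @ranges, [ $first, $last ];
    }
}

my @lines = map { join ' ', @words[ $_->[0] .. $_->[1] ] } @ranges;
print "$_\n" for @lines;
```

For the sample string the hits at word indexes 7 and 10 collapse into one window, and the hit at 22 gets its own.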

Re: regexp matching around a word
by saintmike (Vicar) on May 14, 2005 at 03:02 UTC
    While you might find a solution by just sticking to regexes for the problem shown, it's typically less painful to handle the parse states in perl code. Not the world's sexiest code, but works:
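    my $text = "This is a test for the word blue more things blue and then some very large text with a lot of things blue and done";
    my $words_to_go = 0;    # words of trailing context still to print
    my @keeper = ();        # up to five most recent words before a match
    while($text =~ /(\w+)/g) {
        if($1 eq "blue") {
            if(! $words_to_go) {
                print "$_ " for @keeper;
            }
            @keeper = ();   # already used; don't print these words again
            print "$1 ";
            $words_to_go = 5;
        }
        elsif($words_to_go) {
            $words_to_go--;
            print "$1 ";
            print "\n" unless $words_to_go;   # window finished
        }
        else {
            push @keeper, $1;
            shift @keeper if @keeper > 5;
        }
    }
    print "\n" if $words_to_go;   # close a window cut short by end of text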
Re: regexp matching around a word
by sh1tn (Priest) on May 14, 2005 at 09:53 UTC
    push @results, [$1,$2] while $text =~ /((?:\w+\s+){0,5})blue((?:\s+\w+){0,5})/g;

