PerlMonks  

Re: Regex: Matching around a word(s)

by eric256 (Parson)
on Dec 20, 2005 at 00:06 UTC (#517936)


in reply to Regex: Matching around a word(s)

Here is a slightly different approach. As far as I can tell, it is unique. It builds a hash of the word positions around each match and then rescans the source, printing the words at those positions. This automatically condenses all the overlaps.

#!/usr/bin/perl
use strict;
use warnings;

die "No search terms supplied!" unless @ARGV;

my @words = @ARGV;
my $regex = join("|", @words);
my $expr  = qr/^($regex)$/;

$/ = ' ';    # read DATA one space-separated word at a time

my $i     = 0;
my $words = {};
my $pos   = tell(DATA);

# Pass 1: mark every word position within 5 words of a match.
for my $word (<DATA>) {
    chomp $word;
    $i++;
    if ($word =~ /$expr/) {
        for my $j (-5 .. 5) {
            $words->{$i + $j}++;
        }
    }
}

# Pass 2: rewind DATA, then print the marked words, tagging the matches.
seek(DATA, $pos, 0);
$i = 0;
for my $word (<DATA>) {
    $i++;
    chomp $word;
    $word = "<$word>" if ($word =~ /$expr/);
    print "$word " if exists $words->{$i};
}

__DATA__
Regular expressions have always been a weak spot for me, and I've got a question that's got me stumped.
Here's the problem I'm trying to solve.
I have somewhat large articles of text (returned from a search), what I'd like to do is capture the word and X number of words before and after it while tagging the matching word in the captured text.
My inital thought was to try something like this.
The problem I have is that if there is more than one term and they overlap, the nth term will not be annotated.
So my next thought is lookahead/lookbehind, but they don't capture.
Is there a way to do this with a single regex? Is a regex even the best way to do this?
Thanks, -Lee
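For anyone who wants to try the idea without the DATA filehandle, here is a minimal sketch of the same two-pass technique on an in-memory string. The helper name `context_words` and its argument order are made up for illustration and are not part of the script above. It also runs the search terms through `quotemeta`, which the original does not do, so terms are matched literally rather than as regex fragments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the words that
# fall within $span words of any match, with the matches wrapped in <...>.
sub context_words {
    my ($text, $span, @terms) = @_;

    # Assumption: terms are literal words, so escape regex metacharacters.
    my $alternation = join '|', map { quotemeta } @terms;
    my $expr        = qr/^(?:$alternation)$/;

    my @tokens = split ' ', $text;

    # Pass 1: mark every index within $span of a matching word.
    my %keep;
    for my $i (0 .. $#tokens) {
        next unless $tokens[$i] =~ $expr;
        $keep{$_} = 1 for $i - $span .. $i + $span;
    }

    # Pass 2: emit the marked words in order, tagging the matches.
    my @out;
    for my $i (0 .. $#tokens) {
        next unless $keep{$i};
        push @out, $tokens[$i] =~ $expr ? "<$tokens[$i]>" : $tokens[$i];
    }
    return @out;
}

# Two adjacent matches share one merged window.
print join(' ', context_words("a b c d e f g h i j", 1, 'c', 'd')), "\n";
# prints: b <c> <d> e
```

Because the windows are stored as hash keys, overlapping ranges collapse on their own. Out-of-range keys (negative, or past the last word) are harmless since the second pass only visits real indices.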

___________
Eric Hodges $_='y==QAe=e?y==QG@>@?iy==QVq?f?=a@iG?=QQ=Q?9'; s/(.)/ord($1)-50/eigs;tr/6123457/- \/|\\\_\n/;print;


Re^2: Regex: Matching around a word(s)
by shotgunefx (Parson) on Dec 20, 2005 at 19:19 UTC
    Thanks. I'll whip up a benchmark with some of the different approaches (actually I have one, but don't have yours in it yet), though I'll probably switch the filehandle use.

    Also, I'd probably use a hash instead of a hashref for $words as it is slightly faster.

    Not sure if it's faster or not (though it "feels" it), but I'd probably rewrite the following to use a hash slice:

    # before
    if ($word =~ /$expr/) {
        for my $j (-5 .. 5) {
            $words->{$i + $j}++;
        }
    };

    # after
    my $mwords = 5;
    @words{$i-$mwords..$i+$mwords} = 1 if $word =~ /$expr/;
    # Note: Keys created, but all but first have undef values
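To make the note about undef values concrete, here is a small standalone sketch comparing the loop and the slice. The variable names `%counted` and `%sliced` and the sample values are chosen only for illustration. It shows why the printing pass has to test with `exists` and not with truth when the slice form is used.

```perl
use strict;
use warnings;

my ($i, $mwords) = (10, 2);

# Loop version: every key in the window gets a count.
my %counted;
$counted{$i + $_}++ for -$mwords .. $mwords;

# Slice version: one assignment creates all five keys, but the single
# value on the right-hand side is assigned to the first key only.
my %sliced;
@sliced{$i - $mwords .. $i + $mwords} = 1;

for my $key (sort { $a <=> $b } keys %sliced) {
    my $value = defined $sliced{$key} ? $sliced{$key} : 'undef';
    print "$key: exists, value $value (loop count $counted{$key})\n";
}
```

Both hashes end up with the same keys, so `exists $sliced{$key}` behaves like the loop version, while a plain truth test such as `if $sliced{$key}` would skip all but the first key in each window.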


    -Lee

    perl digital dash (in progress)
