
Re: Regex: Matching around a word(s)

by eric256 (Parson)
on Dec 20, 2005 at 00:06 UTC (#517936)

in reply to Regex: Matching around a word(s)

Here is a slightly different approach; as far as I can tell it is unique. It builds a hash of the word positions that fall within five words of a match, then rescans the source and prints the words at those positions. This automatically condenses all the overlaps.

#!/usr/bin/perl
use strict;
use warnings;

die "No search terms supplied!" unless @ARGV;

my @words = @ARGV;
my $regex = join("|", @words);
my $expr  = qr/^($regex)$/;

$/ = ' ';    # read DATA one space-separated word at a time
my $i     = 0;
my $words = {};
my $pos   = tell(DATA);

# First pass: mark every position within 5 words of a match.
for my $word (<DATA>) {
    chomp $word;
    $i++;
    if ($word =~ /$expr/) {
        for my $j (-5 .. 5) {
            $words->{$i + $j}++;
        }
    }
}

# Second pass: rewind and print the marked words, tagging the matches.
seek(DATA, $pos, 0);
$i = 0;
for my $word (<DATA>) {
    $i++;
    chomp $word;
    $word = "<$word>" if ($word =~ /$expr/);
    print "$word " if exists $words->{$i};
}

__DATA__
Regular expressions have always been a weak spot for me, and I've got a question that's got me stumped. Here's the problem I'm trying to solve.
I have somewhat large articles of text (returned from a search), what I'd like to do is capture the word and X number of words before and after it while tagging the matching word in the captured text.
My inital thought was to try something like this. The problem I have is that if there is more than one term and they overlap, the nth term will not be annotated.
So my next thought is lookahead/lookbehind, but they don't capture.
Is there a way to do this with a single regex? Is a regex even the best way to do this?
Thanks,
-Lee
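For comparison, the same two-step idea (mark the positions near a match, then emit the marked words) can be sketched over an in-memory string instead of rereading the filehandle. This is only an illustration under a few assumptions: the `excerpt` function, its `$radius` parameter, and the sample input are hypothetical names that do not appear in the script above, search terms are treated as literal strings via `quotemeta`, and the text is split on any whitespace rather than on single spaces.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): mark every word
# index within $radius of a match, then return the marked words in order.
sub excerpt {
    my ($text, $radius, @terms) = @_;

    # Assumption: terms are literal words, so escape regex metacharacters.
    my $alt    = join '|', map { quotemeta($_) } @terms;
    my $expr   = qr/^(?:$alt)$/;
    my @tokens = split ' ', $text;

    # Pass 1: record the indexes to keep; overlapping windows simply
    # hit the same keys again.
    my %keep;
    for my $i (0 .. $#tokens) {
        next unless $tokens[$i] =~ $expr;
        $keep{$_} = 1 for $i - $radius .. $i + $radius;
    }

    # Pass 2: emit the kept words, tagging the ones that match.
    return join ' ',
        map  { $tokens[$_] =~ $expr ? "<$tokens[$_]>" : $tokens[$_] }
        grep { $keep{$_} } 0 .. $#tokens;
}

print excerpt('a b c d e f g h i j k', 1, 'c', 'd', 'j'), "\n";
# prints: b <c> <d> e i <j> k
```

Out-of-range indexes (negative, or past the last word) end up as harmless extra keys in `%keep`, because the second pass only looks at valid indexes.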

Eric Hodges $_='y==QAe=e?y==QG@>@?iy==QVq?f?=a@iG?=QQ=Q?9'; s/(.)/ord($1)-50/eigs;tr/6123457/- \/|\\\_\n/;print;

Re^2: Regex: Matching around a word(s)
by shotgunefx (Parson) on Dec 20, 2005 at 19:19 UTC
    Thanks. I'll whip up a benchmark with some of the different approaches (actually, I already have, but yours isn't in it yet), though I'll probably switch the filehandle use.

    Also, I'd probably use a hash instead of a hashref for $words as it is slightly faster.

    I'm not sure whether it's actually faster (though it "feels" faster), but I'd probably rewrite the following to use a hash slice:

    # before
    if ($word =~ /$expr/) {
        for my $j (-5 .. 5) {
            $words->{$i + $j}++;
        }
    };

    # after
    my $mwords = 5;
    @words{$i-$mwords..$i+$mwords} = 1 if $word =~ /$expr/;
    # Note: Keys created, but all but first have undef values
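The note about undef values can be checked with a small standalone sketch. The values of `$i` and `$mwords` here are illustrative only, and `%all` is a hypothetical second hash showing one way to give every key a true value, should a later test rely on truth rather than `exists`.

```perl
use strict;
use warnings;

my %words;
my ($i, $mwords) = (10, 2);    # illustrative values only

# List assignment to a slice: the first key gets 1, the remaining keys
# are created with undef values.
@words{ $i - $mwords .. $i + $mwords } = 1;

# Hypothetical variant: repeat the 1 so that every key gets a true value.
my %all;
@all{ $i - $mwords .. $i + $mwords } = (1) x (2 * $mwords + 1);

printf "%d keys, %d defined\n",
    scalar(keys %words), scalar(grep { defined $_ } values %words);
# prints: 5 keys, 1 defined
```

Because the printing loop tests `exists $words->{$i}`, the undef values are harmless there; they would only matter if the test were changed to check for truth.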


    perl digital dash (in progress)
