Re: regexp - repeatedly delete words expressed in alternation from end of string

by ikegami (Pope)
on Nov 06, 2007 at 16:33 UTC ( #649253=note )


in reply to repeatedly delete words expressed in alternation from end of string [regexp]

First, wouldn't it be better to start with a list of words instead of a regex?

my @words = qw( SA NV LTD CO LLC );

So we'll need to build the regex programmatically.

my ($re) = map qr/$_/i, join '|', map quotemeta, @words;
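
As a quick sanity check of what that construction gives you, here is a minimal sketch. The helper name build_re and the extra word A.G. are not from the original list; they are assumptions added only to show why quotemeta is there (without it, the dots would match any character).

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): build a case-insensitive
# alternation from a list of literal words.
sub build_re {
    my @words = @_;
    my $alternation = join '|', map { quotemeta } @words;
    return qr/$alternation/i;
}

# 'A.G.' is an assumed extra word, included to exercise the escaping.
my $re = build_re(qw( SA NV LTD CO LLC A.G. ));

# Made-up candidates: only whole-string matches count here.
for my $candidate ('ltd', 'A.G.', 'AxGx', 'INC') {
    my $verdict = $candidate =~ /^(?:$re)$/ ? 'matches' : 'does not match';
    printf "%-5s %s\n", $candidate, $verdict;
}
```

The lower-case ltd matches because of the /i modifier, and AxGx does not match because the dots were escaped.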

Building the regex with Regexp::List instead can greatly speed up matching.

use Regexp::List qw( );
my $re = Regexp::List->new(modifiers => 'i')->list2re(@words);

Now that we have the regex, let's avoid the fragility of the 1 while s/// idiom and remove the surrounding spaces properly.

while (<>) {
    chomp;
    s/^ (?: $re \s+ )+//x;
    s/ (?: \s+ $re )+//xg;
    print("$_$/");
}
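
To see the two substitutions at work without piping a file in, here is a small sketch. The strip_words wrapper and the sample strings are made up for illustration; the substitutions themselves are the ones from the loop.

```perl
use strict;
use warnings;

my @words = qw( SA NV LTD CO LLC );
my ($re)  = map qr/$_/i, join '|', map quotemeta, @words;

# Hypothetical wrapper around the two substitutions, so they can be
# applied to sample strings instead of lines read from <>.
sub strip_words {
    my ($line) = @_;
    $line =~ s/^ (?: $re \s+ )+//x;   # leading words plus the spaces after them
    $line =~ s/ (?: \s+ $re )+//xg;   # later runs of words plus the spaces before them
    return $line;
}

# Made-up sample inputs, for illustration only.
for my $line ('Warehouse SA LTD', 'co  llc Jims Fine Wines', 'Fine Wines  CO   LLC') {
    printf "[%s] -> [%s]\n", $line, strip_words($line);
}
```

Each run of words goes away in a single substitution, together with its neighbouring whitespace, so no retry loop is needed.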

Note: You were using capturing parens ((...)) when you only needed non-capturing parens ((?:...)). Removing the need to capture greatly improves the speed of regexes.
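
For anyone unsure of the difference, a minimal sketch of what the two kinds of parens do. The sample string is made up, and this only shows the behaviour, not the speed.

```perl
use strict;
use warnings;

my $text = 'Warehouse SA LTD';   # made-up sample string

# Capturing group: the matched word is returned (and stored in $1).
my ($captured) = $text =~ /\s(SA|NV|LTD|CO|LLC)\b/i;

# Non-capturing group: the same text matches, but nothing is stored.
my $matched     = $text =~ /\s(?:SA|NV|LTD|CO|LLC)\b/i ? 1 : 0;
my $group_count = $#+;           # capture groups in the last successful match

print "captured: $captured\n";
print "non-capturing matched: $matched, groups: $group_count\n";
```

Both patterns match at the same place; the only difference is whether the engine has to remember the submatch.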

Update: Oops, it could still leave spaces. Fixed.
Update: Added Regexp::List method.

Re^2: regexp - repeatedly delete words expressed in alternation from end of string
by Roy Johnson (Monsignor) on Nov 06, 2007 at 17:42 UTC
    It should only remove the expression from the end of the string, so it's actually a little simpler:
    my @words = qw( SA NV LTD CO LLC );
    my ($re) = map qr/$_/i, join '|', map quotemeta, @words;

    while (<DATA>) {
        chomp;
        s/(?:\s*\b$re)+$//;
        print "[$_]\n";
    }
    __END__
    Bobs leave SA Warehouse SA LTD
    Jims Fine Wines CO LLC

    Caution: Contents may have been coded under pressure.
Re^2: regexp - repeatedly delete words expressed in alternation from end of string
by princepawn (Parson) on Nov 06, 2007 at 18:10 UTC
    Note: You were using capturing parens ((...)) when you only needed non-capturing parens ((?:...)). Removing the need to capture greatly improves the speed of regexs.
    Thanks for this. Thing is, I have an entire module full of this mistake. Unless there is a pragma to fix this, I'll have to go and fix them all manually.


    Ivan Raikov says: the first step to understanding recursion is to begin by understanding recursion.
      Note: You were using capturing parens ((...)) when you only needed non-capturing parens ((?:...)). Removing the need to capture greatly improves the speed of regexs.

      Well, it can. It has virtually no impact in many cases. In the cases where it causes the string being matched to be copied, the "greatly" only applies if you are matching against a large string.

      Re^6: Can we make $& better? (need) shows that it used to be only a regex w/o /g in a scalar context that incurred this penalty. demerphq patched Perl such that newer Perls also have the penalty for a regex w/o /g in a list context. (So for modern Perls, /g is necessary and sufficient to prevent the copying, it seems.)

      - tye        