PerlMonks  

Re: regexp - repeatedly delete words expressed in alternation from end of string

by ikegami (Pope)
on Nov 06, 2007 at 16:33 UTC (#649253=note)


in reply to repeatedly delete words expressed in alternation from end of string [regexp]

First, wouldn't it be better to start with a list of words instead of a regex?

my @words = qw( SA NV LTD CO LLC );

So we'll need to build the regex programmatically.

my ($re) = map qr/$_/i, join '|', map quotemeta, @words;
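To show why quotemeta matters, here is a minimal sketch that builds the same alternation from a list with one hypothetical extra entry, S.A., which is not in the original list. Because each word is escaped, the dots match only literal dots; without quotemeta, S.A. would also match strings like SXAX. The helper name is_suffix_word is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical word list: 'S.A.' is added only to demonstrate escaping.
my @words = qw( SA NV LTD CO LLC S.A. );

# quotemeta escapes regex metacharacters, so '.' is a literal dot.
my ($re) = map qr/$_/i, join '|', map quotemeta, @words;

# Hypothetical helper: true if the whole string is exactly one listed word.
sub is_suffix_word {
    my ($s) = @_;
    return $s =~ /\A(?:$re)\z/ ? 1 : 0;
}

print is_suffix_word('llc'),  "\n";    # 1 (the qr// carries /i)
print is_suffix_word('S.A.'), "\n";    # 1
print is_suffix_word('SXAX'), "\n";    # 0: the dots are literal
```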

Regexp::List can build an optimized regex from the same list, which can greatly speed up matching.

use Regexp::List qw( );
my $re = Regexp::List->new(modifiers => 'i')->list2re(@words);

Now that we have the regex, let's avoid the fragile 1 while s/// idiom and remove the surrounding spaces properly.

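while (<>) {
    chomp;
    s/^ (?: $re \s+ )+//x;       # listed words (and following spaces) at the start
    s/ (?: \s+ $re \b )+//xg;    # listed words (and preceding spaces) elsewhere
    print("$_$/");
}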
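Here is a minimal sketch that wraps the two substitutions in a hypothetical strip_words subroutine so they can be tried on a few illustrative names (the names are made up). It puts a \b after $re in the second substitution; without it, " Coke" would match " Co" and leave "ke" behind. Note that this version removes listed words anywhere they follow whitespace, not only at the end.

```perl
use strict;
use warnings;

my @words = qw( SA NV LTD CO LLC );
my ($re) = map qr/$_/i, join '|', map quotemeta, @words;

# Hypothetical helper: removes listed words at the start of the string and
# wherever they follow whitespace, together with the separating spaces.
sub strip_words {
    my ($s) = @_;
    $s =~ s/^ (?: $re \s+ )+//x;       # leading words and their trailing spaces
    $s =~ s/ (?: \s+ $re \b )+//xg;    # later words and their leading spaces
    return $s;
}

for my $name ('SA Bobs Warehouse LTD', 'Jims Fine Wines CO LLC', 'Jims Coke Bottlers') {
    print "[", strip_words($name), "]\n";
}
```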

Note: You were using capturing parens, (...), when you only needed non-capturing parens, (?:...). Removing the need to capture greatly improves the speed of regexes.
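A minimal sketch of the difference, using an illustrative string: both patterns match the same text, but only the capturing one records what matched in $1. That recording is the work the non-capturing form lets the engine skip.

```perl
use strict;
use warnings;

my $s = 'Jims Fine Wines CO LLC';

# Capturing group: the alternative that matched is saved in $1.
my $cap = $s =~ /\s(CO|LLC)\z/ ? $1 : undef;

# Non-capturing group: the same text matches, but nothing is saved.
my $matched = $s =~ /\s(?:CO|LLC)\z/ ? 1 : 0;
my $nocap   = $1;    # undef: the last successful match has no group 1

print "capturing:     $cap\n";
print "non-capturing: ", (defined $nocap ? $nocap : 'undef'), "\n";
```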

Update: Oops, it could still leave spaces. Fixed.
Update: Added Regexp::List method.

Re^2: regexp - repeatedly delete words expressed in alternation from end of string
by Roy Johnson (Monsignor) on Nov 06, 2007 at 17:42 UTC
    It should only remove the words from the end of the string, so it's actually a little simpler:
    my @words = qw( SA NV LTD CO LLC );
    my ($re) = map qr/$_/i, join '|', map quotemeta, @words;
    while (<DATA>) {
        chomp;
        s/(?:\s*\b$re)+$//;
        print "[$_]\n";
    }
    __END__
    Bobs leave SA Warehouse SA LTD Jims Fine Wines CO LLC

    Caution: Contents may have been coded under pressure.
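Roy's end-anchored substitution can be wrapped in a small subroutine to check its behaviour. The sub name strip_trailing and the sample strings are illustrative assumptions. The first sample shows that an SA in the middle of the name survives, and the third shows that \b keeps SA from matching inside a longer word such as VISA.

```perl
use strict;
use warnings;

my @words = qw( SA NV LTD CO LLC );
my ($re) = map qr/$_/i, join '|', map quotemeta, @words;

# Hypothetical helper: strips a run of listed words only at the end of
# the string, along with the whitespace in front of each one.
sub strip_trailing {
    my ($s) = @_;
    $s =~ s/(?:\s*\b$re)+$//;
    return $s;
}

for my $name ('Bobs leave SA Warehouse SA LTD', 'Jims Fine Wines CO LLC', 'Acme VISA') {
    print "[", strip_trailing($name), "]\n";
}
```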
Re^2: regexp - repeatedly delete words expressed in alternation from end of string
by princepawn (Parson) on Nov 06, 2007 at 18:10 UTC
    Note: You were using capturing parens ((...)) when you only needed non-capturing parens ((?:...)). Removing the need to capture greatly improves the speed of regexs.
    Thanks for this. The thing is, I have an entire module full of this mistake. Unless there is a pragma to fix it, I'll have to fix them all manually.


    Ivan Raikov says: the first step to understanding recursion is to begin by understanding recursion.
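Perl has since gained what was asked for here, though only well after this thread. A sketch, assuming Perl 5.22 or later: the /n modifier stops plain (...) groups from capturing, and use re '/n' adds /n to every regex compiled in the enclosing lexical scope. Named captures such as (?<word>...) still capture under /n. The sample string is illustrative.

```perl
use strict;
use warnings;
use 5.022;    # the /n modifier requires Perl 5.22 or later

my $s = 'Jims Fine Wines CO LLC';

# /n on a single pattern: (...) groups but does not capture.
my $ok      = $s =~ /\s(CO|LLC)\z/n ? 1 : 0;
my $after_n = $1;    # undef: the group did not capture

my $lexical;
{
    # use re '/n' applies /n to each regex compiled in this block.
    use re '/n';
    $s =~ /\s(LLC)\z/;
    $lexical = $1;   # also undef
}

print "matched: $ok\n";
print "captured with /n: ", (defined $after_n ? $after_n : 'undef'), "\n";
print "captured under use re: ", (defined $lexical ? $lexical : 'undef'), "\n";
```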
      Note: You were using capturing parens ((...)) when you only needed non-capturing parens ((?:...)). Removing the need to capture greatly improves the speed of regexs.

      Well, it can. In many cases it has virtually no impact. Where capturing causes the string being matched to be copied, the "greatly" applies only when you are matching against a large string.

      The node Re^6: Can we make $& better? (need) shows that it used to be only a regex without /g in scalar context that incurred this penalty. demerphq patched Perl so that newer Perls also incur the penalty for a regex without /g in list context. (So for modern Perls, it seems /g is necessary and sufficient to prevent the copying.)

      - tye        
