PerlMonks  

Re: strip out anything inbetween brackets

by cog (Parson)
on Apr 05, 2005 at 14:07 UTC ( [id://444984]=note )


in reply to strip out anything inbetween brackets

The naive approach would be

$string =~ s/\(.*\)//;

Which would do the trick in this particular case, but would turn "this is a (blah) and this is not a (blah)" into "this is a ", which is why you should use a non-greedy quantifier:

$string =~ s/\(.*?\)//;

This does the trick...
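To see the difference side by side, here is a minimal sketch that runs both substitutions on the two-bracket sentence from above. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $input = "this is a (blah) and this is not a (blah)";

# Greedy: .* runs on to the LAST closing bracket in the string.
(my $greedy = $input) =~ s/\(.*\)//;

# Non-greedy: .*? stops at the FIRST closing bracket it can reach.
(my $lazy = $input) =~ s/\(.*?\)//;

print "greedy: [$greedy]\n";   # greedy: [this is a ]
print "lazy:   [$lazy]\n";     # lazy:   [this is a  and this is not a (blah)]
```

Without /g the non-greedy version removes only the first bracketed group, and the spaces on either side of it are left behind.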

Don't forget, however, to use the /g switch (for global substitutions). Also, your example has the result as being "this is a" (notice there's no space after the a...)

If that's what you want, you just need to include \s* on both ends of your regular expression...

OTOH, that would turn "this is a (blah) bleh" into "this is ableh", which is probably not what you want... O:-)
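The last few points can be sketched as follows. The helper names are made up for illustration, and strip_and_tidy is just one possible way around the "ableh" problem (replace the brackets and the whitespace around them with a single space, then trim the ends); it is an assumption about what is wanted, not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every bracketed group (note the /g).
sub strip_brackets {
    my ($s) = @_;
    $s =~ s/\(.*?\)//g;
    return $s;
}

# \s* on both ends also swallows the spaces around the brackets.
sub strip_with_space {
    my ($s) = @_;
    $s =~ s/\s*\(.*?\)\s*//g;
    return $s;
}

# Assumed alternative: leave one space behind, then trim both ends.
sub strip_and_tidy {
    my ($s) = @_;
    $s =~ s/\s*\(.*?\)\s*/ /g;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

print "[", strip_brackets("this is a (blah) and this is not a (blah)"), "]\n";
# [this is a  and this is not a ]
print "[", strip_with_space("this is a (blah)"), "]\n";        # [this is a]
print "[", strip_with_space("this is a (blah) bleh"), "]\n";   # [this is ableh]
print "[", strip_and_tidy("this is a (blah) bleh"), "]\n";     # [this is a bleh]
```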

Re^2: strip out anything inbetween brackets
by reasonablekeith (Deacon) on Apr 05, 2005 at 14:51 UTC
    Well I was going to post to say you should really be checking using a negated character class, rather than having all that backtracking going on. I was pretty sure it'd be faster, and it's what I would normally do when coding regexes like this.

    I did a quick benchmark first, and it turns out I was wrong: the negated character class gets relatively less efficient the longer the data it has to scoop up. In the run below, with 1000 characters between the brackets, it managed only about a third of the rate of the non-greedy version.

    use strict;
    use Benchmark qw(:all);

    my $count = 50000;
    my $replacement_string = "this is a (" . "a" x 1000 . ") test";

    cmpthese($count, {
        'negated' => sub {
            my $text = $replacement_string;
            $text =~ s|\([^)]*\)||sg;
        },
        'backtrack' => sub {
            my $text = $replacement_string;
            $text =~ s|\(.*?\)||sg;
        },
    });

    OUTPUT
                 Rate   negated backtrack
    negated    8562/s        --      -67%
    backtrack 26316/s      207%        --

    I still think there's something to be said for the character class, as it is more explicit (after all, we are trying to match anything other than the closing bracket), but it is certainly slower in this test.

    This surprised me, so I thought I'd post it, in case it surprised anyone else.
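One behavioural difference between the two patterns is worth keeping in mind alongside the timings: `[^)]` matches a newline, while `.` only does so under /s (which the benchmark uses). A small sketch, with a made-up two-line input:

```perl
use strict;
use warnings;

# Illustrative input with a newline inside the brackets.
my $text = "a (one\ntwo) b";

# Negated class: crosses the newline with or without /s.
(my $negated = $text) =~ s/\([^)]*\)//g;

# Non-greedy dot without /s: cannot cross the newline, so nothing is removed.
(my $dot_plain = $text) =~ s/\(.*?\)//g;

# Non-greedy dot with /s: same result as the negated class.
(my $dot_s = $text) =~ s/\(.*?\)//sg;

print "negated:   [$negated]\n";
print "dot, no s: [$dot_plain]\n";
print "dot, /s:   [$dot_s]\n";
```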

Re^2: strip out anything inbetween brackets
by jhourcle (Prior) on Apr 05, 2005 at 14:48 UTC

    The second one will work provided that you don't have nested parens:

    This is a ((very important) blah)

    If there's a possibility of that sort of thing happening, you'll probably want to look at Pustular Postulant's recommendation, and not use a single regex. (I don't know that exact module, so I can't say whether it'll handle this or whether you'd need to look for something else.) I've typically run into this problem with SGML, so I used a parser specifically for HTML or XML... I don't know if there's something that does nested braces and the like.
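For simple cases, one common workaround is to strip the innermost groups repeatedly until none are left. This is a minimal sketch of that technique, not the module recommended above, and it assumes the brackets are balanced; any unmatched bracket is simply left in place.

```perl
use strict;
use warnings;

my $input = "This is a ((very important) blah)";

# Non-greedy match stops at the first ")", leaving the outer one behind.
(my $lazy = $input) =~ s/\(.*?\)//g;

# Inside-out: remove groups that contain no brackets, until nothing changes.
my $nested = $input;
1 while $nested =~ s/\([^()]*\)//g;

print "lazy:   [$lazy]\n";     # lazy:   [This is a  blah)]
print "nested: [$nested]\n";   # nested: [This is a ]
```

Each pass removes one level of nesting, so the loop runs once per level plus a final pass that finds nothing to remove.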

Re^2: strip out anything inbetween brackets
by Anonymous Monk on Apr 05, 2005 at 14:19 UTC
    Perfect, thank you for your help.
      You are welcome, Anonymous Monk.

      You should also consider creating a user on this site :-)
