http://www.perlmonks.org?node_id=655499

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a string which looks like this: $value = "Total age recorded (TOTAGE)"; How do I edit $value to keep only the data between the brackets, so that $value = "TOTAGE";?

Re: data between brackets
by philcrow (Priest) on Dec 06, 2007 at 20:54 UTC
      I tried to do this to save the data between the brackets, but it doesn't work: $value =~ /^.*\(.*)\)/; Any idea what I am doing wrong with the syntax?
        That looks OK to me (Update: you have an extra closing paren after the second star or else missing the opening one), but I guess you are asking how you can capture the part in parentheses. For that, as sh1tn shows below, you need to use unescaped, plain parentheses around the part you want to capture. Then, the captured stuff will be accessible to you in $1, assuming that there was a match.
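A minimal sketch of the capture described above, using the string from the original question. The variable name $inside is only an illustration and is not from the thread.

```perl
use strict;
use warnings;

my $value = "Total age recorded (TOTAGE)";
my $inside;

# \( and \) match literal parentheses; the plain, unescaped pair
# between them captures whatever it matches into $1.
if ($value =~ /\((.*)\)/) {
    $inside = $1;    # only trust $1 after a successful match
}

print defined $inside ? "$inside\n" : "no parenthesized part found\n";
```

Checking that the match succeeded before reading $1 matters, because after a failed match $1 still holds whatever an earlier successful match left there.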

        In the OP, you say "How do I edit the $value to keep only between the brackets data". If you really mean "edit", then you should use the substitution operator, as sh1tn shows below. You can check perlop for information on it (search within that page for "replacement").

        Since this is a problem that many developers often face, there are canned, well-tested regexes that solve it, which also handle nested brackets as a bonus. See Re: regex to parse (nested) parenthesis delimited string? for an example.
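For a rough idea of what handling nested brackets involves, here is a sketch that uses a recursive pattern. It assumes Perl 5.10 or later and is not the regex from the node linked above; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use 5.010;    # recursive patterns such as (?1) need Perl 5.10 or later

my $text = "Total (age (in years)) recorded";

# Group 1 matches one balanced parenthesized group: an opening paren,
# then any mix of non-paren characters or nested groups (via the
# recursion (?1)), then the matching closing paren.
my ($group) = $text =~ /(\((?:[^()]++|(?1))*\))/;

# Strip the outer pair of parentheses, if a group was found.
my $inner = defined $group ? substr($group, 1, -1) : undef;

print defined $inner ? "$inner\n" : "no balanced group found\n";
```

For anything beyond a quick script, a tested module or the canned regex referenced above is probably the safer choice.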

Re: data between brackets
by sh1tn (Priest) on Dec 06, 2007 at 21:07 UTC
    You should escape the special characters:
    s/.+?\((.+?)\).*/$1/;


Re: data between brackets
by ww (Archbishop) on Dec 06, 2007 at 22:25 UTC

    Your regex, $value =~ /^.*\(.*)\)/; fails because you have an unmatched closing parenthesis in the regex. Attempting to execute a short version of what may be your code,

    my $value1 = "Total age recorded (TOTAGE)";
    $value1 =~ /^.*\(.*)\)/;
    print $value1 . "\n";

    produces this:

    Unmatched ) in regex; marked by <-- HERE in m/^.*\(.*) <-- HERE \)/ at ....

    In other words, the regex engine balks when it finds a special character, the closing paren, unescaped, when it did not find an opening paren.
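If you want to see that error without the whole script dying, one option is to build the pattern from a string at run time and trap the failure with eval. This is only a sketch: the names $broken, $re, $error and $fixed are illustrative and not part of the original code.

```perl
use strict;
use warnings;

# The faulty pattern, held in a single-quoted string so the
# backslashes reach the regex engine untouched.
my $broken = '^.*\(.*)\)';

# Interpolating into qr// compiles the pattern at run time, so the
# "Unmatched )" error can be caught by eval instead of aborting.
my $re    = eval { qr/$broken/ };
my $error = $@;
print "compile failed: $error" unless defined $re;

# With the opening paren of the capture group restored, it compiles.
my $fixed = qr/^.*\((.*)\)/;
my ($inside) = "Total age recorded (TOTAGE)" =~ $fixed;
print "$inside\n" if defined $inside;
```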

    When you receive a message of that sort, it's valuable to those who would assist you, so it is well to include it in your post.

    However, dealing only with the unmatched paren doesn't give you what you sought.

    So, extending/explaining sh1tn's correct answer of

    s/.+?\((.+?)\).*/$1/;
    • The .+? matches "anything, one or more times, UNTIL..." the regex engine finds a literal open parenthesis in $value.
    • The inner parens in sh1tn's formulation capture their content, in this case "anything, one or more times, after the literal opening paren, until the regex engine finds a literal closing paren", thus capturing (preserving) only "TOTAGE" in $1.
    • Since he recommended a substitution rather than a simple match, the part of his code after the middle "/" is the replacement: the pattern matches the whole string, so the original content of $value is discarded and replaced with what was captured in $1.
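The points above can be tried out with a short script. This is a sketch built around sh1tn's substitution; the second string and the variable names $changed, $other and $untouched are made up for illustration.

```perl
use strict;
use warnings;

my $value = "Total age recorded (TOTAGE)";

# s/// edits $value in place and returns the number of substitutions
# made (a false value if the pattern did not match).
my $changed = ($value =~ s/.+?\((.+?)\).*/$1/);
print "$value\n";

# When there is nothing in parentheses, the pattern fails and the
# string is left exactly as it was.
my $other     = "no brackets here";
my $untouched = !($other =~ s/.+?\((.+?)\).*/$1/);
print "left alone: $other\n" if $untouched;
```

Testing the return value of the substitution is a cheap way to notice input that did not have the expected shape.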

    You'll find more in the tutorial section, under Tutorials#Pattern-Matching-and-Regular-Expressions, and while you may find Friedl's "Mastering Regular Expressions" more than you seek right now, it's well worth the study.

Re: data between brackets
by sundialsvc4 (Abbot) on Dec 06, 2007 at 21:59 UTC

    I think that's on the right track...

    Also, take care to make your regular-expressions as specific as possible, so that you don't run into “false positives.” Don't simply test your logic on good data.

    For much the same reason, I suggest that you probably want to extract the desired information from the input-record without replacing the input-record. So, for example, instead of using the “substitute” syntax previously illustrated, I would probably use “extract” syntax, then grab the value from $1 so that now I had both the original value and the extracted data.
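A minimal sketch of that "extract" approach: a match in list context returns the captured text and leaves the input record alone. The names $record and $code are illustrative only.

```perl
use strict;
use warnings;

my $record = "Total age recorded (TOTAGE)";

# In list context a successful match returns its captures, so $code
# gets the text between the parentheses and $record is not modified.
# [^()]* means "any run of characters that are not parentheses".
my ($code) = $record =~ /\(([^()]*)\)/;

if (defined $code) {
    print "record: $record\n";
    print "code:   $code\n";
}
else {
    print "no code found in: $record\n";
}
```

If the pattern does not match, $code is left undefined, which gives a clear signal that the record was not in the expected form.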

    It is a good idea to be as specific as possible about the character-types that you expect, e.g. within those parentheses. Sure, “.*” will accept anything, but if you accept “anything,” how will you ever discover “the glitch that has crept into a crucial 2.6% of the 1.3 million records in your input-data?” (Just kidding, but... programs that parse input-data have “the last clear chance” to discover the hidden data-flaws now that you do not want the auditors to be looking for six months from now, if you catch my drift.)
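As a sketch of what "specific" might look like here, the following assumes (and it is only an assumption, the thread does not say so) that a valid code is made of uppercase letters only and sits at the end of the record. The sample records and the names %code_for and @rejected are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical input records; only the first one is well formed.
my @records = (
    "Total age recorded (TOTAGE)",
    "Broken record (tot age?)",
    "No code at all",
);

my (%code_for, @rejected);
for my $record (@records) {
    # ASSUMPTION: a code is one or more uppercase letters in
    # parentheses at the very end of the record.
    if ($record =~ /\(([A-Z]+)\)\s*\z/) {
        $code_for{$record} = $1;
    }
    else {
        push @rejected, $record;    # keep bad records for review
    }
}

print "accepted: $code_for{$_}\n" for sort keys %code_for;
print "rejected: $_\n" for @rejected;
```

Collecting the rejects instead of silently skipping them is what lets you spot the odd malformed record early.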