
Regex a little less greedy please

by martymart (Deacon)
on Mar 18, 2003 at 14:12 UTC ( [id://243977]=perlquestion )

martymart has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks, I have a little app that uses a regular expression, the regex uses:
What I need from this is the postmatch ($') of the search. The trouble is the greedy quantifier. The expression is searching a string like:
<APPEND changed_date="02-02-2003">This is sample text</APPEND>
I would like the postmatch to give me back:
This is sample text</APPEND>
Instead, I think it's matching up to the '>' at the end of the string. What I need is to tell the regex that the match is complete the first time it encounters a '>'. Is this possible? I would appreciate any ideas you may have on this.

Replies are listed 'Best First'.
Re: Regex a little less greedy please
by broquaint (Abbot) on Mar 18, 2003 at 14:19 UTC
    De-greedify that dot-star like so:
    my $str = q[<APPEND changed_date="02-02-2003">This is sample text</APPEND>];
    print "match: ", $str =~ m{ <APPEND (.*?) > }x, $/;
    print "post: ", $', $/;
    __output__
    match: changed_date="02-02-2003"
    post: This is sample text</APPEND>
    Check out perlre for more info on perl's regex engine.


Re: Regex a little less greedy please
by arturo (Vicar) on Mar 18, 2003 at 14:29 UTC

    From perlre:

    If you want it to match the minimum number of times possible, follow the
    quantifier with a "?".  Note that the meanings don’t change, just the
    "greediness".

    So, changing your regex to

    Should get you the behavior you want. If, however, you take to heart the lessons of Death to Dot Star!, you might want to write that this way:
    Avoiding using the post-match variable and using () to capture the stuff you want to get is left as an exercise for the reader =)
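
    The two snippets this reply refers to are missing from this copy of the thread. As a rough sketch only, and assuming the original pattern was something like <APPEND.*> (a guess based on the question, not the poster's actual code), the greedy, non-greedy, and negated-character-class forms can be compared like this. The helper postmatch_of is a hypothetical name used only for illustration:

```perl
use strict;
use warnings;

my $str = q[<APPEND changed_date="02-02-2003">This is sample text</APPEND>];

# Hypothetical helper: return the postmatch ($') for a pattern,
# or undef if the pattern does not match.
sub postmatch_of {
    my ($re, $text) = @_;
    return $text =~ $re ? $' : undef;
}

# Assumed original (greedy): .* gives back only as far as the last '>'.
my $greedy = postmatch_of(qr/<APPEND.*>/, $str);

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
my $lazy = postmatch_of(qr/<APPEND.*?>/, $str);

# Negated character class: [^>]* cannot run past a '>' at all.
my $negated = postmatch_of(qr/<APPEND[^>]*>/, $str);

print "greedy:     [$greedy]\n";
print "non-greedy: [$lazy]\n";
print "negated:    [$negated]\n";
```

    On the sample string the greedy form leaves nothing after the match, while the other two leave the text and the closing tag.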


    If not P, what? Q maybe?
    "Sidney Morgenbesser"

Re: Regex a little less greedy please
by MZSanford (Curate) on Mar 18, 2003 at 14:16 UTC
    Parsing HTML/XML is somewhat tricky to do correctly (what with entities and all ... see Super Search for more info), but if you know that there will not be any >'s in the tag, you may want to use a regexp like ...
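
    The regexp from this reply did not survive in this copy. A minimal sketch of the idea, assuming a negated character class is what was meant, and capturing the remainder in parens rather than relying on $':

```perl
use strict;
use warnings;

my $str = q[<APPEND changed_date="02-02-2003">This is sample text</APPEND>];

# [^>]* matches any run of characters other than '>', so the opening
# tag ends at the first '>' after <APPEND. Everything after it is captured.
my ($rest) = $str =~ /<APPEND[^>]*>(.*)/s;

print defined $rest ? "rest: $rest\n" : "no match\n";
```

    This only holds under the stated condition that no '>' appears inside the tag itself.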

    from the frivolous to the serious
Re: Regex a little less greedy please
by roundboy (Sexton) on Mar 18, 2003 at 19:01 UTC
    In addition to using either the non-greedy quantifier (.*?) or skipping up to the next > ([^>]*), you also want to capture the text up through the matching end-tag, for which you just need a non-greedy quantifier inside capturing parens. So your regex should look like
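
    The regex itself is missing from this copy of the reply. Going by the description, a sketch of it might look like the following; the exact pattern and the helper name append_text are assumptions, not the poster's original code:

```perl
use strict;
use warnings;

# <APPEND\b   the tag name, with a word boundary so <APPENDIX> is rejected
# [^>]*>      skip the attributes up to the first '>'
# (.*?)       non-greedy capture of the text between the tags
# </APPEND>   the matching end tag
my $re = qr{<APPEND\b[^>]*>(.*?)</APPEND>}s;

# Hypothetical helper: return the text between the tags, or undef.
sub append_text {
    my ($text) = @_;
    return $text =~ $re ? $1 : undef;
}

for my $str (
    q[<APPEND changed_date="02-02-2003">This is sample text</APPEND>],
    q[<APPENDIX>Not an APPEND tag</APPENDIX>],
) {
    my $found = append_text($str);
    print defined $found ? "text: $found\n" : "no match\n";
}
```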

    This puts the text between the tags into $1; if you really want the ending tag, too, just move the closing paren to after the end tag. I added the \b to make sure you only match <APPEND> tags, and not, e.g., <APPENDIX>. The only caveats on this are:

    1. You might want to add a /i modifier to the match, in case someone adds the tags in lower case.
    2. If there's ever a chance of a '>' appearing in the attributes of the tag, you need something more complicated. The following (untested, but based on Friedl's Mastering Regular Expressions) should work:
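
    The pattern that followed is missing from this copy. As a sketch of the general technique only (not the poster's original, and not a full XML parser), the attribute part can be written so that quoted values are consumed as whole units, which lets a '>' appear inside them. The names $open_tag, $element and append_text_quoted are hypothetical:

```perl
use strict;
use warnings;

# Opening tag whose attribute values may contain '>' inside quotes.
my $open_tag = qr{
    <APPEND\b
    (?: [^>"']      # any single character that is not > or a quote mark
      | "[^"]*"     # or a complete double-quoted value
      | '[^']*'     # or a complete single-quoted value
    )*
    >
}x;

# Whole element: opening tag, non-greedy body captured in $1, end tag.
my $element = qr{ $open_tag (.*?) </APPEND> }xs;

# Hypothetical helper: return the text between the tags, or undef.
sub append_text_quoted {
    my ($text) = @_;
    return $text =~ $element ? $1 : undef;
}

my $str = q[<APPEND note="a > b" changed_date="02-02-2003">This is sample text</APPEND>];
my $found = append_text_quoted($str);
print defined $found ? "text: $found\n" : "no match\n";
```

    A plain [^>]* would stop at the '>' inside note="a > b" and capture part of the attributes along with the text.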


Node Type: perlquestion [id://243977]
Approved by Corion