PerlMonks  

Regex question

by ultranerds (Pilgrim)
on Sep 22, 2011 at 15:03 UTC (#927387=perlquestion)
ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys,

First of all, lemme show you some example values I need to match:
[url]http://www.test.com[/url]
[img]http://www.site.com/image.jpg[/img]
[url http://www.test.com]name of link[/url]
[citation]some quote here[/quote]
Now, I have this regex:

while ($_[0] =~ /\[(\/?)(lien|url|citation|img).+?\]/sg) { print "GOT: $1 and $2 \n"; }
..which is meant to capture the / (if it exists, right after the [) into $1, and the tag name itself into $2.

However, it doesn't seem to work (it only picks up the ones without the / in the tag)
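
One way to see what the pattern is doing is to print the whole matched text rather than just the captures. The sketch below is an illustration, not part of the original post: the helper name full_matches and the shortened sample string are assumptions, and the pattern is wrapped in an extra outer group so that the full match lands in $1.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the full text of every match of the
# original pattern. The outer parentheses are added for illustration
# only, so each complete match is captured in $1.
sub full_matches {
    my ($text) = @_;
    my @found;
    while ($text =~ /(\[(\/?)(lien|url|citation|img).+?\])/sg) {
        push @found, $1;
    }
    return @found;
}

# Shortened sample input (an assumption, modelled on the examples above).
my $sample = '[url]http://www.test.com[/url] '
           . '[url http://www.test.com]name of link[/url]';

print "MATCH: $_\n" for full_matches($sample);
# MATCH: [url]http://www.test.com[/url]
# MATCH: [url http://www.test.com]
```

On this input the first match runs from [url] to the end of [/url]. Because .+? must consume at least one character, it takes the ] of the bare tag and keeps going to the next ], so the closing tag is swallowed instead of being reported on its own.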

If I change the regex to remove .+? , so we have:

while ($_[0] =~ /\[\/?(lien|url|citation|img)\]/sg) { print "BLA: $1 \n"; }
..it works, but obviously doesn't pick up stuff like:

[url http://www.test.com]some text[/url]
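
A quick check of that simpler pattern, as a sketch only: the helper name bare_tags and the sample string are made up for illustration, and an outer group is added so the matched tags can be returned as a list.

```perl
use strict;
use warnings;

# Hypothetical helper: lists every tag matched by the pattern that has
# no ".+?" part. The inner group is made non-capturing so that a list
# context match with /g returns just the full tags.
sub bare_tags {
    my ($text) = @_;
    my @found = $text =~ /(\[\/?(?:lien|url|citation|img)\])/sg;
    return @found;
}

# Sample input (an assumption): one tag with an attribute, one without.
my $sample = '[url http://www.test.com]some text[/url] '
           . '[img]http://www.site.com/image.jpg[/img]';

print join(' ', bare_tags($sample)), "\n";
# [/url] [img] [/img]
```

The opening [url http://www.test.com] is missing from the output, because the pattern expects ] immediately after the tag name.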

Anyone got any suggestions?

UPDATE: I think I've worked it out - I needed .*? instead of .+? , because .*? matches zero or more characters, whereas .+? needs at least one.
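
A minimal sketch of the .*? version, to check that opening and closing tags are now reported separately. The helper name tag_tokens and the sample string are assumptions made for this example, not something from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "$1$2" (optional slash plus tag name)
# for every tag found by the pattern with .*? in place of .+?
sub tag_tokens {
    my ($text) = @_;
    my @tokens;
    while ($text =~ /\[(\/?)(lien|url|citation|img).*?\]/sg) {
        push @tokens, "$1$2";
    }
    return @tokens;
}

# Shortened sample input (an assumption, modelled on the examples above).
my $sample = '[url]http://www.test.com[/url] '
           . '[url http://www.test.com]name of link[/url]';

print join(' ', tag_tokens($sample)), "\n";
# url /url url /url
```

Since .*? can match nothing, the ] directly after the tag name now closes the match. Writing [^\]]* in place of .*? would say "anything up to the closing bracket" more directly and gives the same result on this input; that variant is a suggestion, not something proposed in the thread.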

TIA

Andy

Re: Regex question
by Anonymous Monk on Sep 22, 2011 at 15:12 UTC

    Anyone got any suggestions?

    :) Use a module already? ;)

    This is what you have

    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new( qr/\[(\/?)(lien|url|citation|img).+?\]/s )->explain;
    __END__
    The regular expression:

    (?s-imx:\[(/?)(lien|url|citation|img).+?\])

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?s-imx:                 group, but do not capture (with . matching
                             \n) (case-sensitive) (with ^ and $ matching
                             normally) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      \[                       '['
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        /?                       '/' (optional (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      (                        group and capture to \2:
    ----------------------------------------------------------------------
        lien                     'lien'
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        url                      'url'
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        citation                 'citation'
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        img                      'img'
    ----------------------------------------------------------------------
      )                        end of \2
    ----------------------------------------------------------------------
      .+?                      any character (1 or more times (matching
                               the least amount possible))
    ----------------------------------------------------------------------
      \]                       ']'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    What do you think is wrong with your regex?

      Yeah, this part ;)
      .+? any character (1 or more times (matching the least amount possible))
      As I noted in my update above, I changed it to .*? and it works perfectly :)
Re: Regex question
by moritz (Cardinal) on Sep 22, 2011 at 15:18 UTC
