
Regex priorities.

by Steve_BZ (Chaplain)
on May 28, 2012 at 00:54 UTC
Steve_BZ has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys,

I have a regex designed to pick out some XML tags.

A typical string might look like this:


I want to process the smallest, innermost pair of tags first (i.e., in this case <EXHIBITING></EXHIBITING>).

I am using the regex:

my $r = '\<(\D+)\>(.*?)(\<\/\1\>)';
while ($loc_diagnoses_text =~ m/$r/gi){
    ... processing stuff ....
}

But it is processing the <COMMA></COMMA> pair first. How do I fix this?
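A minimal sketch of why this happens, using a hypothetical nested string (the sample from the post is not preserved). The regex engine reports the leftmost match it can complete, and because `(.*?)` is allowed to cross other tags, the match that starts at the outer opening tag succeeds before the inner pair is ever tried on its own.

```perl
use strict;
use warnings;

# Hypothetical sample: the original string is not shown in the post.
my $text = '<COMMA>pre-stuff<EXHIBITING>some stuff</EXHIBITING>post-stuff</COMMA>';

# The pattern from the question, unchanged.
my $r = '\<(\D+)\>(.*?)(\<\/\1\>)';

my @seen;
while ($text =~ m/$r/gi) {
    push @seen, $1;    # record the tag name of each match, in order
}

# Only the outer pair is reported; the inner pair is swallowed by (.*?).
print "matched: @seen\n";
```

With this input the loop body runs once, for COMMA, and the match consumes the whole string, so EXHIBITING is never seen as a separate match.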



Replies are listed 'Best First'.
Re: Regex priorities.
by BrowserUk (Pope) on May 28, 2012 at 01:20 UTC

    Does this match your expectations?

    $s = '<COMMA>pre-stuff<EXHIBITING>some stuff</EXHIBITING>post-stuff</COMMA>';;
    print "$1 :: $2" while $s =~ s[<(\D+)>([^<]*?)</\1>][]gi;;
    EXHIBITING :: some stuff
    COMMA :: pre-stuffpost-stuff

    Of course, it fails horribly if your non-tag content contains '<':

    $s = '<COMMA>pre-stuff<EXHIBITING>some <= stuff</EXHIBITING>post-stuff</COMMA>';;
    print "$1 :: $2" while $s =~ s[<(\D+)>([^<]*?)</\1>][]gi;;
    {zilch here}
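One possible way around that limitation, offered as a sketch and not as part of the reply above: instead of forbidding every '<' in the content, forbid only a '<' that begins something tag-like. This assumes tag names consist of letters only and that a literal '<' in the content is never immediately followed by a letter or '/'. The names `$tag_re` and `@order` are illustrative.

```perl
use strict;
use warnings;

my $s = '<COMMA>pre-stuff<EXHIBITING>some <= stuff</EXHIBITING>post-stuff</COMMA>';

my $tag_re = qr{
    <([A-Za-z]+)>               # opening tag; assumes names are letters only
    ((?:(?!</?[A-Za-z]).)*?)    # content: any char that does not start a tag
    </\1>                       # closing tag with the same name
}xis;

my @order;
# Remove one innermost pair per iteration until none are left.
while ($s =~ s/$tag_re//) {
    push @order, "$1 :: $2";
}
print "$_\n" for @order;
```

Because the content part cannot contain an opening or closing tag, only a pair with no tags inside it can match, so pairs are removed from the inside out while a bare '<=' is treated as ordinary text.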


      Hi BrowserUk,

      Thanks for that. It worked perfectly. I decided to exclude "/" instead of "<" in the middle part: I do use "<" for other purposes in my text, but never "/". So I ended up with:

      my $r = q(
          <(\D+)>     # Opening tag <....>
          ([^/]*?)    # stuff in the middle which does not have the closing tag character '/'
          (<\/\1>)    # closing tag of same type as opening tag </....>.
      );
      while ($text =~ m/$r/gix){
          ... processing ...
      }

      Thanks for your help.



Re: Regex priorities.
by Anonymous Monk on May 28, 2012 at 01:28 UTC

      Hi Anon,

      Thanks for this. I did in fact read most of the links you so kindly posted.

      I also thought it was a bit like a compiler problem and parsing was a potential solution, but I thought it would take longer. I was quite interested in how you would have parsed it.
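For comparison, here is a rough sketch of what a small stack-based parse could look like. It is not the approach from the Anonymous Monk's reply, which is not preserved here; `close_order` is a hypothetical helper, and it assumes tag names are letters only and that the markup is properly nested.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [tag, text] pairs in closing order,
# so inner elements come before the elements that contain them.
sub close_order {
    my ($text) = @_;
    my @stack = ({ tag => undef, text => '' });   # root frame for stray text
    my @done;

    # Split into tags and the text between them; the capture keeps the tags.
    for my $tok (split m{(</?[A-Za-z]+>)}, $text) {
        next if $tok eq '';
        if ($tok =~ m{^<([A-Za-z]+)>$}) {
            push @stack, { tag => $1, text => '' };
        }
        elsif ($tok =~ m{^</([A-Za-z]+)>$}) {
            my $top = $stack[-1];
            die "unexpected </$1>\n"
                unless defined($top->{tag}) && lc($top->{tag}) eq lc($1);
            pop @stack;
            push @done, [ $top->{tag}, $top->{text} ];
        }
        else {
            $stack[-1]{text} .= $tok;   # plain text belongs to the open element
        }
    }
    die "unclosed <$stack[-1]{tag}>\n" if @stack > 1;
    return @done;
}

my $s = '<COMMA>pre-stuff<EXHIBITING>some <= stuff</EXHIBITING>post-stuff</COMMA>';
print "$_->[0] :: $_->[1]\n" for close_order($s);
```

An element is emitted when its closing tag is reached, which gives the innermost-first order naturally, and mismatched or unclosed tags raise an error instead of silently failing to match.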


