PerlMonks  

Regex priorities.

by Steve_BZ (Hermit)
on May 28, 2012 at 00:54 UTC ( #972763=perlquestion )
Steve_BZ has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys,

I have a regex designed to pick out some XML tags.

A typical string might look like this:

<COMMA><EXHIBITING></EXHIBITING></COMMA>

I want to process the smallest, innermost pair of tags first (i.e., in this case, <EXHIBITING></EXHIBITING>).

I am using the regex:

my $r = '\<(\D+)\>(.*?)(\<\/\1\>)';
while ($loc_diagnoses_text =~ m/$r/gi){
    ... processing stuff ....
}

But it is processing the <COMMA></COMMA> pair first. How do I fix this?

Regards

Steve
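
The behaviour described in the question can be reproduced with a short script. It is only a sketch: the variable names $text and @seen are illustrative and not from the original post. The regex engine tries the leftmost starting position first, and the lazy .*? only controls how far a match extends from that position, not where it starts, so the outer pair wins.

```perl
use strict;
use warnings;

# The sample string and the pattern from the question.
my $text = '<COMMA><EXHIBITING></EXHIBITING></COMMA>';
my $r    = '\<(\D+)\>(.*?)(\<\/\1\>)';

# Record each match instead of processing it (illustrative only).
my @seen;
while ($text =~ m/$r/gi) {
    push @seen, "$1 :: $2";
}

print "$_\n" for @seen;
```

Run as written, this should report a single match, COMMA, with the whole inner pair as its content; the inner EXHIBITING pair is never visited on its own.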

Re: Regex priorities.
by BrowserUk (Pope) on May 28, 2012 at 01:20 UTC

    Does this match your expectations?

    $s = '<COMMA>pre-stuff<EXHIBITING>some stuff</EXHIBITING>post-stuff</COMMA>';;
    print "$1 :: $2" while $s =~ s[<(\D+)>([^<]*?)</\1>][]gi;;
    EXHIBITING :: some stuff
    COMMA :: pre-stuffpost-stuff

    Of course, it fails horribly if your non-tag content contains '<':

    $s = '<COMMA>pre-stuff<EXHIBITING>some <= stuff</EXHIBITING>post-stuff</COMMA>';;
    print "$1 :: $2" while $s =~ s[<(\D+)>([^<]*?)</\1>][]gi;;
    {zilch here}
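
The two cases above can be put side by side in one strict-safe script. This is a sketch of the same idea rather than BrowserUk's exact code: inner_first() is a hypothetical helper name, and it removes one pair per pass (no /g) so that each captured pair can be collected.

```perl
use strict;
use warnings;

# inner_first() is a hypothetical name, not something from the thread.
# It repeatedly deletes a tag pair whose content contains no '<', so the
# innermost pairs are removed (and reported) before the pairs around them.
sub inner_first {
    my ($s) = @_;
    my @pairs;
    while ($s =~ s{<(\D+)>([^<]*?)</\1>}{}i) {
        push @pairs, "$1 :: $2";
    }
    return @pairs;
}

my @ok  = inner_first('<COMMA>pre-stuff<EXHIBITING>some stuff</EXHIBITING>post-stuff</COMMA>');
my @bad = inner_first('<COMMA>pre-stuff<EXHIBITING>some <= stuff</EXHIBITING>post-stuff</COMMA>');

print "$_\n" for @ok;
print scalar(@bad), " pairs found when the content contains '<'\n";
```

Excluding '<' from the content is what forces the innermost pair to match first, and it is also why a stray '<' in the text stops every match.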


      Hi BrowserUk,

      Thanks for that. It worked perfectly. I decided to exclude "/" rather than "<" in the middle part, because I do use "<" in the text for other reasons but never "/", so I ended up with:

      my $r = q(
          <(\D+)>     # Opening tag <....>
          ([^/]*?)    # stuff in the middle which does not have the closing tag character '/'
          (<\/\1>)    # closing tag of same type as opening tag </....>.
      );
      while ($text =~ m/$r/gix){
          ... processing ...
      }

      Thanks for your help.

      Regards

      Steve
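
A small sketch of the trade-off in this variant. The pattern is the one from the post above; collect_pairs() is a hypothetical helper, and deleting each match is only an assumption about what the elided "processing" does.

```perl
use strict;
use warnings;

my $r = q(
    <(\D+)>     # opening tag
    ([^/]*?)    # content with no '/' in it
    (<\/\1>)    # closing tag of the same name
);

# collect_pairs() is a hypothetical helper. Removing each matched pair is
# an assumption standing in for the elided "processing" step.
sub collect_pairs {
    my ($text) = @_;
    my @pairs;
    while ($text =~ s/$r//ix) {
        push @pairs, "$1 :: $2";
    }
    return @pairs;
}

# '<' in the content is now tolerated ...
print "$_\n" for collect_pairs('<COMMA>pre<EXHIBITING>a <= b</EXHIBITING>post</COMMA>');

# ... but a '/' in the content prevents any match.
print scalar(collect_pairs('<COMMA>and/or</COMMA>')), " pairs found\n";
```

The restriction has simply moved: the inner pair is still handled first and "<" is now allowed in the text, but any "/" inside the content has the same effect that "<" had before.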

Re: Regex priorities.
by Anonymous Monk on May 28, 2012 at 01:28 UTC

      Hi Anon,

      Thanks for this. I did in fact read most of the links you so kindly posted.

      I also thought it was a bit like a compiler problem and parsing was a potential solution, but I thought it would take longer. I was quite interested in how you would have parsed it.

      Regards

      Steve
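
On the question of how the text might be parsed instead: one possible approach, not taken from any reply in this thread, is to split the string into tags and text and keep a stack of open tags. The sketch below assumes tag names consist only of letters and underscores; parse_pairs() is a hypothetical name.

```perl
use strict;
use warnings;

# parse_pairs() is a hypothetical name for this illustration.
# Assumption: tag names are made of letters and underscores only.
sub parse_pairs {
    my ($text) = @_;
    my (@stack, @done);

    # The capturing group makes split return the tags as well as the text.
    for my $tok (split m{(</?[A-Za-z_]+>)}, $text) {
        next if $tok eq '';
        if ($tok =~ m{^<([A-Za-z_]+)>$}) {
            # Opening tag: start collecting text for a new element.
            push @stack, { tag => $1, text => '' };
        }
        elsif ($tok =~ m{^</([A-Za-z_]+)>$}) {
            # Closing tag: it must match the most recently opened tag.
            my $name = $1;
            my $top  = pop @stack;
            die "closing </$name> without an opening tag\n" unless $top;
            die "expected </$top->{tag}>, found </$name>\n"
                unless lc($top->{tag}) eq lc($name);
            push @done, "$top->{tag} :: $top->{text}";
        }
        elsif (@stack) {
            # Plain text belongs to the innermost open tag.
            $stack[-1]{text} .= $tok;
        }
    }
    die "unclosed <$stack[-1]{tag}>\n" if @stack;
    return @done;
}

print "$_\n" for parse_pairs(
    '<COMMA>pre-stuff<EXHIBITING>some <= stuff</EXHIBITING>post-stuff</COMMA>'
);
```

Because an element is only reported when its closing tag arrives, inner pairs come out before the pairs that enclose them, and neither "<" nor "/" in the text is a problem unless it happens to form something that looks like a tag.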
