PerlMonks  

I don't remember regex seeming this hard before

by danderson (Beadle)
on Jun 15, 2004 at 00:08 UTC (#366730=perlquestion)

danderson has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I am thoroughly stuck.

Suppose one were to have a string formatted like so: "<a a> a <a>" and one wanted to translate every character inside <>s to upper case (it's a much simplified version of what I'm actually doing - I figure, no point in cluttering up the question with [^\(\)-]s etc. Oops, too late.)

How do you go about this? Non-greedy matching won't work, because <>s can nest. That is, "<<a> a> a <a>" should have every char except the third 'a' modified. Non-greedy would do the first and last. Greedy will do all.

So I've been thinking that the solution must be to iterate or recurse over the string char-by-char, but I'm loath to sink to C-array style string parsing. Heck, I'm not even sure how to handle strings char-by-char in Perl.

Is that the only solution? Or is there a regex trick that I don't know of that makes this simple?
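
For reference, walking a string character by character in Perl does not need C-style indexing. A minimal sketch, assuming a depth counter is enough to track nesting; the helper name uc_inside_brackets is hypothetical and not from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: uppercase everything inside (possibly nested) <...>.
sub uc_inside_brackets {
    my ($text) = @_;
    my $depth = 0;      # how many unclosed '<' we are currently inside
    my $out   = '';
    for my $ch (split //, $text) {          # split // yields one char at a time
        $depth++ if $ch eq '<';
        $out .= $depth > 0 ? uc $ch : $ch;  # uc leaves '<' and '>' unchanged
        $depth-- if $ch eq '>' && $depth > 0;
    }
    return $out;
}

print uc_inside_brackets("<<a> a> a <a>"), "\n";   # <<A> A> a <A>
```

The guard on the decrement is an assumption about how to treat a stray '>': it is simply ignored rather than driving the depth negative.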

Replies are listed 'Best First'.
Re: I don't remember regex seeming this hard before
by Sidhekin (Priest) on Jun 15, 2004 at 00:20 UTC

    You want a parser, not a regex. But I am in a weird mood ...

    my $test = "<<a> a> a <a>";
    for ($test) {
        my $count = 0;
        s{(.)}{
            $count++ if $1 eq '<';
            $count-- if $1 eq '>';
            $count ? uc $1 : $1;
        }ge;
    }
    print $test;

    print "Just another Perl ${\(trickster and hacker)},"
    The Sidhekin proves Sidhe did it!

      Yes, you're correct, it's closer to a parser (the not-an-example version is going to do a bit of regex on the contents of the <>s).

      That's pretty spiffy, though - thanks!
Re: I don't remember regex seeming this hard before
by runrig (Abbot) on Jun 15, 2004 at 00:33 UTC
    Warning: I don't know if this is a correct solution. But it seems like it works, so I offer it up for criticism, just because it seems nifty:
    use strict;
    use warnings;
    use Regexp::Common;

    my $str = "a<<a> a>b<a>c<a>a";
    while ($str=~/\G(?:|$RE{balanced}{-parens=>'<>'})
                  ([^<]+)
                  (?:|$RE{balanced}{-parens=>'<>'})/xg) {
        # Update: Oops, this uppercases everything except what
        # the OP wanted :)
        substr($str, $-[1],$+[1]-$-[1]) = uc $1;
        pos($str) = $+[1];
    }

    Update: I should read the OP again. This seems to match, but doesn't change the case like the OP wants. Never mind (for now) :(

    Update: Fixed. Though I'm still not sure if it's absolutely correct :)

    Update (simplified solution below):

    $str =~ s/(^|\G$RE{balanced}{-parens=>'<>'})([^<]+)/\U$1\E$2/g;

    # Again? (and now I feel a DUH! coming on (:
    $str =~ s/($RE{balanced}{-parens=>'<>'})/\U$1/g;
Re: I don't remember regex seeming this hard before
by bsb (Priest) on Jun 15, 2004 at 02:11 UTC
    To do the nesting right with a regex you'd need to use (??{}), which is probably not the simplest answer for this problem. Use one of the other solutions.
Re: I don't remember regex seeming this hard before
by borisz (Canon) on Jun 15, 2004 at 11:59 UTC
    Use a regex that calls itself.
    #!/usr/bin/perl
    my $qr;
    $qr = qr!(?:\<(?:(?>[^\<\>]+)|(??{$qr}))*\>)!;
    $_ = "<<a> a> a <a>";
    s/($qr)/\U$1/g;
    print;
    Boris
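
    On Perl 5.10 or later, the same self-calling idea can probably be written without (??{}) by recursing into a capture group with (?1). This is a sketch under that version assumption, not what was available when the thread was written; the variable names are illustrative only:

```perl
use strict;
use warnings;
use 5.010;   # (?1) recursion and possessive quantifiers need Perl 5.10+

my $text = "<<a> a> a <a>";

# Group 1 matches one balanced <...> run: non-bracket characters,
# or a nested balanced run matched by recursing into group 1 itself.
(my $result = $text) =~ s/(<(?:[^<>]++|(?1))*+>)/\U$1/g;

print "$result\n";   # <<A> A> a <A>
```

    The possessive quantifiers play the role of the (?>...) in the pattern above: they stop the engine from backtracking into text it has already accepted as balanced.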
Re: I don't remember regex seeming this hard before
by ambrus (Abbot) on Jun 15, 2004 at 09:32 UTC

    Here's a simple sed-like solution. This is probably not the fastest way, though.

    $string = "<<a> a> a <a>\n";
    {
        $string=~s/<([^<]*)>/\U$1/g and redo;
    }
    print $string;
Re: I don't remember regex seeming this hard before
by ambrus (Abbot) on Jun 15, 2004 at 12:34 UTC

    An uglier solution, just to be complete:

    $k = 0;
    $string=~s@(?:<(?{++$k})|>(?{--$k}))*([^<>]*)@$k?uc($1):$1@ge;
      Or, to be (arguably) less ugly and do a little less work,
      s/([<>]+[^<>]+)/($k+=($1=~y#<##)-($1=~y#>##)) ? uc $1 : $1/ge;

      We're not really tightening our belts, it just feels that way because we're getting fatter.
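
      The one-liner above leans on the fact that tr/// (spelled y/// there) with an empty replacement list counts characters without modifying the string. A spelled-out sketch of the same counting approach, with variable names ($chunk, $depth, $out) that are illustrative rather than from the post:

```perl
use strict;
use warnings;

# tr with an empty replacement list only counts; $chunk is left untouched.
my $chunk  = "<<a";
my $opens  = ($chunk =~ tr/<//);
my $closes = ($chunk =~ tr/>//);
print "opens=$opens closes=$closes\n";   # opens=2 closes=0

# Process the string one "brackets followed by text" chunk at a time,
# adjusting the nesting depth by (opens - closes) before deciding on case.
my $depth = 0;
(my $out = "<<a> a> a <a>") =~ s{([<>]+[^<>]+)}{
    my $piece = $1;
    $depth += ($piece =~ tr/<//) - ($piece =~ tr/>//);
    $depth ? uc $piece : $piece;
}ge;
print "$out\n";   # <<A> A> a <A>
```

      A trailing bracket with no text after it is not matched by the pattern and is left as it is, which is harmless because uppercasing a bracket changes nothing.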
Re: I don't remember regex seeming this hard before
by Roy Johnson (Monsignor) on Jun 15, 2004 at 11:42 UTC
    This article may provide some additional enlightenment.

    We're not really tightening our belts, it just feels that way because we're getting fatter.
Re: I don't remember regex seeming this hard before
by orderthruchaos (Scribe) on Jun 15, 2004 at 13:40 UTC
    If you have the means, you may wish to check out Recipe 6.17 from The Perl Cookbook, 2nd Ed. I'm not typing out the code here in case it would be a copyright infringement.

Node Type: perlquestion [id://366730]
Approved by Sidhekin
Front-paged by broquaint