PerlMonks  

Best way to escape code blocks with Text::Balanced?

by tshabet (Beadle)
on Aug 16, 2001 at 20:11 UTC ( [id://105427]=perlquestion )

tshabet has asked for the wisdom of the Perl Monks concerning the following question:

This question may well be more about programming in general than Perl specifically, but since it's a problem that is presenting itself to me in Perl (and since the Monks have been so helpful in the past) I'm going to ask it here.
I'm writing a script that converts a language spec into XML. The spec is written in Curl, which has this type of layout:
{Curl is all about {braces}}
Since I'm turning this stuff into XML, I'm using the fantastic Text::Balanced module to balance the braces so that the above can be turned into
<Curl> is all about <braces/></Curl>
OK, so far so good, right? Now I'm having no problems achieving this conversion. The problem comes when I have something like
{Curl is all about {code {braces}}}
which should become
<Curl> is all about {braces}</Curl>
So, as you see, the code tag works in much the same way as HTML. Still cool, right? It gets a little more complicated. The code
{Curl is all about {code {braces}{escape {braces}}}}
should become
<Curl> is all about {braces}<braces/></Curl>
OK, that's as complicated as it gets, aside from the fact that there are a few other tags that act the same as "code," but that's no biggie. Anyway, here's the code I implemented to hopefully make this work:
while($next = (extract_bracketed($text, '{}', '[^{}]*' ))[0]) #this is general.
{
    $holder = $next;
    if($bext = (extract_bracketed($next, '{}', '(?s).*?(?=\{ctext|\{code|\{example|\{pre)' ))[0]) #this handles "code" and the like.
    {
        $bolder = $bext;
        while($cext = (extract_bracketed($bext, '{}', '(?s).*?(?=\{escape)' ))[0]) #this is for escaped "code" and the like.
        {
            $colder = $cext;
            $cext =~ s/\{([^ \s|\}]*?)\}/<$1\/>/gix;
            $cext =~ s/\{([\w|-]*)(.*)\}/<$1>$2<\/$1>/osi;
            $bext =~ s/$colder/$cext/sgi;
        }
        $bext =~ s/\{pre(.*)\}/\<pre\>$1<\/pre>/gosix;
        $bext =~ s/\{ctext(.*)\}/\<ctext\>$1<\/ctext>/gosix;
        $bext =~ s/\{code(.*)\}/\<code\>$1<\/code>/gosix;
        $bext =~ s/\{example(.*)\}/\<example\>$1<\/example>/gosix;
        $bext =~ s/\}/ebrac/g;
        $bext =~ s/\{/obrac/g;
        $next =~ s/$bolder/$bext/sgi;
    }
    $next =~ s/\{([^ \s|\}]*?)\}/<$1\/>/gix;
    $next =~ s/\{([\w|-]*)(.*)\}/<$1>$2<\/$1>/osi;
    $text =~ s/$holder/$next/sgi;
}
So this code (in my mind anyway) slurps up entire blocks of balanced code and then looks within for a "code"ish tag, looks within that for an "escape" tag, makes the conversions, then sort of backs out of the sub-block and does the necessary replacements. Hopefully my code is easier to follow than that last sentence :-)
So here's the thing: this code works 100% on balanced blocks, 100% (I think) on "code" blocks, and about 50% on escape blocks. I've been poking at it for a day trying to make it work, but to no avail. What am I doing wrong? A regex that's mad at me? Is the algorithm flawed? As you see, this is a question that's applicable to programming in general, or at least I suspect it is, as long as it's not just a regex or something. So anyway, this question is sort of a shot in the dark, but if any of the more experienced programmers (that would be all of you) sees something worrisome in my code, I'd appreciate a helping hand. Thanks for your time and consideration :-)
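
For reference, the three conversion rules described above can be restated as a small recursive sketch. This is not the script posted above, just a minimal illustration under some assumptions: the sub names convert and convert_block and the %LITERAL table are hypothetical, the list of "code"-like tags is taken from the regex in the posted script, and the sketch follows the three examples literally (a "code"-like tag emits its contents with no wrapper element, and one whitespace character after "code" or "escape" is dropped). It uses extract_bracketed in list context, which returns the extracted block, the remainder, and the skipped prefix.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Assumption: these are the "code"-like tags, taken from the posted regex.
my %LITERAL = map { $_ => 1 } qw(code ctext example pre);

# Convert a run of text. $literal is true while inside a "code"-like tag,
# where braces are copied through unchanged unless {escape ...} is seen.
sub convert {
    my ($text, $literal) = @_;
    my $out = '';
    while (1) {
        # Skip everything up to the next "{", then take one balanced block.
        my ($block, $rest, $prefix) = extract_bracketed($text, '{}', '[^{]*');
        last unless defined $block;    # no further balanced block
        $out .= $prefix . convert_block($block, $literal);
        $text = $rest;
    }
    return $out . $text;               # trailing text is copied as-is
}

# Convert one balanced "{name body}" block.
sub convert_block {
    my ($block, $literal) = @_;
    my ($name, $body) = $block =~ /\A\{([\w-]*)(.*)\}\z/s;
    if ($literal) {
        if ($name eq 'escape') {
            $body =~ s/\A\s//;         # drop the separator after "escape"
            return convert($body, 0);  # conversion is switched back on
        }
        # Keep the braces, but still look inside for nested {escape ...}.
        return '{' . convert(substr($block, 1, -1), 1) . '}';
    }
    if ($LITERAL{$name}) {
        $body =~ s/\A\s//;             # drop the separator after the tag name
        return convert($body, 1);      # contents are copied literally
    }
    return "<$name/>" if $body eq '';  # {braces} becomes an empty element
    return "<$name>" . convert($body, 0) . "</$name>";
}

# Try the three inputs from the post.
print convert($_, 0), "\n" for (
    '{Curl is all about {braces}}',
    '{Curl is all about {code {braces}}}',
    '{Curl is all about {code {braces}{escape {braces}}}}',
);
```

The design choice here is to carry the "am I inside code?" state down through the recursion rather than extracting a block, rewriting it, and substituting it back into the parent string, so no extracted text is ever interpolated into a regex. Blocks with an empty tag name and XML escaping of the copied text are not handled.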
