How can I process "lazy" XML like our <code> tags, where an element's content is meant literally and its & and < characters are not escaped? The best solution would work within the XML::Twig framework, but here is a stand-alone preprocessor that does it.
The concept demo below scans the proto-XML and escapes the special characters (& and <) in the elements that are supposed to be literal.
I thought about using Parse::RecDescent or some other parsing technology, but this seems like it should be a simple problem. I'm wondering whether this general idea, cascaded regexes sharing a continuing pos(), can be improved.
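One way the cascaded-regex idea might be tightened, sketched here under stated assumptions and separately from the demo script: anchor every pattern with \G and match with /gc. Each regex then resumes exactly where the previous one stopped, and a failed match leaves pos() alone instead of resetting it, so the output can be built in a single left-to-right pass with no recorded offsets and no second substr pass. The names preprocess, is_literal_elem and escape_literal are hypothetical; is_literal_elem uses the same listing/signature rule as the demo, and the [[[* *]]] markers are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: same rule as the demo's is_literal().
sub is_literal_elem {
    my ($name, $attrs) = @_;
    return $name eq 'listing' || $name eq 'signature';
}

# Hypothetical helper: escape a literal passage.
sub escape_literal {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must run first, or the &lt; below would be re-escaped
    $text =~ s/</&lt;/g;
    return $text;
}

# Single pass: every regex is anchored at \G, so it only matches where the
# previous one stopped; /gc keeps pos() on failure so the next alternative
# is tried from the same spot.
sub preprocess {
    my ($in) = @_;
    my $out = '';
    pos($in) = 0;
    while (pos($in) < length $in) {
        if ($in =~ m/\G(<\s*(\w+)([^>]*)>)/gc) {
            # A start tag: copy it through unchanged.
            my ($tag, $name, $attrs) = ($1, $2, $3);
            $out .= $tag;
            if (is_literal_elem($name, $attrs)) {
                # Take everything up to the matching end tag and escape it.
                $in =~ m/\G(.*?)(<\/\Q$name\E\s*>)/gcs
                    or die "unterminated <$name> at offset ", pos($in), "\n";
                my ($body, $end) = ($1, $2);
                $out .= escape_literal($body) . $end;
            }
        }
        elsif ($in =~ m/\G([^<]+|<)/gc) {
            # Plain text, an end tag's '<', or a stray '<': pass through.
            # This always consumes at least one character, so the loop ends.
            $out .= $1;
        }
    }
    return $out;
}
```

Usage would be print preprocess($testdata). It keeps the demo's simplifying assumptions: literal elements do not nest inside themselves, and attribute values contain no '>'.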
use strict;
use warnings;

sub is_literal ($$)
{
    my ($name, $attrs)= @_;
    # Simple demo: change this to analyse $name and $attrs to decide
    # whether to treat this element literally.
    return ($name eq 'listing') || ($name eq 'signature');
}

sub escape_out ($)
{
    my $passage= shift;
    $passage =~ s/&/&amp;/g;   # must come first, or the &lt; below would be re-escaped
    $passage =~ s/</&lt;/g;
    return "[[[* $passage *]]]"; # [[[ ]]] to visibly show that the right "bite" was taken.
}

sub scan ($)
{
    my @passages;
    my $line= shift;
    # First pass: note which sections need treatment, without actually
    # modifying the string. Modifying it would mess up the "pos" used by the REs.
    while ($line =~ m/<\s*(\w+)([^>]*)>/g) {
        # For every start tag...
        my ($name, $attrs)= ($1, $2);
        my $startpos= pos($line);
        if (is_literal ($name, $attrs)) {
            # If targeted, find the matching end tag using a simple pattern
            # (ignoring other stuff). This skips that passage for the
            # continued search of all start tags. A failed m//g resets pos(),
            # so the outer loop would start over and never finish: bail out.
            $line =~ m/<\/\Q$name\E>/g
                or die "no </$name> after offset $startpos\n";
            my $endpos= pos($line);
            unshift @passages, [$startpos, $endpos-(length($name)+3)];
        }
    }
    # Second pass: process the sections noted above from right to left,
    # so the offsets of the sections not yet processed don't change.
    foreach my $range (@passages) {
        my ($start, $end)= @$range;
        my $length= $end-$start;
        substr($line, $start, $length)= escape_out (substr($line, $start, $length));
        # Is there an easier way to do that without substr'ing twice?
    }
    print $line;
}
my $testdata= <<'EOF';
<method name="mainloop">
  <signature virtual="1">int mainloop (ratwin::message::MSG&)</signature>
  <P>This is the canonical logic of the message pump. It looks approximately
  like this:</P>
<listing>
use & and <things> in here.
MSG msg;
while ( GetMessage(msg) ) {
    if (msg.hwnd == 0) thread_message (msg);
    else {
        if (!pre_translate (msg)) { // check IsDialog, TranslateAccelerator
            if (!translate_key_even(msg)) // Win32 TranslateMessage
                DispatchMessage(msg);
        }
    }
}
return (msg.wParam);
</listing>
  <P>Override this if you need to customize this beyond the point provided
  for by the virtual functions provided for the individual steps.</P>
</method>
EOF
scan ($testdata);
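On the question in the second-pass loop about substr'ing twice: perlfunc documents that substr returns an lvalue, and foreach aliases $_ to it. The range is therefore written once, and assigning to $_ writes straight back into the string, growing or shrinking it as needed. The four-argument form, substr($line, $start, $length, $new), also replaces in place, but still needs a separate read to compute $new. A minimal standalone sketch; escape_out_demo is a hypothetical stand-in for the demo's escape_out without the markers.

```perl
use strict;
use warnings;

# Hypothetical stand-in for escape_out(), without the [[[* *]]] markers.
sub escape_out_demo {
    my ($passage) = @_;
    $passage =~ s/&/&amp;/g;
    $passage =~ s/</&lt;/g;
    return $passage;
}

my $line = 'keep <listing>a & <b></listing> keep';

# Offsets of the literal passage, found with index() for this demo.
my $start  = index($line, '<listing>') + length '<listing>';
my $length = index($line, '</listing>') - $start;

# foreach aliases $_ to the lvalue returned by substr, so the range is
# named once and the assignment writes back into $line.
for (substr($line, $start, $length)) {
    $_ = escape_out_demo($_);
}

print "$line\n";
```

In scan() the loop body would become for (substr($line, $start, $length)) { $_ = escape_out($_) }.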