HOP::Lexer didn't do the trick because what you need (and wrote) is a parser, not just a lexer.
-
Your code has bugs. It doesn't recognize the bold in <uc><b>...</b></uc>, and it silently ignores the </uc> in <b>...</uc></b>. While these aren't too hard to fix, you'll notice that adding a third tag to your model will be very hard (and introduce lots of redundancy and fragility), and adding a fourth will be near-impossible.
(
2 tags => 4 hashes in %$lexer
3 tags => 16 hashes in %$lexer
4 tags => 65 hashes in %$lexer
5 tags => 326 hashes in %$lexer
)
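For a rough idea of where that growth comes from, assume (a guess about the layout, not a reading of your %$lexer) that the lexer needs one state per ordered stack of distinct open tags, the empty stack included. For n tags that is the sum over k = 0..n of n!/(n-k)!, which reproduces the 3-, 4- and 5-tag figures above:

```perl
use strict;
use warnings;

# ASSUMPTION: one lexer state per ordered stack of distinct open tags,
# drawn from $n tags, including the empty stack.
# Count = sum over k = 0..n of n!/(n-k)!.
sub count_states {
    my ($n) = @_;
    my $total = 0;
    my $perms = 1;                # n!/(n-k)! for k = 0
    for my $k (0 .. $n) {
        $total += $perms;
        $perms *= $n - $k;        # advance to n!/(n-(k+1))!
    }
    return $total;
}

printf("%d tags => %d states\n", $_, count_states($_)) for 3 .. 5;
```

Going from n-1 to n tags multiplies the count by n and adds one, so the growth is factorial.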
Part of the problem is that it's not proper (i.e. maintainable, extensible) for each tag to be a separate kind of token. <b> and <uc> should produce the same token sequence: either one token (opentag) or three (opentagopen, tagname, tagclose).
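A minimal sketch of the one-token option: every opening tag becomes the same OPEN token carrying its name, and every closing tag a CLOSE token, so supporting a new element never adds a token type. The token names (TEXT, OPEN, CLOSE) and the tokenize sub are hypothetical, chosen for illustration.

```perl
use strict;
use warnings;

# Split a string into generic tokens: [TEXT => $str], [OPEN => $name]
# or [CLOSE => $name]. Tag names are not validated here; that is the
# parser's job, so adding an element never changes the lexer.
sub tokenize {
    my ($str) = @_;
    my @tokens;
    for ($str) {                              # alias $_ for the /gc matches
        if    (/\G <\/ (\w+) > /xgc) { push @tokens, [ CLOSE => lc $1 ] }
        elsif (/\G <   (\w+) > /xgc) { push @tokens, [ OPEN  => lc $1 ] }
        elsif (/\G ( [^<]+ )   /xgc) { push @tokens, [ TEXT  => $1 ] }
        elsif (/\G \z          /xgc) { last }
        else  { die("Malformed tag at position " . (pos() // 0) . "\n") }
        redo;
    }
    return @tokens;
}

for my $tok (tokenize("a <b>bold</b> word")) {
    print("$tok->[0]($tok->[1])\n");
}
```

The parser then only has to look at OPEN/CLOSE plus a name, instead of one token kind per tag.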
-
I love using /.../xgc for lexing (aka tokenizing). See What good is \G in a regular expression? for more on this topic. My version below uses it. (Oops! So does yours!)
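The property that makes this work: a failed m//gc leaves pos() where it was, so the next pattern can be tried at the same spot, while a failed m//g without /c resets pos() to undef. A small demonstration (the string is arbitrary):

```perl
use strict;
use warnings;

my $s = "abc123";

$s =~ /\G [a-z]+ /xgc;          # consumes "abc"; pos($s) is now 3
print pos($s), "\n";

$s =~ /\G [a-z]+ /xgc;          # fails, but /c keeps pos($s) at 3
print pos($s), "\n";

$s =~ /\G \d+ /xgc;             # resumes at 3 and consumes "123"
print pos($s), "\n";

$s =~ /\G [a-z]+ /xg;           # fails without /c, so pos($s) is reset
print defined pos($s) ? pos($s) : 'undef', "\n";
```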
My version:
use strict;
use warnings;

# Elements the parser understands. Adding one means adding it here
# and teaching output_text what it does.
my %VALID_ELEMENTS = map { $_ => 1 } qw( b uc );

my %state;      # element name => true while that element is open
my @to_close;   # stack of open elements, innermost last

# Emit a run of text, applying every currently open element.
# "Bold" is rendered by doubling each printable character.
sub output_text {
    my ($text) = @_;
    $text =~ s/([[:print:]])/$1$1/g
        if $state{b};
    $text = uc($text)
        if $state{uc};
    print($text);
}

# Called after "<" has been consumed; parses "name>".
sub open_tag {
    /\G ( \w+ ) /xgc
        or die("Expecting tag name\n");
    my $ele = lc($1);
    $VALID_ELEMENTS{$ele}
        or die("Unknown element name $ele\n");
    /\G > /xgc
        or die("Expecting closing bracket\n");
    !$state{$ele}
        or die("Attempting to open previously opened element\n");
    $state{$ele} = 1;
    push @to_close, $ele;
}

# Called after "</" has been consumed; parses "name>" and checks
# that it closes the innermost open element.
sub close_tag {
    /\G ( \w+ ) /xgc
        or die("Expecting tag name\n");
    my $ele = lc($1);
    $VALID_ELEMENTS{$ele}
        or die("Unknown element name $ele\n");
    /\G > /xgc
        or die("Expecting closing bracket\n");
    $state{$ele}
        or die("Attempting to close unopened element\n");
    my $to_close = $to_close[-1];
    $to_close eq $ele
        or die("Missing closing tag for element $to_close\n");
    $state{$ele} = 0;
    pop @to_close;
}

sub process {
    # The following "for" aliases $_ to the string argument ($_[0])
    # and provides a target for the upcoming "redo" stmts.
    for ($_[0]) {
        /\G <\/ /xsgc       && do { close_tag();       redo };
        /\G <   /xsgc       && do { open_tag();        redo };
        /\G ( [^<]+ ) /xsgc && do { output_text("$1"); redo };
        # Falls through at the end of the string.
    }

    # Anything still on the stack was never closed.
    !@to_close
        or die("Missing closing tag for element $to_close[-1]\n");
}

process("This is a nifty <b><uc>uppercase</uc> test</b> to see what "
    . "<uc>this</uc> thing can do.\n");
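To exercise the error paths, here is a stripped-down sketch that keeps only the stack-based nesting check from open_tag/close_tag (no text transformation) and returns the error message instead of dying. The check_nesting name is hypothetical. On the two inputs from the top of this post it accepts <uc><b>...</b></uc> and rejects <b>...</uc></b>, and it also catches a tag that is never closed:

```perl
use strict;
use warnings;

# Returns '' if every tag is properly nested and closed, otherwise an
# error message. Same stack discipline as open_tag/close_tag: a close
# tag must match the most recently opened, still-open element.
sub check_nesting {
    my ($str) = @_;
    my %valid = map { $_ => 1 } qw( b uc );
    my @stack;
    for ($str) {
        if (/\G < (\/?) (\w+) > /xgc) {
            my ($is_close, $ele) = ($1, lc $2);
            return "Unknown element name $ele" unless $valid{$ele};
            if ($is_close) {
                return "Attempting to close unopened element $ele"
                    unless grep { $_ eq $ele } @stack;
                return "Missing closing tag for element $stack[-1]"
                    unless $stack[-1] eq $ele;
                pop @stack;
            }
            else {
                return "Attempting to open previously opened element $ele"
                    if grep { $_ eq $ele } @stack;
                push @stack, $ele;
            }
            redo;
        }
        /\G [^<]+ /xgc and redo;               # skip plain text
        /\G \z /xgc or return "Malformed tag at position " . (pos() // 0);
    }
    return @stack ? "Missing closing tag for element $stack[-1]" : '';
}

for my $input ('<uc><b>x</b></uc>', '<b>x</uc></b>', '<b>x') {
    my $err = check_nesting($input);
    print("$input: ", ($err eq '' ? 'ok' : $err), "\n");
}
```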