Wow, so many answers already. :-)
The best way to find out if this should be a separate module is probably to let you guys decide, so here's the code...
This code parses some special list markup (similar to Wiki markup).
The scalarref ($text) is directly modified (I don't want to copy it and go $$text = $newtext).
If I wanted to write another parser function which does even more complicated modifications (which cannot be replaced by a regex), I'd have some duplicate code if this loop was not a separate module.
I know this isn't the prettiest piece of code mankind has written, but it seems to work (even with 3 byte UTF-8 characters / note that the line length changes).
I'd be happy if you could help me improve my code (even if there's a one-line alternative).
sub parseLists
{
my $self = shift;
my $text = \$self->{_text};
# So we want to loop through the text line by line
# and be able to modify some lines,
# but we don't want to rebuild/copy the whole text.
my $lf = "\n"; # linebreak
my $lflen = length $lf; # 1
my $pos1 = 0; # left line offset
my $pos2 = 0; # right line offset (1st char after line)
my $len = 0; # line length
my $lendif = 0; # line length difference
my $inlist;
open my $fh, "<:utf8", $text; # Note how we open in UTF-8 mode
# while (<$fh>) # Gets confused when line length changes
# Using seek() is risky, because it reads bytes, not chars!
# However, substr() always counts chars, not bytes.
while (<$fh>)
{
# Get line string without newline character
my $line = substr $_, 0, -$lflen;
my $oldline = $line;
# Calculate offsets
$len = length $line;
$pos1 = $pos2;
$pos2 += $len + $lflen;
# Modify line
# START (not part of loop structure)
my $isasterisk = $line =~ m/^\* /;
my $isindented = $line =~ m/^\ /;
my $isfirst;
if (!$inlist)
{
if ($isasterisk)
{
$isfirst = 1;
$inlist = 1;
}
}
if ($inlist)
{
if (!$isindented && !$isasterisk)
{
substr $line, 0, 0, "</ul>\n";
$inlist = 0;
}
elsif ($isindented)
{
$line =~ s/^\ (.*)/<li class="nobullet">$1<\/li>/;
}
elsif ($isasterisk)
{
$line =~ s/^\* (.*)/<li>$1<\/li>/;
substr $line, 0, 0, "<ul>\n" if $isfirst;
}
}
# END
# Write new line back
substr $$text, $pos1, $len, $line;
# Calculate diff
$lendif = (length $line) - ((length $_) - ($lflen));
# Adjust our and Perl's (!) position counter
$pos2 += $lendif; # That's our counter
seek $fh, $lendif, 1; # That's from Perl / SEEK_CUR
}
}
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.