Re^2: Where to put self-made loop logic (separate module)?

Wow, so many answers already. :-)

The best way to find out if this should be a separate module is probably to let you guys decide, so here's the code...
This code parses some special list markup (similar to Wiki markup).
The text is modified in place through the scalar reference $text (I don't want to build a copy and then go $$text = $newtext).
If I wanted to write another parser function that does even more complicated modifications (ones that cannot be replaced by a regex), I'd end up with duplicate code unless this loop lived in a separate module.
I know this isn't the prettiest piece of code mankind has written, but it seems to work, even with 3-byte UTF-8 characters and even though the line length changes.
I'd be happy if you could help me improve my code (even if there's a one-line alternative).

sub parseLists
{
    my $self = shift;
    my $text = \$self->{_text};

    # So we want to loop through the text line by line
    # and be able to modify some lines,
    # but we don't want to rebuild/copy the whole text.
    my $lf = "\n"; # linebreak
    my $lflen = length $lf; # 1
    my $pos1 = 0; # left line offset
    my $pos2 = 0; # right line offset (1st char after line)
    my $len = 0; # line length
    my $lendif = 0; # line length difference
    my $inlist;
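    # Open an in-memory filehandle on the text, decoding it as UTF-8
    open my $fh, "<:utf8", $text or die "Cannot open in-memory filehandle: $!";
    # A plain while (<$fh>) loop gets confused when the line length changes,
    # so the handle is re-synced with seek() at the end of each iteration.
    # Using seek() is risky, because it counts bytes, not chars!
    # However, substr() always counts chars, not bytes.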
    while (<$fh>)
    {
        # Get line string without the trailing newline (the last line may have none)
        (my $line = $_) =~ s/\Q$lf\E\z//;
        my $oldline = $line;

        # Calculate offsets
        $len = length $line;
        $pos1 = $pos2;
        $pos2 += $len + $lflen;

        # Modify line
        # START (not part of loop structure)
        my $isasterisk = $line =~ m/^\* /;
        my $isindented = $line =~ m/^\  /;
        my $isfirst;
        if (!$inlist)
        {
            if ($isasterisk)
            {
                $isfirst = 1;
                $inlist = 1;
            }
        }
        if ($inlist)
        {
            if (!$isindented && !$isasterisk)
            {
                substr $line, 0, 0, "</ul>\n";
                $inlist = 0;
            }
            elsif ($isindented)
            {
                $line =~ s/^\  (.*)/<li class="nobullet">$1<\/li>/;
            }
            elsif ($isasterisk)
            {
                $line =~ s/^\* (.*)/<li>$1<\/li>/;
                substr $line, 0, 0, "<ul>\n" if $isfirst;
            }
        }
        # END

        # Write new line back
        substr $$text, $pos1, $len, $line;

        # Calculate diff (new line length minus old line length)
        $lendif = length($line) - $len;

        # Adjust our and Perl's (!) position counter
        $pos2 += $lendif; # That's our counter
        seek $fh, $lendif, 1; # That's from Perl / SEEK_CUR
    }

    # Close a list that runs to the very end of the text
    $$text .= "</ul>\n" if $inlist;
}
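
The bytes-versus-chars remark in the comments above is easy to check in isolation. This is a minimal sketch, assuming the text is a decoded character string; the variable names are made up for illustration. The seek() call in the loop presumably only stays in sync because the markup being added and removed is pure ASCII, where one char is one byte.

```perl
use strict;
use warnings;
use Encode qw(encode);

# EURO SIGN (U+20AC) takes 3 bytes in UTF-8, followed by 3 ASCII chars
my $chars = "\x{20AC}uro";
my $bytes = encode('UTF-8', $chars);

my $char_len = length $chars;   # what substr() and length() count
my $byte_len = length $bytes;   # what seek() on a file would count

print "chars: $char_len\n";     # chars: 4
print "bytes: $byte_len\n";     # bytes: 6
```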

Re^3: Where to put self-made loop logic (separate module)? by davido (Cardinal) on Dec 30, 2012 at 17:56 UTC
Does the format of the input data have a name? Is the input data format documented somewhere? Dave
Re^4: Where to put self-made loop logic (separate module)? by basic6 (Novice) on Dec 30, 2012 at 20:50 UTC
The input is (part of) a template. In other words, it's just human-readable text, no code (no Perl, no HTML). There may be a couple of lines, each starting with an asterisk; that's what my function parses. But that's probably not important, as my goal is a line-by-line loop structure that allows in-place modifications. For that matter, the input is guaranteed to always be UTF-8 encoded text (only printable characters and whitespace).
Re^5: Where to put self-made loop logic (separate module)? by davido (Cardinal) on Jan 01, 2013 at 00:29 UTC
With the parsing logic embedded, it seems like a very specific tool that targets an uncommon input format. That's not "generally useful", and probably doesn't make for a great CPAN candidate. If you write the loop such that one could pass in a subref, it might be a more generalized tool, but then it's just a loop, and I don't see much advantage to it. This isn't to say you're not onto something. I just think it would require that you decide what exactly you want your module to accomplish. Throughout this thread I just see a module that handles a specific type of data that nobody else is likely to need to handle. Dave
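
For illustration, the subref variant mentioned above could look roughly like this. It is a minimal sketch, not the original poster's code: for_each_line is a hypothetical name, and it assumes "\n" line endings and a callback that returns the replacement line. It walks the string with index() and 4-argument substr() instead of an in-memory filehandle, so every offset is a character offset and no seek() is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback for each line of $$textref and
# write the returned string back in place with 4-argument substr().
sub for_each_line {
    my ($textref, $callback) = @_;
    my $pos = 0;    # character offset of the current line
    while ($pos < length($$textref)) {
        my $end = index($$textref, "\n", $pos);
        $end = length($$textref) if $end < 0;    # last line without newline
        my $line = substr($$textref, $pos, $end - $pos);
        my $new  = $callback->($line);
        substr($$textref, $pos, $end - $pos, $new);
        $pos += length($new) + 1;    # step past the unchanged newline
    }
    return;
}

# Usage: the list-specific logic lives in the callback, not in the loop
my $text = "* one\n* two\nplain\n";
for_each_line(\$text, sub {
    my ($line) = @_;
    $line =~ s/^\* (.*)/<li>$1<\/li>/;
    return $line;
});
print $text;
```

State such as "are we inside a list" can be kept in a variable that the callback closes over, which is what would make the same loop reusable for a second parser function.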

