Embedding pod in other languages

by jpl (Monk)
on May 19, 2011 at 15:02 UTC (#905727=perlquestion)
jpl has asked for the wisdom of the Perl Monks concerning the following question:

Pursuant to the thread started in Embedding pod in C, with suggestions from John M. Dlugosz and JavaFan (including starting a new thread to attract fresh eyes and request a serious review), enclosed is a first cut at a preprocessor. It will extract pod (see perlpod) from languages other than perl, where commenting conventions or stylistic preferences prevent starting everything in column 0. This version shows how easy it is to extend the preprocessing to languages other than C, or even to perl itself, to slightly relax the column 0 restrictions. The example is neither comprehensive nor documented; it is awaiting comments.
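To make "easy to extend" concrete, here is a minimal sketch of what one more table entry might look like. The 'sql' entry and the compile_entry helper are hypothetical illustrations, not part of the script in this post; they only assume the same six-slot layout (start, showstart, stop, showstop, trim, verbatim) used by the table in the code.

```perl
use strict;
use warnings;

# Hypothetical 'sql' entry, using "--" as the comment leader.
# Slots: start, showstart, stop, showstop, trim, verbatim.
my %extra = (
    'sql' => [
        '^\s*--\s*=pod\b', 0,
        '^\s*--\s*=cut\b', 0,
        '^\s*--\s*',
        '^\s*--\s*=v\s',
    ],
);

# Hypothetical helper: compile the four pattern slots with qr{},
# as the script does, and return everything as a hash.
sub compile_entry {
    my ($entry) = @_;
    my ( $start, $showstart, $stop, $showstop, $trim, $verbatim ) = @$entry;
    return {
        start     => qr{$start},
        showstart => $showstart,
        stop      => qr{$stop},
        showstop  => $showstop,
        trim      => qr{$trim},
        verbatim  => qr{$verbatim},
    };
}

my $sql = compile_entry( $extra{sql} );
print "start matches\n" if "  -- =pod\n" =~ $sql->{start};
print "stop matches\n"  if "-- =cut\n"   =~ $sql->{stop};
```

Merging such an entry into the script's table would be a one-line hash assignment; whether "--" followed by =pod is a sensible convention for SQL is exactly the kind of thing that needs review.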

Finding the right set of options will be important. I favor language-wide behavior to encourage "standards", but the model will allow fine tuning of the control over how to recognize the start and stop of the pod, and how to trim it to generate genuine pod. Verbatim lines need some sort of special identification (currently =v followed by whitespace) to allow processed lines to begin in some column other than 0. As requested, a newline is added between blocks of pod (as needed).
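As a rough illustration of the trimming and of the =v marker, the sketch below cleans single lines using the c-style patterns. The clean_line helper is a hypothetical name; it checks the verbatim pattern before trimming, so that a verbatim pattern which includes a comment leader would still get a chance to match.

```perl
use strict;
use warnings;

my $trim     = qr{^\s*};        # c-style trim pattern
my $verbatim = qr{^\s*=v\s};    # c-style verbatim marker

# Hypothetical helper: turn one embedded line into genuine pod text.
sub clean_line {
    my ($line) = @_;
    chomp $line;
    # A verbatim line loses everything up to and including "=v " and
    # gains one leading space, which pod treats as verbatim text.
    # Any other line just has its leading whitespace trimmed.
    $line =~ s/$verbatim/ / or $line =~ s/$trim//;
    return $line;
}

print clean_line($_), "\n" for (
    "    =head2 title\n",
    "    =v my \$x = 1;\n",
);
```

The first line comes out starting in column 0, as pod commands require; the second comes out indented by one space, so pod formatters should render it verbatim.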

The code is mostly initialization, to give a sense of how to control things. Processing the input is quite straightforward, and could be even simpler, if we drop control over whether the start and stop sequences are, themselves, included in the output. Comments, please.
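For comparison, here is one way the loop might shrink if the start and stop lines were never shown. This is only a sketch under that assumption: it uses Perl's scalar range (flip-flop) operator, omits trimming and the blank line between blocks, and pod_lines is a hypothetical name.

```perl
use strict;
use warnings;

my $start = qr{^\s*#\s*ifdef\s+pod\b};
my $stop  = qr{^\s*#\s*endif\s*/\*\s*pod\s*\*/};

# Hypothetical helper: return only the lines strictly between
# a start marker and the following stop marker.
sub pod_lines {
    my @out;
    for my $line (@_) {
        # In scalar context ".." is true from a start match through the
        # next stop match, and returns a sequence number for each line.
        if ( my $seq = ( $line =~ $start ) .. ( $line =~ $stop ) ) {
            # The sequence number is 1 on the start line and has "E0"
            # appended on the stop line; skip both markers.
            next if $seq == 1 or $seq =~ /E0$/;
            push @out, $line;
        }
    }
    return @out;
}

print pod_lines(
    "int x;\n",
    "#ifdef pod\n",
    "=head2 title\n",
    "#endif /* pod */\n",
    "int y;\n",
);
```

One caveat: the range operator keeps its state between calls, so input that ends inside an unterminated block would leave the next call starting "inside" pod.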

#!/usr/bin/perl -w
use strict;
use Getopt::Long;

# Per-language defaults: start, showstart, stop, showstop, trim, verbatim.
my %languages = (
    'c' => [
        '^\s*#\s*ifdef\s+pod\b', 0,
        '^\s*#\s*endif\s*/\*\s*pod\s*\*/', 0,
        '^\s*',
        '^\s*=v\s',
    ],
    'awk' => [
        '^\s*#\s*=pod\b', 0,
        '^\s*#\s*=cut\b', 0,
        '^\s*#\s*',
        '^\s*#\s*=v\s',
    ],
    'perl' => [
        '^\s*=pod$', 0,
        '^\s*=cut$', 0,
        '^\s*',
        '^\s*=v\s',
    ],
);

# Aliases that share the c conventions. The qw() list is wrapped in
# parentheses, which newer perls require in a foreach.
for my $l ( qw( C c++ C++ ) ) { $languages{$l} = $languages{c}; }

my $language = 'c';
my ( $start, $showstart, $stop, $showstop, $trim, $verbatim ) =
  @{ $languages{c} };

my $result = GetOptions(
    "language=s" => \$language,
    "start=s"    => \$start,
    "stop=s"     => \$stop,
    "trim=s"     => \$trim,
    "verbatim=s" => \$verbatim,
    "showstart"  => \$showstart,
    "showstop"   => \$showstop,
);
exit(1) unless ($result);

if ( $language ne 'c' ) {
    unless ( exists( $languages{$language} ) ) {
        die("Language '$language' not recognized\n");
    }
    ( $start, $showstart, $stop, $showstop, $trim, $verbatim ) =
      @{ $languages{$language} };
}

$start    = qr{$start};
$stop     = qr{$stop};
$trim     = qr{$trim};
$verbatim = qr{$verbatim};

my $show      = 0;    # true while inside a pod block
my $lastempty = 1;    # true if the last line printed was empty

while ( my $line = <DATA> ) {
    my $emit = $show;    # by default, print only lines inside a block
    if ( $line =~ $start ) {
        unless ($lastempty) {    # separate consecutive pod blocks
            $lastempty = 1;
            print "\n";
        }
        $show = 1;
        $emit = $showstart;
    }
    elsif ( $line =~ $stop ) {
        $show = 0;
        $emit = $showstop;
    }
    next unless ($emit);

    chomp($line);
    # Check the verbatim marker before trimming, so a marker pattern that
    # includes the comment leader (as for awk) can still match.
    $line =~ s/$verbatim/ / or $line =~ s/$trim//;
    $lastempty = ( $line eq '' );
    print $line, "\n";
}

__DATA__
This could be anything
#ifdef pod
=head2 title
blah, blah, blah, blah, blah
=v indent 1
#endif /* pod */
This could be anything, too
#ifdef pod
=head2 another title
yo ho ho
#endif /* pod */
more anything
Updated: changed ^.* to ^\s* for c patterns, which was my original intent. Thanks for spotting the error, John M. Dlugosz!

Replies are listed 'Best First'.
Re: Embedding pod in other languages
by John M. Dlugosz (Monsignor) on May 19, 2011 at 16:28 UTC
    Verbatim lines need some sort of special identification (currently =v followed by whitespace) to allow processed lines to begin in some column other than 0.
    I don't like that. It will be a bear to use, unless your editor handles that for you.

    With a leading ^.*, the c start pattern will find a #ifdef pod anywhere on the line, even with other stuff on it. In C, the # has to be the first non-whitespace character on a line, and it won't tolerate stuff after the expression other than whitespace and comments. In fact /^.*/ seems kind of silly, since the two cancel out. If it's not anchored to the front, you don't need to skip stuff! I think you wanted /^\s*# .../.
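A quick way to see the difference between the two anchorings, as a sketch: the "loose" pattern below is reconstructed from the update note in the question (the original ^.* form is not shown in the thread, so treat it as an assumption), and the sample lines are made up.

```perl
use strict;
use warnings;

my $loose  = qr{^.*#\s*ifdef\s+pod\b};     # assumed form before the update
my $strict = qr{^\s*#\s*ifdef\s+pod\b};    # form shown in the question

# Compare both patterns on a directive, an indented directive,
# and a line that merely contains the text inside a string.
for my $line ( '#ifdef pod', '  # ifdef pod', 'puts("#ifdef pod");' ) {
    printf "%-22s loose=%d strict=%d\n", $line,
      ( $line =~ $loose )  ? 1 : 0,
      ( $line =~ $strict ) ? 1 : 0;
}
```

Only the loose pattern accepts the third line, which is the false positive described above.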

    I like the idea of embedding extraction details for known languages, to encourage standardization, but still allow it to be customized.

      Good catch on the ^.* typo!

      I'm not thrilled with the =v tag for verbatim lines, but I couldn't think of a better alternative. I considered a mechanism for starting and ending a "verbatim block", but controlling the amount of indent on each line fights with the desire to allow extra content up front. Assuming you start with a block of "verbatim text", most editors I know make it easy to paste a =v at the start of each line, after which you can adjust what comes before to suit the language and style.

        Is there some flaw in my suggestion of matching the indent level of the =code line that initiates it?
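To make the indent-matching idea easier to discuss, here is a minimal sketch of how it might work. Everything in it is an assumption: =code as the opening marker comes from the suggestion above, =endcode is an invented closing marker, expand_code_blocks is a hypothetical name, and only whitespace (no comment leader) is allowed before the markers.

```perl
use strict;
use warnings;

# Hypothetical markers: "=code" opens a verbatim block and "=endcode"
# closes it. Neither is part of pod or of the script in the question.
sub expand_code_blocks {
    my @out;
    my $indent;    # defined only while inside a =code block
    for my $line (@_) {
        chomp( my $text = $line );
        if ( defined $indent ) {
            if ( $text =~ /^\s*=endcode\b/ ) { undef $indent; next; }
            # Strip the opener's indent, keep any deeper indent, and add
            # one space so pod treats the line as verbatim.
            $text =~ s/^\Q$indent\E//;
            push @out, " $text";
        }
        elsif ( $text =~ /^(\s*)=code\b/ ) {
            $indent = $1;    # remember how far the opener was indented
        }
        else {
            $text =~ s/^\s*//;    # ordinary pod line: trim as usual
            push @out, $text;
        }
    }
    return @out;
}

print "$_\n" for expand_code_blocks(
    "    =code\n",
    "    if (x)\n",
    "        y();\n",
    "    =endcode\n",
    "    Ordinary text.\n",
);
```

The open question from the reply above remains: with a comment leader such as "# " in front, the captured prefix would have to include the leader, and every line in the block would have to repeat it exactly.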

Node Type: perlquestion [id://905727]
Approved by Corion
Front-paged by Corion