I often prefer solutions that are declarative in nature. Rather than
writing code to do the work, I write code that interprets a
description of the work, or compiles that description into the code
that does the actual work.
In your problem, for example, we have the following situation:
- We have regex operations.
- We want to apply particular operations to particular kinds of pages.
- Given a page, we want to know which regex operations to apply,
and then we want to apply them.
Since I don't know the specifics of your situation, let's say that
you're working on books and that you deal with three kinds of pages:
front matter, body, and index. Let's further say that each page has
two properties: (1) its content (the text to appear on the page)
and (2) its page type (one of the three we listed earlier).
Now, let's say that we have the following rules for processing
the pages:
- All pages are expected to have a ::PAGENUM:: placeholder that
shall be replaced by the page number during processing. On
front-matter pages, however, the page number shall be displayed as
a roman numeral.
- Front-matter pages may contain ::COPYRIGHT:: and ::PRINTING::
placeholders that shall be replaced by copyright and
printing information. These placeholders are ignored on other
kinds of pages.
- Body pages require no additional processing for now (but might later).
- Index pages require no additional processing for now (but might
later).
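For contrast, here's roughly what these rules look like when written directly as code, which is the kind of hand-written logic a spec is meant to replace. It's only a sketch: `process_page_by_hand` is a hypothetical name, and because the rules call for roman numerals, it carries a compact `roman_numeral` helper (also an assumption) so that it runs on its own.

```perl
use strict;
use warnings;

# Compact roman-numeral helper (assumed; lowercase, as front matter
# usually uses). Greedily subtracts the largest value that fits.
sub roman_numeral {
    my $n = shift;
    my $roman = '';
    for my $pair ([1000,'m'], [900,'cm'], [500,'d'], [400,'cd'],
                  [100,'c'],  [90,'xc'],  [50,'l'],  [40,'xl'],
                  [10,'x'],   [9,'ix'],   [5,'v'],   [4,'iv'], [1,'i']) {
        my ($value, $symbol) = @$pair;
        while ($n >= $value) { $roman .= $symbol; $n -= $value }
    }
    return $roman;
}

# Hypothetical hand-coded version of the rules: one branch per page
# type, plus the shared page-number step for every known type.
sub process_page_by_hand {
    my ($content, $page_type, $page_number) = @_;
    local $_ = $content;
    if ($page_type eq 'front_matter') {
        s/::PAGENUM::/roman_numeral($page_number)/eg;
        s/::COPYRIGHT::/Copyright 2004 blah, blah/g;
        s/::PRINTING::/1st printing, Blah Blah Press/g;
    }
    # Shared step: arabic page numbers. On front matter this is a
    # no-op because the placeholder was already replaced above.
    if ($page_type =~ /^(?:front_matter|body|index)$/) {
        s/::PAGENUM::/$page_number/g;
    }
    return $_;    # unknown page types come back unchanged
}

print process_page_by_hand("Copyright page (::PAGENUM::).\n::COPYRIGHT::\n",
                           'front_matter', 1);
print process_page_by_hand("Body page (::PAGENUM::).\n", 'body', 2);
```

This works, but every new rule or page type means editing the branching logic itself, which is the maintenance burden the spec-based approach tries to avoid.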
I would probably convert the rules into a simple text-based specification
that is easy for humans to understand and edit:
body:
    +all_pages
front_matter:
    s/::PAGENUM::/roman_numeral($page_number)/eg;
    s/::COPYRIGHT::/Copyright 2004 blah, blah/g;
    s/::PRINTING::/1st printing, Blah Blah Press/g;
    +all_pages
index:
    +all_pages
all_pages:
    s/::PAGENUM::/$page_number/eg;
The spec's meaning is straightforward. Each page type is represented
by a labeled section. Each section contains a bit of Perl code that
gives the substitutions to be performed on pages of that type.
Further, to make reuse easy, we define lines of the form
+label to mean "and now do the stuff specified
in the section labeled label, too."
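The front_matter section calls a roman_numeral function that isn't defined in this part of the post (presumably the complete code supplies one). Here is a minimal sketch of such a helper, assuming lowercase numerals are wanted, which matches the "(i)" in the sample output further down:

```perl
use strict;
use warnings;

# Hypothetical helper assumed by the spec line
#   s/::PAGENUM::/roman_numeral($page_number)/eg;
# Converts a positive integer to a lowercase roman numeral.
sub roman_numeral {
    my ($n) = @_;
    die "roman_numeral: need a positive integer\n"
        unless defined $n && $n =~ /^\d+$/ && $n > 0;

    # Value/symbol pairs, largest first, including the
    # subtractive forms (cm, cd, xc, xl, ix, iv).
    my @table = (
        [1000, 'm'], [900, 'cm'], [500, 'd'], [400, 'cd'],
        [100,  'c'], [90,  'xc'], [50,  'l'], [40,  'xl'],
        [10,   'x'], [9,   'ix'], [5,   'v'], [4,   'iv'],
        [1,    'i'],
    );

    my $roman = '';
    for my $pair (@table) {
        my ($value, $symbol) = @$pair;
        # Greedily append this symbol as many times as it fits.
        while ($n >= $value) {
            $roman .= $symbol;
            $n     -= $value;
        }
    }
    return $roman;
}

print roman_numeral($_), "\n" for 1, 4, 9, 14, 1999;
```

Defined as an ordinary sub in the same package as the engine, it can be called by name from the spec code that the engine compiles with string eval.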
The idea is to be able to convert this specification into an engine
that makes it easy to process pages given their page types. For
example, to process and print out a book, this is as complicated as
things should need to get:
my $page_engine = make_regex_engine_from_spec( $spec_fh );
my $page_number = 1;
for my $page (@book_pages) {
    print $page_engine->( @$page{'content','page_type'} ), "\n";
    $page_number++;
}
That's pretty simple, right? But, like most things in life,
this simplicity comes at a price: we must write
the code that reads the spec and converts it into an engine
for us. Fortunately, the price isn't too high:
sub make_regex_engine_from_spec
{
    my $fh = shift;  # filehandle contains spec
    my %sections;
    my $label;

    # read in spec
    while (<$fh>) {
        chomp;
        next unless /\S/;  # skip blanks
        if (/^(\w+):/) {
            $label = $1;
        }
        else {
            die "syntax error: need a section label\n"
                unless $label;
            push @{$sections{$label}}, $_;
        }
    }

    # compile spec into code
    my $interpret = sub {
        local $_ = shift;
        if ( /^ \s* \+ (\w+) /x ) {
            if ($sections{$1}) {
                return '$sections{'.$1.'}->();';
            }
            die "there is no section named '$1'";
        }
        return $_;
    };
    while (($label, my $section) = each %sections) {
        my $generated_code =
            join "\n", 'sub {',
                (map $interpret->($_), @$section), "}\n";
        $sections{$label} = eval $generated_code
            or die "couldn't eval section $label: $@";
    }

    # return processor engine that embodies compiled spec
    return sub {
        # args: page content, page type
        (local $_, my $page_type) = @_;
        my $processor = $sections{$page_type};
        $processor->() if $processor;
        return $_;
    }
}
That might seem like a lot of code. However, it's of constant size
and won't change as our regex needs grow and become more complicated.
All we'll need to do is change our spec, which we expect will be
easier than writing the equivalent code by hand. We're hoping that
the simplicity and cost savings of the specification language more than
pay for the one-time cost of having to write that function above.
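To make the "compile spec into code" half of the engine less abstract, here is a stripped-down sketch that runs only the translation step on a hand-built %sections (its lines are copied from the front_matter section of the spec) and prints the generated source instead of eval'ing it:

```perl
use strict;
use warnings;

# Hand-built stand-in for %sections as it looks after the "read in
# spec" loop: each label maps to its raw, chomped spec lines.
my %sections = (
    front_matter => [
        's/::PAGENUM::/roman_numeral($page_number)/eg;',
        's/::COPYRIGHT::/Copyright 2004 blah, blah/g;',
        's/::PRINTING::/1st printing, Blah Blah Press/g;',
        '+all_pages',
    ],
    all_pages => [ 's/::PAGENUM::/$page_number/eg;' ],
);

# Same translation rule as $interpret in the engine: a "+label" line
# becomes a call to that section's (eventually compiled) sub, and any
# other line passes through unchanged as Perl code.
my $interpret = sub {
    local $_ = shift;
    if ( /^ \s* \+ (\w+) /x ) {
        die "there is no section named '$1'\n" unless $sections{$1};
        return '$sections{' . $1 . '}->();';
    }
    return $_;
};

# Build the source for one section, but print it rather than eval it.
my $generated_code =
    join "\n", 'sub {',
        (map $interpret->($_), @{ $sections{front_matter} }), "}\n";

print $generated_code;
```

The printed sub consists of the three substitutions followed by `$sections{all_pages}->();`. Because that call looks up %sections when the page is processed rather than when the section is compiled, it doesn't matter in which order `each` happens to visit the sections.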
To test out the spec-based system, let's create some pages of
various types:
my @book_pages = (
    { page_type => 'front_matter',
      content   => "This is the copyright page (::PAGENUM::).\n"
                 . "::COPYRIGHT::\n"
                 . "::PRINTING::\n" },
    { page_type => 'body',
      content   => "This is a body page (::PAGENUM::).\n" },
    { page_type => 'index',
      content   => "This an index page (::PAGENUM::).\n" },
);
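The processing loop assumes `$spec_fh` is already open. If you'd rather keep the spec inside the script than in a separate file, one option (an assumption on my part, not necessarily what the complete code does) is Perl's in-memory filehandle: open a reference to a string. Placing the spec after `__DATA__` and passing `\*DATA` would work as well. A minimal sketch using a trimmed copy of the spec:

```perl
use strict;
use warnings;

# Trimmed copy of the spec held in a string. The quoted heredoc
# (<<'END_SPEC') keeps Perl from interpolating $page_number here;
# the engine's string eval deals with it later.
my $spec_text = <<'END_SPEC';
body:
    +all_pages
index:
    +all_pages
all_pages:
    s/::PAGENUM::/$page_number/eg;
END_SPEC

# Opening a reference to a scalar gives an in-memory filehandle
# (available since Perl 5.8), so the spec reads just like a file.
open my $spec_fh, '<', \$spec_text
    or die "can't open in-memory spec: $!";

# Read it back to show what the engine would see; in the real program
# the handle would be passed to make_regex_engine_from_spec instead.
my @spec_lines = <$spec_fh>;
close $spec_fh;
printf "%d lines, first line: %s", scalar @spec_lines, $spec_lines[0];
```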
And here's what the pages look like when processed sequentially as a book
using the for loop from earlier:
This is the copyright page (i).
Copyright 2004 blah, blah
1st printing, Blah Blah Press
This is a body page (2).
This an index page (3).
Each of the page types was processed as expected. All of the expected
placeholders were replaced on all pages. The copyright page (which is
front matter) has a roman-numeral page number.
Looks like we're ready to print our book. :)
So that's how I might do it: (1) Write a spec. (2) Write code to
convert the spec into worker code. (3) Use the worker code to do the work.
Cheers, Tom
P.S. The complete code, ready to run, is below for your convenience: