If you want to avoid Parse::RecDescent (PRD), there are a few things you can do:
- Write an event parser for your language:
  - pass events to an event handler object
  - the two tokens 'page foo' generate an event new_type("page","foo"), which creates a new element as a child of the element at the top of a stack
  - the { token pushes the last child of the element at the top of the stack onto the stack
  - the } token pops an element off the stack
  - anything that is not recognized as a "new thing" structure ((\w+)\s+(?:(\w+)\s+)?\{) is globbed up and passed to the 'character_data' event; in your case, probably one call per line
  - the event handler has a predefined 'root' element at the top of the stack
- Use something like the event parser to translate your language into XML or YAML (or whatever) without keeping any state, and then use an existing parser for that format.
- Use (??{ }) in regexes in a way similar to the event parser's handler. If you go that way, you can nest expressions using (??{ }). See perlre for some devious tricks you can do with this construct. /msg me if you would like me to post an example.
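The first option can be sketched roughly as follows. This is only a minimal, line-based sketch under my own assumptions: each 'type name {', '}' or data line sits on its own line (as in the sample input used by the script), and the class TreeHandler plus the methods open_block, close_block and end_document are hypothetical names; only new_type and character_data come from the description above. The point is the decoupling: parse_events knows the syntax, while the handler alone owns the stack.

```perl
use strict;
use warnings;

# Hypothetical handler class: it owns the element stack and builds a tree.
# The parser below never touches the stack directly, so a different
# handler (one that writes XML, say) could be swapped in.
package TreeHandler;

sub new {
    my ($class) = @_;
    my $root = { type => 'root', contains => [] };
    return bless { root => $root, stack => [$root] }, $class;
}

# 'page foo' => new_type('page', 'foo'): add a child to the top of the stack
sub new_type {
    my ($self, $type, $name) = @_;
    my $elem = {
        type => $type,
        (defined $name ? (name => $name) : ()),
        contains => [],
    };
    push @{ $self->{stack}[-1]{contains} }, $elem;
}

# '{' => push the last child of the top-of-stack element
sub open_block {
    my ($self) = @_;
    push @{ $self->{stack} }, $self->{stack}[-1]{contains}[-1];
}

# '}' => pop; the root element itself may never be popped
sub close_block {
    my ($self) = @_;
    die "unbalanced '}'\n" if @{ $self->{stack} } < 2;
    pop @{ $self->{stack} };
}

# anything else is character data for the current element
sub character_data {
    my ($self, $text) = @_;
    push @{ $self->{stack}[-1]{contains} }, $text;
}

# at end of input only the root may remain on the stack
sub end_document {
    my ($self) = @_;
    die "unclosed '{'\n" if @{ $self->{stack} } > 1;
    return $self->{root};
}

package main;

# The parser knows the syntax but nothing about trees: it only emits events.
sub parse_events {
    my ($text, $handler) = @_;
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;    # trim indentation and trailing blanks
        next unless length $line;
        if ($line =~ /^(\w+)\s+(?:(\w+)\s+)?\{$/) {
            $handler->new_type($1, $2);
            $handler->open_block;
        }
        elsif ($line eq '}') {
            $handler->close_block;
        }
        else {
            $handler->character_data($line);
        }
    }
    return $handler->end_document;
}
```

Calling parse_events($str, TreeHandler->new) on the heredoc from the script should give back a root hash whose contains list holds the page element, with the same shape as the hashes new_elem builds there.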
Update: it's done. It was fun, but don't use it. Someone below implemented the event parser I was talking about, just not in a decoupled, OO kind of way.
use strict;
use warnings;
use re 'eval';
my $str = <<FOO;
page p1 {
question 4B {
label {
Do you like your pie with ice cream?
}
single {
1 Yes
2 No
}
}
question 4C {
label {
Do you like your pie with whipped cream?
}
single {
1 Yes
2 No
}
}
}
FOO
my $string = qr/
    ^ (?> \s* (.+) ) \s* $
    (?{ add_string($^N) })
/xm;
my $tokens;
my ($type, $name);
my $block = qr/
    # capture a type
    (?: (\w+) \s+ ) (?{ $type = $^N })
    (
        # capture an optional name and remember it
        (?{ $name = undef }) # first unset the name, in case this doesn't match
        ((?: (\w+) \s+ )(?{ $name = $^N }) )?
    )
    \{ # if this starts to look like an element, push a new cell on the stack
    (?{ new_elem($type, $name) })
    (
        (
            # this subpattern tries to capture a complete body, with the closing brace
            (??{ $tokens })
            \}
            (?{ close_elem() }) # if we got here, we have a full body: tokens and a closing brace
        ) | (
            # if we got here, the body subpattern failed, and we must abort
            (?{ abort_elem() })
            (?!) # always fails: a negative lookahead on the empty pattern, which always matches
        )
    )
/xs;
my $blocks = qr/($block \s*)+/xs;
my $strings = qr/($string \s*)+?/xs;
$tokens = qr/\s* ( $blocks | $strings ) \s*/xs; # tokens is either some strings or some blocks
my $doc = qr/^$tokens$/s;
my @stack;
new_elem("doc" => "root"); # create the root element
$str =~ $doc;
use Data::Dumper;
warn Dumper(@stack); # should contain just the root element
# create an element, attach it to the current parent, and make it current
sub new_elem {
    my $elem = {
        type => $_[0],
        (defined($_[1]) ? (name => $_[1]) : ()),
        contains => [],
    };
    if (@stack) { push @{ $stack[-1]{contains} }, $elem }
    push @stack, $elem;
}

# undo new_elem: drop the element from the stack and from its parent
sub abort_elem {
    pop @stack;
    pop @{ $stack[-1]{contains} };
}

sub close_elem { pop @stack }
sub add_string { push @{ $stack[-1]{contains} }, $_[0] }
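The second suggestion from the list at the top, translating the input into XML and letting a real parser do the work, can stay completely stateless if every block becomes the same generic element, so a closing brace never needs to know what it closes. A minimal sketch, again assuming one construct per line and balanced braces; the tag names doc, elem and text and the functions to_xml and xml_escape are my own hypothetical choices. The resulting string could then be handed to a CPAN XML parser such as XML::Parser or XML::LibXML.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Translate line by line. Each block becomes a generic <elem>, so no stack
# is kept; the output is well-formed only if the input braces balance.
sub to_xml {
    my ($text) = @_;
    my @out = ('<doc>');
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;
        next unless length $line;
        if ($line =~ /^(\w+)\s+(?:(\w+)\s+)?\{$/) {
            my ($type, $name) = ($1, $2);
            my $attrs = ' type="' . xml_escape($type) . '"';
            $attrs .= ' name="' . xml_escape($name) . '"' if defined $name;
            push @out, "<elem$attrs>";
        }
        elsif ($line eq '}') {
            push @out, '</elem>';
        }
        else {
            push @out, '<text>' . xml_escape($line) . '</text>';
        }
    }
    push @out, '</doc>';
    return join "\n", @out;
}
```

Using type and name as attributes rather than as tag names also means names like 4B, which are not legal XML element names, pass through unharmed.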