Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw


by Anonymous Monk
on Jun 17, 2002 at 11:15 UTC ( #175050=note: print w/replies, xml ) Need Help??

in reply to Re: Clunky parsing problem, looking for simple solution
in thread Clunky parsing problem, looking for simple solution

My apologies, I didn't have time to clean it up, I was in a hurry:

Missing all of the functions
Missing some operators
Missing post-processor that puts the lines back together

its essentially a recursive decent parser for basic, which does a translation to write out an if-then-else construct as a series of if-then and gotos using labels. It prefixes all the lines of the result with the original line number, with the aim of helping a post-processor then go through and correct the original GOTO lines and labels to point to the correct places.

it isn't well tested, but it will handle nested if/then/elses correctly. It could be expanded to handle compound statements with one line of code (mind you, in perl, thats not saying much).

The penalty is that it has to know everything about the language. Its basically half way to a basic compiler for Parrot once you add the functions and ops.

#!/usr/bin/perl use Data::Dumper; sub generic_node { my ($re, $tree) = @_; $current =~ s/^\s*//; if ($current !~ /$re/i) { return 0; } $current =~ s/$re//i; push @{$tree}, $1; return 1; }; sub generic_node_create { my ($re, $tree, $name) = @_; $current =~ s/^\s*//; if ($current !~ /$re/i) { return 0; } $current =~ s/$re//i; push @{$tree}, [$name, $1]; return 1; }; sub generic_check { my ($re, $tree) = @_; $current =~ s/^\s*//; return $current =~ s/$re//i; }; sub operator { generic_node_create('^([+-=*\/])',shift,"OPERATOR"); }; sub number { generic_node_create('^(\d+)',shift,"NUMBER"); }; sub string { generic_node_create('^\"([^\"]+)\"',shift,"STRING"); }; sub numeric_variable { generic_node_create('^([A-Z]\d)',shift,"NUMERIC_VAR"); }; sub string_variable { generic_node_create('^([A-Z]\$)',shift,"STRING_VAR"); }; sub equals { generic_node_create('^([=])',shift,"EQUALS"); }; sub expression { my $tree = shift; my $t = [EXPRESSION]; string_variable($t) || numeric_variable($t) || string($t) || number($t) || (generic_check('^[\(]',$t) && expression($t) && generic_check( +'^[\)]',$t)) || return 0; if (operator($t)) { expression($t) || return 0; } push @{$tree},$t; return 1; }; sub assignment { my $tree = shift; my $t = [ASSIGNMENT]; numeric_variable($t) || string_variable($t) || return 0; equals($t) || return 0; expression($t) || return 0; push @{$tree},$t; return 1; }; sub comment { if (generic_check('^REM')) { return 1; } }; sub if_condition { my $tree = shift; my $t = [IF_CONDITION]; generic_check('^IF',$t) || return 0; expression($t) || die "Invalid IF statement, expected expressi +on"; generic_check('^THEN',$t) || die "Invalid IF statement, expect +ed THEN"; statement($t) || die "Invalid IF construct, expected statement + after THEN"; if (generic_check('^ELSE',$t)) { statement($t) || die "Invalid IF/ELSE construct, expec +ted statement after ELSE"; } push @{$tree},$t; return 1; }; sub goto_statement { my $tree = shift; my $t = [GOTO]; generic_check('^GOTO',$t) || return 0; number($t) || die 'GOTO must be followed by a number'; push @{$tree},$t; }; sub statement { my $tree = shift; my $t = [STATEMENT]; assignment($t) || comment($t) || if_condition($t) || goto_statement($t) || return 0; push @{$tree},$t; }; # # Pretty-print routines # sub print_expression { my ($subelement) = @_; if ($subelement->[0] eq "EXPRESSION") { if ($subelement->[2][0] ne "OPERATOR") { return "(".print_expression($subelement->[1]). +")"; } else { return "(".print_expression($subelement->[1]). +" ".$subelement->[2][1]." ".prin t_expression($subelement->[3]).")"; } } if ($subelement->[0] eq "NUMBER") { return $subelement->[1]; } if ($subelement->[0] eq "STRING") { return '"'.$subelement->[1].'"'; } if ($subelement->[0] eq "STRING_VAR") { return $subelement->[1]; } if ($subelement->[0] eq "NUMBER_VAR") { return $subelement->[1]; } }; sub print_statement { my ($statement) = @_; my $s = ""; $level++; my $subelement = $statement->[1]; if ($subelement->[0] eq "IF_CONDITION") { add_line("IF ".print_expression($subelement->[1])." TH +EN GOTO THEN_$level"); if ($subelement->[3]) { print_statement($subelement->[3]); } add_line("GOTO END_$level"); add_line("THEN_$level:"); print_statement($subelement->[2]); + add_line("END_$level:"); } if ($subelement->[0] eq "GOTO") { add_line("GOTO ".$subelement->[1][1]); } if ($subelement->[0] eq "ASSIGNMENT") { add_line($subelement->[1][1]." = ".print_expression($s +ubelement->[3])); } if ($subelement->[0] eq "COMMENT") { add_line("REM - comment here"); } $level--; }; sub add_line { push @lines, [$current_line, shift]; }; sub pretty_print { my $tree = shift; $ln = shift(@{$tree}); print_statement($tree->[0]); }; while (<>) { chomp($_); $current = $_; $tree = []; generic_node('^(\d+)', $tree) || die "Line number invalid"; $current_line = $tree->[0]; statement($tree) || die "Invalid statement"; print Dumper($tree); $level = 0; pretty_print($tree); }; for (@lines) { print $_->[0]." ".$_->[1]."\n"; }

Replies are listed 'Best First'.
Re: Parser
by jepri (Parson) on Jun 17, 2002 at 22:05 UTC
    Wow. If you put that much work in, you should make yourself an account so you can feel the goodness of peer appreciation. That and you can give clintp a name to hate, so he doesn't have to keep referring to an anonymous monk :)

    I didn't believe in evil until I dated it.

      Yes well you see the problem is that I was feeling slightly guilty. When I read the article originally, i had no idea what Parrot was, so the term "Parrot Basic" suggested a version of basic called Parrot, not, as it turned out, a version of basic that was supposed to run on an alpha-version virtual machine.

      Thus, you can understand that in my view, the fact that the basic interpreter (which I figured to be in perl) could not handle nested ELSE suggested that its internals were designed incorrectly (since perl is more than capable of recursion with little effort), and therefore it was in the best interests of the questioning monk to re-write his code in a manner more suited to the problem. I have no problem being blunt about this if necessary.

      The issue of course, being that it was not just another basic interpreter, nor was it written in perl. Instead it was written by hand in a register/op-code language that wasn't close to finished. While I looked into it myself and determined that a recursive decent parser is practical on the VM as it stands, it is by no means as trivial as the perl variant.

      Thus, by way of apology, I wrote code that would parse the Basic language the proper way and then do the necessary transformations to output a simplified Basic that would be easier to write Parrot code for until such time as it becomes easy to write a grammar within Parrot.

        That was nice of you to write that code. You appear to be quite good at it. We're all practising writing parsers and compilers to get ready for the main event - Perl 6, which will also compile down to parrot assembler. Find out more at You seem to enjoy writing this kind of code, maybe writing the next perl would appeal?

        In any case join the community - you don't have to own up to this exchange unless you want to :)

        I didn't believe in evil until I dated it.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://175050]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (4)
As of 2018-06-24 02:11 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (126 votes). Check out past polls.