http://www.perlmonks.org?node_id=684481


in reply to Re: Help with Variable Scope
in thread Help with Variable Scope

Thanks, pc that helped a lot. In case anybody's wondering what I'm up to, the eventual goal is to write a DFA builder.

Given a list of variable names and recognized functions (at run time), the DFA that it builds should be able to tokenize mathematical expressions that have been written using implied multiplication. for example, if you have variables called 'si' and 'n', and a function called 'sin(', the expression sin(sin) should be tokenized as if the user had entered sin(si*n). The list of tokens produced might look like the following.

ARG,sin( ARG,si ARG,n

I imagine that there is also probably some way to do this using regular expressions, and it would probably involve using look-ahead characters. I think that I might be able to do it faster with recursion, however, and of course the main reason I'm going back to finite state machines is that I'm fascinated with finite state machines! After a lot of consideration, I decided that the DFA object should really only have it's state information as data.

My original idea was to include the buffer and all that other stuff inside the object itself, and then to have different member functions that knew what to do depending on the current state the machine was in. This was a nice idea, but I'm trying to go for speed here, and when you really think about it, passing a bunch of object references around, and then using the $self->{buffer}, $self->{output} notation all the time is bound to have added overhead. So basically, since there really is only two things the machine can do (collect an input character in the buffer, or tokenize what's already in the buffer), I "hard-coded" those two actions into the main loop of the lex function.

If anyone's interested, here is some toy code that can only recognize the tokens 'si' and 'sin('

use strict; use warnings; package DFA; my $dfa = { start =>{nextstate =>{s =>'var1_1orfun1_1'}, entrytoken=>'OTHER'}, var1_1orfun1_1=>{nextstate =>{i =>'var1_2orfun1_2'}}, var1_2orfun1_2=>{exittoken =>{n =>'', default=>'ARG'}, nextstate =>{n =>'fun1_3', s =>'var1_1orfun1_1'}}, fun1_3 =>{nextstate =>{'(' =>'fun1_4'}}, fun1_4 =>{exittoken =>{default=>'ARG'}, nextstate =>{s =>'var1_1orfun1_1'}} }; bless($dfa,"DFA"); sub lex { my $self = shift; my @input = split('',shift); my @output = (); my $buffer = ''; my $currentstate = $self->{start}; for my $c (@input) { #exit action if( my $a = $currentstate->{exittoken} ) { if( my $tt = defined($a->{$c})?$a->{$c}:$a->{default} ) { push(@output,{type=>$tt,value=>$buffer}); $buffer = ''; } }#end exit action if #state transition my $s = $currentstate->{nextstate}->{$c} || 'start'; $currentstate = $self->{$s}; $buffer = $buffer.$c; #entry action if( my $tt = $currentstate->{entrytoken} ) { push(@output,{type=>$tt,value=>$buffer}); $buffer = ''; }#end entry action if }#end for loop #eof exit action if($buffer) { my $a = $currentstate->{exittoken}; my $tt = ($a)?$a->{default}:'OTHER'; push(@output,{type=>$tt,value=>$buffer}); } return @output; }#end function lex my $inputstring = 'sin('; print("an input of $inputstring produced the following output.\n"); for my $tok ($dfa->lex($inputstring)) { print("$tok->{type},$tok->{value}\n"); } $inputstring = 'sisin(si'; print("an input of $inputstring produced the following output.\n"); for my $tok ($dfa->lex($inputstring)) { print("$tok->{type},$tok->{value}\n"); } $inputstring = 'sin(sisin('; print("an input of $inputstring produced the following output.\n"); for my $tok ($dfa->lex($inputstring)) { print("$tok->{type},$tok->{value}\n"); }

Output:

an input of sin( produced the following output. ARG,sin( an input of sisin(si produced the following output. ARG,si ARG,sin( ARG,si an input of sin(sisin( produced the following output. ARG,sin( ARG,si ARG,sin(

I used the logical or (as suggested here) to help deal with default cases and I also tried to use the // operator (as suggested here) to rewrite the expression  my $tt = defined($a->{$c})?$a->{$c}:$a->{default}, but the version of Perl I'm using is too early.

Thanks again for everyone's help! I feel like I'm making some progress now.