http://www.perlmonks.org?node_id=683608

tford has asked for the wisdom of the Perl Monks concerning the following question:

I've been learning a little about Deterministic Finite Automata and I'm excited about using them for lexing mathematical expressions. I have a question about why the following code seems to work.

    sub argout {
        my $temp = {'type'=>'ARG', 'value'=>$buffer};
        $buffer = '';
        push(@output,$temp);
    }

    sub otherout {
        my $c = shift;
        my $temp = {'type'=>'OTHER', 'value'=>$c};
        push(@output,$temp);
    }

    %states = (
        'start'=>{'s'=>{'nextstate'=>'var1.1'}},
        'var1.1'=>{'i'=>{'nextstate'=>'var1.2'}},
        'var1.2'=>{'s'=>{'action'=>'argout', 'nextstate'=>'var1.1'},
                   'eof'=>{'action'=>'argout'}}
    );

    sub lex {
        my $inputstring = shift;
        my @input = split('',$inputstring);
        my @output = ();
        $buffer = '';
        my $s = 'start';
        foreach my $c (@input,'eof') {
            if( my $t = $states{$s}->{$c} ) {
                eval( $t->{'action'} );
                $s = $t->{'nextstate'};
            }
            else {
                otherout($c);
                return 0;
            }
            $buffer = $buffer.$c;
        }
        return 1;
    }

    if( lex('sisisi') ) { print("success!\n"); }
    else {print("failure...")}

    foreach my $tok (@output) {
        print("$tok->{'type'},$tok->{'value'}\n");
    }

This code produces the following output.

    success!
    ARG,si
    ARG,si
    ARG,si

It seems to rely on $buffer being a sort of "global" variable (global vars are bad, right?), so that the argout function can access it. Now if I add a 'my' keyword in front of the $buffer assignment (inside the lex subroutine), then the output is the following.

    success!
    ARG,
    ARG,
    ARG,

Presumably this is because the $buffer variable is no longer accessible from the 'argout' subroutine. Why is the @output variable accessible from the 'argout' subroutine even though it is declared local to 'lex'?

Knowing this would be one thing, but what I really want is for the $buffer, the @input, the @output, and possibly even the $c to be data inside a "Dfa" object, so that all of those other methods could just access those variables whenever they wanted. The reason I haven't done this yet is that the only way I know how to make objects in Perl is to make a sort of "special hash" and bless it with functions that have intimate access to it. I didn't want to do this because I didn't want to have to use constructs like $self->{'buffer'} every time I wanted to use the buffer variable. I feel like I'm still stuck in a sort of "Java Mode" of thinking right now... Is there a better, "Perler" way of doing all this?

Any help will be greatly appreciated,
~Terry

Re: Help with Variable Scope
by pc88mxer (Vicar) on Apr 30, 2008 at 04:47 UTC
    Within the subroutine lex, @output refers to the local lexical declared with my. In argout as Fletch noted it refers to the package variable.
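
The difference between the two variables named @output can be seen in isolation. The following is a minimal sketch, not tford's code: the `our` declaration is added here only so the example runs under strict, and the pushed value is a placeholder.

```perl
use strict;
use warnings;

our @output;                  # package variable, i.e. @main::output

sub argout {
    # No lexical @output is in scope here, so this is the package variable.
    push @output, 'token';
}

sub lex {
    my @output = ();          # a separate lexical, visible only inside lex
    argout();                 # pushes onto the package variable instead
    return scalar @output;    # size of the lexical, which is still empty
}

my $in_lex = lex();
print "lexical inside lex: $in_lex, package variable: ", scalar(@output), "\n";
# prints "lexical inside lex: 0, package variable: 1"
```

This is why the loop at the end of the original script still sees the tokens: it runs outside lex, where @output again means the package variable.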

    The use of $buffer also refers to the package variable. If you were to use strict you would get error messages like:

        Global symbol "$buffer" requires explicit package name at ...
        Global symbol "@output" requires explicit package name at ...
        Global symbol "%states" requires explicit package name at ...

    To get rid of these messages simply declare the variables at the beginning of your source file with either:
    my ($buffer, @output, %states);
    or
    our ($buffer, @output, %states);
    if they need to be visible from outside the current package.
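
Applied to the original lexer, the file-scoped `my` declaration could look like the sketch below. It assumes a few changes that are not in the original post: the action is stored as a code reference instead of a string passed to eval, the 'eof' marker is kept out of the buffer, and a hypothetical `tokens` helper returns the collected output.

```perl
use strict;
use warnings;

my ($buffer, @output);    # file-scoped lexicals shared by the subs below

sub argout {
    push @output, { type => 'ARG', value => $buffer };
    $buffer = '';
}

my %states = (
    'start'  => { 's' => { nextstate => 'var1.1' } },
    'var1.1' => { 'i' => { nextstate => 'var1.2' } },
    'var1.2' => {
        's'   => { action => \&argout, nextstate => 'var1.1' },
        'eof' => { action => \&argout },
    },
);

sub lex {
    my $inputstring = shift;
    ($buffer, @output) = ('');    # reset the shared state; no new "my" here
    my $s = 'start';
    for my $c (split(//, $inputstring), 'eof') {
        my $t = $states{$s}{$c} or return 0;    # no transition: give up
        $t->{action}->() if $t->{action};
        $s = $t->{nextstate};
        $buffer .= $c unless $c eq 'eof';
    }
    return 1;
}

# Hypothetical accessor so callers do not touch @output directly.
sub tokens { return @output }

if (lex('sisisi')) {
    print "$_->{type},$_->{value}\n" for tokens();
}
```

The important detail is that lex resets the shared variables rather than redeclaring them; a second `my @output` inside lex would bring back the original problem.
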
    Knowing this would be one thing, but what I really want is that the $buffer, the @input, the @output, and possibly even the $c would be data inside of a "Dfa" object,

    To implement a more object-oriented approach like you might be inclined to do in Java, one usually would proceed like this:

        package DFA;

        sub new {
            my $dfa = {};
            bless $dfa, shift;      # associates $dfa with the current package
            $dfa->{buffer} = '';
            $dfa->{output} = [];    # note: output is now an array reference
            $dfa->{states} = {};    # note: states is now a hash ref
            $dfa;                   # return the new object
        }
        ...

    And here is an example of how a subroutine like argout would be implemented as method of the DFA class:

        sub argout {
            my $dfa = shift;    # $dfa here is equivalent to 'this' in Java
            my $temp = { type => 'ARG', value => $dfa->{buffer} };
            $dfa->{buffer} = '';
            push(@{$dfa->{output}}, $temp);
        }

        # example of constructing a DFA object and calling argout:
        my $dfa = DFA->new();
        $dfa->argout();

    Also note how references to the variables $buffer and @output have changed.

    This is just a very cursory overview of how to create a class in Perl. For more details, see perldoc perlobj.
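
For the concern in the original question about writing $self->{buffer} everywhere, one alternative to a blessed hash is a closure. The sketch below is only an illustration: `make_lexer` is a hypothetical name, the state table is the one from the question, and actions are looked up by name in a small dispatch table rather than evaluated as strings.

```perl
use strict;
use warnings;

# Hypothetical constructor: returns a lexer as a code reference.
sub make_lexer {
    my %states = @_;
    return sub {
        my $input  = shift;
        my $buffer = '';    # private to this call, yet visible to the actions
        my @output;
        my %actions = (
            argout => sub {
                push @output, { type => 'ARG', value => $buffer };
                $buffer = '';
            },
        );
        my $s = 'start';
        for my $c (split(//, $input), 'eof') {
            my $t = $states{$s}{$c} or return;    # empty list on failure
            $actions{ $t->{action} }->() if $t->{action};
            $s = $t->{nextstate};
            $buffer .= $c unless $c eq 'eof';
        }
        return @output;
    };
}

my $lex = make_lexer(
    'start'  => { 's' => { nextstate => 'var1.1' } },
    'var1.1' => { 'i' => { nextstate => 'var1.2' } },
    'var1.2' => {
        's'   => { action => 'argout', nextstate => 'var1.1' },
        'eof' => { action => 'argout' },
    },
);

print "$_->{type},$_->{value}\n" for $lex->('sisisi');
```

The inner subs refer to $buffer and @output by their plain names because they are compiled inside the scope where those lexicals are declared, which is the same mechanism that makes the file-scoped `my` approach work.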

      Thanks, pc, that helped a lot. In case anybody's wondering what I'm up to, the eventual goal is to write a DFA builder.

      Given a list of variable names and recognized functions (at run time), the DFA that it builds should be able to tokenize mathematical expressions that have been written using implied multiplication. For example, if you have variables called 'si' and 'n', and a function called 'sin(', the expression sin(sin) should be tokenized as if the user had entered sin(si*n). The list of tokens produced might look like the following.

          ARG,sin(
          ARG,si
          ARG,n

      I imagine that there is also probably some way to do this using regular expressions, and it would probably involve using look-ahead characters. I think that I might be able to do it faster with recursion, however, and of course the main reason I'm going back to finite state machines is that I'm fascinated with finite state machines! After a lot of consideration, I decided that the DFA object should really only have its state information as data.
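
For comparison, the regular-expression route might look like the sketch below. It is not the DFA builder described here: `make_tokenizer` is a hypothetical helper that tries the known names longest first at each position, and it labels any unrecognized character OTHER, which is an assumption about how leftovers such as ')' should be reported.

```perl
use strict;
use warnings;

# Hypothetical helper: build a tokenizer from a list of known names.
sub make_tokenizer {
    my @names = sort { length($b) <=> length($a) } @_;    # longest first
    die "need at least one name\n" unless @names;
    my $alt = join '|', map { quotemeta } @names;
    my $re  = qr/\G($alt)/;    # \G anchors each match at the current position
    return sub {
        my $input = shift;
        my @tokens;
        pos($input) = 0;
        while (pos($input) < length $input) {
            if ($input =~ /$re/gc) {
                push @tokens, { type => 'ARG', value => $1 };
            }
            else {
                $input =~ /\G(.)/gcs;    # consume one unrecognized character
                push @tokens, { type => 'OTHER', value => $1 };
            }
        }
        return @tokens;
    };
}

my $tokenize = make_tokenizer('si', 'n', 'sin(');
print "$_->{type},$_->{value}\n" for $tokenize->('sin(sin)');
# prints ARG,sin( then ARG,si then ARG,n then OTHER,)
```

Trying the longest name first is a simple greedy rule; it happens to give the desired split for this example, but it is not guaranteed to agree with a hand-built DFA for every set of names.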

      My original idea was to include the buffer and all that other stuff inside the object itself, and then to have different member functions that knew what to do depending on the current state the machine was in. This was a nice idea, but I'm trying to go for speed here, and when you really think about it, passing a bunch of object references around, and then using the $self->{buffer}, $self->{output} notation all the time, is bound to add overhead. So basically, since there really are only two things the machine can do (collect an input character in the buffer, or tokenize what's already in the buffer), I "hard-coded" those two actions into the main loop of the lex function.

      If anyone's interested, here is some toy code that can only recognize the tokens 'si' and 'sin('

          use strict;
          use warnings;

          package DFA;

          my $dfa = {
              start          => {nextstate  => {s => 'var1_1orfun1_1'},
                                 entrytoken => 'OTHER'},
              var1_1orfun1_1 => {nextstate  => {i => 'var1_2orfun1_2'}},
              var1_2orfun1_2 => {exittoken  => {n => '', default => 'ARG'},
                                 nextstate  => {n => 'fun1_3', s => 'var1_1orfun1_1'}},
              fun1_3         => {nextstate  => {'(' => 'fun1_4'}},
              fun1_4         => {exittoken  => {default => 'ARG'},
                                 nextstate  => {s => 'var1_1orfun1_1'}}
          };
          bless($dfa,"DFA");

          sub lex {
              my $self = shift;
              my @input = split('',shift);
              my @output = ();
              my $buffer = '';
              my $currentstate = $self->{start};

              for my $c (@input) {
                  #exit action
                  if( my $a = $currentstate->{exittoken} ) {
                      if( my $tt = defined($a->{$c})?$a->{$c}:$a->{default} ) {
                          push(@output,{type=>$tt,value=>$buffer});
                          $buffer = '';
                      }
                  }#end exit action if

                  #state transition
                  my $s = $currentstate->{nextstate}->{$c} || 'start';
                  $currentstate = $self->{$s};
                  $buffer = $buffer.$c;

                  #entry action
                  if( my $tt = $currentstate->{entrytoken} ) {
                      push(@output,{type=>$tt,value=>$buffer});
                      $buffer = '';
                  }#end entry action if
              }#end for loop

              #eof exit action
              if($buffer) {
                  my $a = $currentstate->{exittoken};
                  my $tt = ($a)?$a->{default}:'OTHER';
                  push(@output,{type=>$tt,value=>$buffer});
              }
              return @output;
          }#end function lex

          my $inputstring = 'sin(';
          print("an input of $inputstring produced the following output.\n");
          for my $tok ($dfa->lex($inputstring)) {
              print("$tok->{type},$tok->{value}\n");
          }

          $inputstring = 'sisin(si';
          print("an input of $inputstring produced the following output.\n");
          for my $tok ($dfa->lex($inputstring)) {
              print("$tok->{type},$tok->{value}\n");
          }

          $inputstring = 'sin(sisin(';
          print("an input of $inputstring produced the following output.\n");
          for my $tok ($dfa->lex($inputstring)) {
              print("$tok->{type},$tok->{value}\n");
          }

      Output:

          an input of sin( produced the following output.
          ARG,sin(
          an input of sisin(si produced the following output.
          ARG,si
          ARG,sin(
          ARG,si
          an input of sin(sisin( produced the following output.
          ARG,sin(
          ARG,si
          ARG,sin(

      I used the logical or (as suggested here) to help deal with default cases, and I also tried to use the // operator (as suggested here) to rewrite the expression my $tt = defined($a->{$c})?$a->{$c}:$a->{default}, but the version of Perl I'm using is too old (the // operator requires Perl 5.10 or later).
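
On Perl 5.10 or later, the rewrite with // would behave like the ternary, whereas || would not, because the empty string stored for 'n' is defined but false. A small sketch, using a hash shaped like the exittoken table of state var1_2orfun1_2 (the `pick_*` function names are made up for the comparison):

```perl
use strict;
use warnings;
use 5.010;    # the // (defined-or) operator needs Perl 5.10 or later

my %exittoken = (n => '', default => 'ARG');

sub pick_ternary {
    my $c = shift;
    return defined($exittoken{$c}) ? $exittoken{$c} : $exittoken{default};
}

sub pick_dor {
    my $c = shift;
    return $exittoken{$c} // $exittoken{default};    # falls back only on undef
}

sub pick_or {
    my $c = shift;
    return $exittoken{$c} || $exittoken{default};    # falls back on any false value
}

for my $c ('n', 's') {
    printf "%s: ternary='%s' //='%s' ||='%s'\n",
        $c, pick_ternary($c), pick_dor($c), pick_or($c);
}
# prints:
# n: ternary='' //='' ||='ARG'
# s: ternary='ARG' //='ARG' ||='ARG'
```

With ||, the 'n' entry would yield 'ARG' and the lexer would emit a token for 'si' before it had a chance to see whether 'sin(' follows.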

      Thanks again for everyone's help! I feel like I'm making some progress now.

Re: Help with Variable Scope
by Fletch (Bishop) on Apr 30, 2008 at 04:05 UTC

    You're getting the package variable @output, not your lexical. If you'd used strict you'd have gotten scolded for doing so.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.