For writing lexers by hand, my favorite idiom is $$s =~ m/\G.../gc. In scalar context it lets me advance through the string $$s that I want to lex. If the pattern matches at the current position, the position moves past the match; if not, the position is unchanged (that is what /c is for: without it, a failed match resets the position). \G anchors the match at the current position.
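To make the /gc behaviour concrete, here is a minimal, self-contained sketch. The tokens function and its token labels are hypothetical, made up for illustration. Each successful match advances pos($str); a failed match leaves it where it was thanks to /c, so the next alternative is tried at the same spot.

```perl
use strict;
use warnings;

# Hypothetical tokenizer built on the \G.../gc idiom.
# A successful match advances pos($str); a failed one keeps pos() (because of /c).
sub tokens {
    my ($str) = @_;
    my @out;
    pos($str) = 0;
    while (pos($str) < length($str)) {
        if    ($str =~ m/\G\s+/gc)            { next }                  # skip whitespace
        elsif ($str =~ m/\G(\d+)/gc)          { push @out, "INT:$1" }
        elsif ($str =~ m/\G([A-Za-z]\w*)/gc)  { push @out, "ID:$1" }
        elsif ($str =~ m/\G(.)/gcs)           { push @out, "CHAR:$1" }  # any other single char
    }
    return @out;
}

my $text = "x1 = 42 + y";
print join(' ', tokens($text)), "\n";   # prints "ID:x1 CHAR:= INT:42 CHAR:+ ID:y"
```

Every alternative consumes at least one character, so the loop always makes progress.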
I could also use $$s =~ s/^...//. It does not cost much, because the implementation does not move the string being truncated; it just moves an internal pointer. But this is immaterial to the following discussion.
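For comparison, a sketch of the same hypothetical tokenizer written with s/^...// instead of pos(): each step deletes the matched prefix, and the loop stops when the string is empty. Captures such as $1 remain usable after the substitution.

```perl
use strict;
use warnings;

# Hypothetical tokenizer that consumes its input with s/^...//.
# The string shrinks as tokens are removed; no pos() bookkeeping is needed.
sub tokens_subst {
    my ($str) = @_;
    my @out;
    while (length $str) {
        if    ($str =~ s/^\s+//)           { next }                  # skip whitespace
        elsif ($str =~ s/^(\d+)//)         { push @out, "INT:$1" }
        elsif ($str =~ s/^([A-Za-z]\w*)//) { push @out, "ID:$1" }
        elsif ($str =~ s/^(.)//s)          { push @out, "CHAR:$1" }  # any other single char
    }
    return @out;
}

print join(' ', tokens_subst("x1 = 42 + y")), "\n";   # prints "ID:x1 CHAR:= INT:42 CHAR:+ ID:y"
```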
A lexer for Parse::Yapp ends up looking like this:

    sub lexer {
        my ($parser) = shift;
        my $s = $parser->YYData->{INPUT};   # reference to the string to lex
        $$s =~ m/\G\s+/gc;                  # skip any spaces
        return ('INT', $1) if $$s =~ m/\G(\d+)/gc;
        return ('ID',  $1) if $$s =~ m/\G([A-Z]\w*)/gc;
        ...   # and it goes on for many tentative matches
    }

I know that I always match on $$s, so why should I restate it at each match? I _had_ to remove these useless $$s!
It took me a long time to realize that I could do it with a typeglob trick:

    local *_ = $parser->YYData->{INPUT};   # alias $_ to the string to lex; local restores the caller's $_ on return

Now $_ is an alias to the string to lex, so I can match on it directly and I don't need the =~ operator anymore.
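Putting it together, a sketch of a complete lexer using the typeglob alias. FakeParser is a hypothetical stand-in for a Parse::Yapp parser object that only provides YYData; in real code Parse::Yapp builds the parser and you hand the lexer to YYParse. The single-character fallback and the ('', undef) end-of-input pair follow the usual Parse::Yapp conventions, but treat them as assumptions about your grammar.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Parse::Yapp parser: only YYData is provided,
# returning a hashref whose INPUT entry is a reference to the text to lex.
package FakeParser;
sub new    { my ($class, $text) = @_; bless { data => { INPUT => \$text } }, $class }
sub YYData { $_[0]{data} }

package main;

sub lexer {
    my ($parser) = @_;
    # Alias $_ to the input string. 'local' restores the caller's $_ when the
    # sub returns; pos() lives on the aliased scalar, so it persists between calls.
    local *_ = $parser->YYData->{INPUT};

    m/\G\s+/gc;                                   # skip any spaces
    return ('INT', $1) if m/\G(\d+)/gc;
    return ('ID',  $1) if m/\G([A-Z]\w*)/gc;
    return ($1, $1)    if m/\G(.)/gcs;            # assumed: single-char literal token
    return ('', undef);                           # assumed: end-of-input marker
}

my $p = FakeParser->new("Foo 12 + Bar");
while (1) {
    my ($tok, $val) = lexer($p);
    last if $tok eq '';
    print "$tok $val\n";
}
```

Without local, the glob assignment would leave the global $_ aliased to the input after lexer returns, which can surprise the caller.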
--
stefp