For manually writing lexers, my favorite idiom is $$s =~ m/\G.../gc. In scalar context it lets me advance through the string $$s that I want to lex. If the pattern matches at the current position, the position moves past the match; if not, the position is unchanged (that is what the /c modifier is for). \G anchors the match at the current position. I could also use $$s =~ s/^...//. It does not cost much because the implementation does not move the string to truncate it but just moves an internal pointer. But this is immaterial to the following discussion.
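To make the behaviour of \G with /gc concrete, here is a minimal sketch. The sample string and the variable names ($text, @tokens, $after_int, $after_fail) are made up for illustration and are not part of any real lexer.

```perl
use strict;
use warnings;

my $text = "42 foo";          # hypothetical input
my $s    = \$text;            # reference to the string to lex

my @tokens;

# A successful match at the current position advances pos($$s).
push @tokens, "INT:$1" if $$s =~ m/\G(\d+)/gc;
my $after_int = pos($$s);

# A failed match leaves pos($$s) where it was, thanks to /c.
my $failed = ($$s =~ m/\G(\d+)/gc) ? 0 : 1;
my $after_fail = pos($$s);

$$s =~ m/\G\s+/gc;            # skip the space
push @tokens, "ID:$1" if $$s =~ m/\G([a-z]\w*)/gc;

print "@tokens\n";
```

Without /c, the failed match would reset the position to the start of the string, and the next \G would start over from there.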

A lexer for Parse::Yapp ends up looking like

sub lexer {
    my($parser)=shift;
    my $s = $parser->YYData->{INPUT};   # reference to the string to lex
    $$s =~ m/\G\s+/gc;                  # skip any spaces
    return ('INT', $1) if $$s =~ m/\G(\d+)/gc;
    return ('ID', $1)  if $$s =~ m/\G([A-Z]\w*)/gc;
    ...                                 # and it goes on for many tentative matches
}

I know that I always match on $$s, so why should I restate it at each match? I _had_ to remove these useless $$s!

It took me a long time to realize that I could do it with a typeglob trick:

*_ = $parser->YYData->{INPUT}; # reference to the string to lex
Now $_ is an alias to the string to lex, so I can match on it directly and don't need the =~ operator anymore.
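Here is a sketch of how the whole lexer might look with the aliasing trick. It is not tied to Parse::Yapp: the lexer takes the string reference directly as a stand-in for $parser->YYData->{INPUT}, and the input string and the driver loop are invented for the example. Using local on the glob is my own addition so that the alias is undone when the sub returns; since local *_ also hides @_, the argument is copied first.

```perl
use strict;
use warnings;

my $input = "FOO 42 Bar";       # hypothetical string to lex

sub lexer {
    my ($ref) = @_;             # copy the argument before touching *_
    local *_ = $ref;            # $_ now aliases the string to lex

    m/\G\s+/gc;                 # skip any spaces
    return ('INT', $1) if m/\G(\d+)/gc;
    return ('ID',  $1) if m/\G([A-Z]\w*)/gc;
    return ('', undef);         # nothing left (or no rule matched)
}

# Driver loop: call the lexer until it reports the end of input.
my @tokens;
while (1) {
    my ($type, $value) = lexer(\$input);
    last if $type eq '';
    push @tokens, "$type:$value";
}

print "@tokens\n";
```

The match position is stored on the string itself, not on the name used to reach it, so it survives from one call of the lexer to the next even though the alias is set up afresh each time.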

-- stefp