<?xml version="1.0" encoding="windows-1252"?>
<node id="827674" title="Re: The story of a strange line of code: pos($_) = pos($_);" created="2010-03-09 21:06:39" updated="2010-03-09 21:06:39">
<type id="11">
note</type>
<author id="771802">
rubasov</author>
<data>
<field name="doctext">
In the spirit of TIMTOWTDI, I've rewritten your code a little, moving much of the explicit looping logic into the regex so that the regex engine does the dirty work. Here it is:
&lt;code&gt;
use strict;
use warnings;

$_ = q(
\bib{ref0}{article}{
        author={Y. Bartal},
        volume={37},
        pages={184},
        date={1996},
        issn={0272-5428},
}
);

my @tokfd;
my $tokre = qr{
    (?&lt;bib&gt;     \\bib(?![A-Za-z])             )
  | (?&lt;text&gt;    (?s: \\(?:[A-Za-z]+|.) )      )
  | (?&lt;comment&gt; \%.*\n\s*                     )
  | (?&lt;equal&gt;   \=                            )
  | (?&lt;begin&gt;   \{                            )
  | (?&lt;end&gt;     \}                            )
  | (?&lt;space&gt;   \s+                           )
  | (?&lt;word&gt;    [A-Za-z0-9_\-\.]+             )
  | (?&lt;text&gt;    [^\\\%\=\{\}\sA-Za-z0-9_\-\.] )
}x;

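# Only the branch that matched has a defined named group, so %+ holds one name/text pair.
push @tokfd, [ keys %+, values %+ ] while /\G$tokre/gc;

# With /c a failed match leaves pos() at the end of the last token (undef if nothing matched).
my $pos = pos($_) // 0;
die "internal error: amsref reader tokenizer cannot match input line: ($_) at $pos"
  if $pos != length;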

for my $t (@tokfd) {
  my ( $i, $c ) = @$t;
  $c =~ s/\n/\\n/g;
  printf qq(%-8s "%s"\n), $i, $c;
}
&lt;/code&gt;
I've used regex alternation branches instead of your for loop, and moved the match into the while condition, which eliminates the explicit loop control and avoids the repeated zero-length matches. I've replaced the AoA with named captures.
&lt;p&gt;
As far as I can tell it produces the same output as yours, but I think it's a little more concise. It also makes it easy to spot in the output when a branch accidentally matches the null string.
&lt;/p&gt;
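&lt;p&gt;
To show the loop shape on its own, here is a minimal sketch of the same \G with /gc idea wrapped in a sub. The name tokenize, the three-branch token set and the sample input are made up for illustration and are not taken from your code. I've also written the named groups as (?'name'...), which Perl accepts as an alternative to the angle-bracket form.
&lt;/p&gt;

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only: splits $text into [name, text] pairs.
sub tokenize {
    my ($text) = @_;

    # A deliberately tiny token set (an assumption, not the amsrefs grammar).
    my $re = qr{
        (?'word'  [A-Za-z0-9]+ )
      | (?'equal' =            )
      | (?'space' \s+          )
    }x;

    my @tokens;
    while ( $text =~ /\G$re/gc ) {
        # Only the branch that matched has a defined named group.
        my ($name) = keys %+;
        push @tokens, [ $name, $+{$name} ];
    }

    # /c keeps pos() after the failing match; it is undef if nothing matched at all.
    my $pos = pos($text) // 0;
    die "cannot tokenize input at offset $pos\n" if $pos != length $text;

    return @tokens;
}

printf qq(%-6s "%s"\n), @$_ for tokenize('pages = 184');
```

Because of the /c modifier, the failed match that ends the loop does not reset pos(), so comparing it with the length of the string afterwards tells you whether the whole input was consumed.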
I hope it is to your liking.</field>
<field name="root_node">
827649</field>
<field name="parent_node">
827649</field>
</data>
</node>
