http://www.perlmonks.org?node_id=531435

gjb has asked for the wisdom of the Perl Monks concerning the following question:

Wise Monks, I have to turn to you for a piece of advice on the following problem.

I'm using XML::Twig to parse an XML file. The output should simply be the path of each element in the DOM tree. I've written a handler that is associated with all start tags and that does precisely that. Since I don't want the leading '/', I strip it using substr. No problem so far. However, I also want the XML tags in lowercase, and this is where things start to get interesting.

I've included two Perl programs, one that parses an actual XML file, the other simulating the behavior of the handler on ordinary text data to try and isolate the problem. The output of the latter seems fine, while the output of the former is clearly incorrect.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        twig_handlers => {'_all_' => \&start_tag}
    );
    $twig->parse(*DATA);

    sub start_tag {
        my ($t, $e) = @_;
        my $str = $e->path();
        print substr(lc($str), 1), "\n";
        print lc(substr($str, 1)), "\n\n";
    }

    __DATA__
    <A>
      <a>
        <B>blah blah</B>
        <b>blah blah blah</b>
        <b>blah <a/> blah</b>
      </a>
      <b/>
    </A>

The output produced is:
a/a/b a/a/b a/a/b a/a/b a/a/b a/a/b/a a/a/b a/a/b a/a a/a a/b a/b a a
Note the third group which doesn't yield the expected output. Below is the attempt to reproduce this outside the context of XML parsing:
    #!/usr/bin/perl
    use strict;
    use warnings;

    while (<DATA>) {
        chomp($_);
        print_str($_);
    }

    sub print_str {
        my ($str) = @_;
        print substr(lc($str), 1), "\n";
        print lc(substr($str, 1)), "\n\n";
    }

    __DATA__
    /A/a/B
    /A/a/b
    /A/a/b/a
    /A/a/b
    /A/a
    /A/b
    /A

which produces the expected results below:

    a/a/b
    a/a/b

    a/a/b
    a/a/b

    a/a/b/a
    a/a/b/a

    a/a/b
    a/a/b

    a/a
    a/a

    a/b
    a/b

    a
    a

It would seem that within the XML handler something very weird happens, as if a variable with a fixed length (that which it has in the first invocation) is reused between calls to the handler.
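One way to probe this hypothesis might be to compute both variants from a private copy of the path and print some metadata about the scalar alongside them. The following is only a minimal sketch: describe_path is a hypothetical helper of my own, not part of XML::Twig, and the guess that the UTF-8 flag on the string could be relevant is an assumption, not something I have confirmed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical diagnostic helper (not part of XML::Twig).
# It copies the input into a fresh scalar, then records both
# lc/substr orderings together with the copy's length and
# whether Perl has flagged it internally as UTF-8.
sub describe_path {
    my ($path) = @_;
    my $copy = "$path";    # stringify into a new scalar
    return {
        is_utf8     => utf8::is_utf8($copy) ? 1 : 0,
        length      => length($copy),
        lc_then_sub => substr(lc($copy), 1),
        sub_then_lc => lc(substr($copy, 1)),
    };
}

my $info = describe_path('/A/a/b/a');
printf "utf8=%d length=%d\n%s\n%s\n",
    @{$info}{qw(is_utf8 length lc_then_sub sub_then_lc)};
```

Inside the handler, the same helper could be called as describe_path($e->path()). If the two variants still differ there while agreeing on plain string literals, that would point at the scalar returned by the parser rather than at substr or lc as such.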

I'd be grateful if someone could shed some light on this. Thanks in advance, -gjb-

Update: given that this seems to be a version-specific issue, I should mention that the results above were obtained using XML::Twig 3.23 (i.e. the latest version) on Perl 5.8.7 built for cygwin-thread-multi-64int (i.e. the standard version that can be installed using Cygwin's installer).