PerlMonks  

best practice when using XML::Parser and strict.

by reasonablekeith (Deacon)
on Feb 28, 2005 at 11:36 UTC (#435024)
reasonablekeith has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

Servers are down this morning, which gives me time to seek advice re best practice when using XML::Parser and strict. :-)

Basically, when writing event-driven parsing scripts, I always find that I need to share some variables between the handlers. My question is, what's the best way to do this? I offer a simple example, which prints the node elements neatly indented.

use XML::Parser;
use strict;

parse_stuff();

sub parse_stuff {
    my $parser = new XML::Parser(Handlers => {Start => \&handle_start,
                                              End   => \&handle_end});
    no strict "vars";
    local $indent;
    $parser->parsefile('/tmp/ra.xml');
}

sub handle_start {
    my ($p, $el, %atts) = @_;
    our $indent;
    $indent++;
    print "-"x$indent . "$el\n";
}

sub handle_end {
    our $indent;
    $indent--;
}

I'm quite happy with this, as I don't have any global variables (I know any previous $indent value will be temporarily trashed, but I can cope with that), I've got a shared variable (dynamically scoped) to use in my handlers, and my handlers explicitly pick up this shared variable. My main concern is that I've had to use 'no strict "vars"' to define $indent, which makes me think I've done a bad thing, and that perhaps there's a neater way of doing this, without having to turn off strict.

Thanks,

Rob

Re: best practice when using XML::Parser and strict.
by Arunbear (Parson) on Feb 28, 2005 at 12:23 UTC
    This is one way:
    use XML::Parser;
    use strict;

    parse_stuff();

    sub parse_stuff {
        my $parser = new XML::Parser(
            Handlers => {Start => \&handle_start, End => \&handle_end});
        $parser->parsefile('/tmp/ra.xml');
    }

    {
        my $indent;

        sub handle_start {
            my ($p, $el, %atts) = @_;
            $indent++;
            print "-"x$indent . "$el\n";
        }

        sub handle_end {
            $indent--;
        }
    }
    $indent is now accessible from handle_start() and handle_end(), but not accessible outside the bare block.
      That's really nice. Following your lead, I think it's even neater to bring the sub definition into parse_stuff sub itself...
      use XML::Parser;
      use strict;

      parse_stuff();

      sub parse_stuff {
          my $parser = new XML::Parser(
              Handlers => {Start => \&handle_start, End => \&handle_end});
          my $indent;
          $parser->parsefile('/tmp/ra.xml');

          sub handle_start {
              my ($p, $el, %atts) = @_;
              $indent++;
              print "-"x$indent . "$el\n";
          }

          sub handle_end {
              $indent--;
          }
      }
      This seems so obvious now, I don't know why I didn't think of it in the first place. Many thanks.

        If you declare named subroutines within other subroutines, the value of lexical variables declared in the outer subroutine will not necessarily stay in sync with the value of that variable in the inner subroutine (if you ran this code with warnings, you would get a message saying 'Variable "$indent" will not stay shared'). From perldoc perldiag:

        When the inner subroutine is called, it will probably see the value of the outer subroutine's variable as it was before and during the *first* call to the outer subroutine; in this case, after the first call to the outer subroutine is complete, the inner and outer subroutines will no longer share a common value for the variable. In other words, the variable will no longer be shared.

        Furthermore, if the outer subroutine is anonymous and references a lexical variable outside itself, then the outer and inner subroutines will never share the given variable.

        This problem can usually be solved by making the inner subroutine anonymous, using the sub {} syntax. When inner anonymous subs that reference variables in outer subroutines are called or referenced, they are automatically rebound to the current values of such variables.

        This may not be a problem with the code you posted, since handle_start and handle_end will (in theory) always be called in pairs and never be called except by XML::Parser. However, doing this in other parts of your code could produce surprising behaviour. For example, this code:

        use strict;
        use warnings;

        print "calling outter from main:\n";
        outter();

        print "calling inner from main:\n";
        inner();

        print "calling outter from main:\n";
        outter();

        sub outter {
            my $var = 0;
            print " outter: $var\n";

            print " calling inner, should increment 0 -> 1\n";
            inner();

            sub inner {
                print " inner: $var -> ";
                $var++;
                print "$var\n";
            }
        }
        produces the following output:
        Variable "$var" will not stay shared at sub_in_sub.pl line 23.
        calling outter from main:
         outter: 0
         calling inner, should increment 0 -> 1
         inner: 0 -> 1
        calling inner from main:
         inner: 1 -> 2
        calling outter from main:
         outter: 0
         calling inner, should increment 0 -> 1
         inner: 2 -> 3

        Notice that when inner() is called from main, the value of $var is still 1, and the second time outter() is called the inner() sub starts with a value of 2 for $var instead of 0, as you might expect.

        Arunbear's suggestion declared the $indent variable and both handler subs inside a bare block. This limited the scope of $indent to just those two subs, but it doesn't suffer from the sharing problem described above. If you really want to declare subs within subs, look into closures.
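
For what the closure route might look like with the indenting example from this thread, here is a minimal sketch. The names make_indent_handlers and $out are made up for illustration, and the handlers collect lines in an array instead of printing directly so the pair can be exercised without a parser. XML::Parser is loaded with require only so that the factory can be tried on its own; an ordinary use XML::Parser at the top would do just as well.

```perl
use strict;
use warnings;

# Build a matched pair of handlers that share one private $indent.
# Every call creates a fresh $indent, so two parses cannot interfere.
# (make_indent_handlers and $out are hypothetical names.)
sub make_indent_handlers {
    my ($out) = @_;              # array ref that collects output lines
    my $indent = 0;

    my $start = sub {
        my ($p, $el, %atts) = @_;
        $indent++;
        push @$out, ('-' x $indent) . $el;
    };
    my $end = sub {
        $indent--;
    };
    return ($start, $end);
}

sub parse_stuff {
    my ($file) = @_;
    require XML::Parser;         # loaded on demand for this sketch

    my @lines;
    my ($start, $end) = make_indent_handlers(\@lines);
    my $parser = XML::Parser->new(
        Handlers => { Start => $start, End => $end },
    );
    $parser->parsefile($file);
    print "$_\n" for @lines;
    return;
}
```

Because the two subs are anonymous, they are bound to the $indent that exists when make_indent_handlers runs, so the "will not stay shared" warning does not apply. Calling parse_stuff('/tmp/ra.xml') would then behave like the original script.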

        Thanks to tye for helping me find the reference to this in the docs.

        HTH

Re: best practice when using XML::Parser and strict.
by moot (Chaplain) on Feb 28, 2005 at 12:28 UTC
    Have you thought about using a class, and setting up your handlers as closures? Something like..
    package MyHandler;

    sub new { bless { indent => 0 }, shift }

    sub handle_start {
        my ($self, $p, $el, %atts) = @_;
        # use $self->{indent} here
        ...
    }

    sub handle_end {
        my ($self) = @_;
        # likewise here
        ...
    }

    1;

    ...

    use XML::Parser;
    use strict;

    sub parse_stuff {
        my $handler = MyHandler->new();
        my $parser = new XML::Parser(Handlers => {
            Start => sub { $handler->handle_start(@_) },
            End   => sub { $handler->handle_end(@_) },
        });
        $parser->parsefile('/tmp/ra.xml');
    }
Re: best practice when using XML::Parser and strict.
by grantm (Parson) on Feb 28, 2005 at 23:33 UTC

    The best practice for using XML::Parser is to not use XML::Parser.

    The XML::SAX API is very similar to XML::Parser's Handler style except that because it uses objects, your handlers are methods and can maintain state in the object itself. Which solves exactly the problem you've encountered.
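
As a rough sketch of what "state in the object" could look like for the indenting example, assuming the usual SAX convention that start_element receives a hash ref with the element name under Name: the class name IndentHandler and its lines field are invented here. The handler is a plain object so that it can be tried without XML::SAX installed; many SAX handlers subclass XML::SAX::Base instead.

```perl
use strict;
use warnings;

# Hypothetical handler class; the name is not from this thread.
package IndentHandler;

sub new {
    my ($class) = @_;
    return bless { indent => 0, lines => [] }, $class;
}

# Called for each opening tag; the depth lives in the object itself.
sub start_element {
    my ($self, $el) = @_;
    $self->{indent}++;
    push @{ $self->{lines} }, ('-' x $self->{indent}) . $el->{Name};
    return;
}

# Called for each closing tag.
sub end_element {
    my ($self, $el) = @_;
    $self->{indent}--;
    return;
}

package main;

sub parse_stuff {
    my ($file) = @_;
    require XML::SAX::ParserFactory;   # loaded on demand for this sketch

    my $handler = IndentHandler->new;
    my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
    $parser->parse_uri($file);
    print "$_\n" for @{ $handler->{lines} };
    return;
}
```

No shared lexical or package variable is needed: each handler object carries its own indent, so strict stays on throughout.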

    Another advantage of SAX is that it's modular. So when you run into the problem of text content being split across multiple events, you don't need to code around it, you just plug in XML::Filter::BufferText and move on.
