PerlMonks  

XML::Twig::Handlers - promoting laziness through magic

by PodMaster (Abbot)
on Nov 15, 2002 at 16:38 UTC (#213193=perlmeditation)

=head1 NAME

XML::Twig::Handlers - promoting laziness through magic

=head1 DESCRIPTION

An XML::Twig subclass which you subclass, so you don't have to write

    twig_handlers => { blah => \&blah_handler, ... },

explicitly.

=head1 SYNOPSIS

So if you had

    package MyTwiggy;
    require XML::Twig::Handlers;
    use base qw( XML::Twig::Handlers );
    sub blah_handler { }

creating a C<new MyTwiggy> would essentially be the same as

    my $t = MyTwiggy->new( twig_handlers => { blah => \&blah_handler } );

You could also have

    sub blah_root { }

which would magically translate to

    twig_roots => { blah => \&blah_root }

=head1 CAVEAT

B<BEWARE>!!! There really is no reason for you to write

    sub blah_handler { ... }
    sub blah_root    { ... }

It is sufficient to write

    sub blah_h { ... }
    sub blah_r { ... }

because this is the regex I use to match methods:

    /^[^_]+_[hHrR]/

You will get no warnings about possible conflicts, for example

    sub blah_h { ... }
    sub blah_H { ... }

=head1 BUGS

C<_all_> and C<_default_> are not supported because I keep getting
"unrecognized expression in handler" carped by XML::Twig. Hopefully this
will be resolved in the next version.
=cut

package XML::Twig::Handlers;

use strict;
use warnings;    # needs Perl 5.6+; the original BEGIN{ eval q{use warnings;} }
                 # only enabled warnings inside the eval'd string itself

use vars qw( $VERSION @ISA );
require XML::Twig;
@ISA     = qw( XML::Twig );
$VERSION = 0.01;

# now ripping off Devel::GetSymbols::symbols
sub _symbols {
    my ( $type, $package ) = @_;
    $package = (caller)[0] unless defined $package;
    # croak 'Usage: symbols(type[, package])' unless defined $type;
    no strict 'refs';    # symbolic references into the package stash
    return grep { defined *{"${package}::$_"}{$type} } keys %{"${package}::"};
}

# foo_h, foo_H, foo_handler...  =>  ( foo => \&foo_h ) for twig_handlers
sub _handlers {
    my $pack = shift || (caller)[0];    # 'shift or ...' never used the fallback
    no strict 'refs';
    return map {
        my ($r) = split /_/, $_, 2;
        ( $r => \&{"${pack}::$_"} );
    } grep {
        /^[^_]+_[hH]/    # /^(?:_all|_default|[^_]+)_[hH]/
    } _symbols( 'CODE', $pack );
}

# foo_r, foo_R, foo_root...  =>  ( foo => \&foo_r ) for twig_roots
sub _roots {
    my $pack = shift || (caller)[0];
    no strict 'refs';
    return map {
        my ($r) = split /_/, $_, 2;
        ( $r => \&{"${pack}::$_"} );
    } grep {
        /^[^_]+_[rR]/    # /^(?:_all|_default|[^_]+)_[rR]/
    } _symbols( 'CODE', $pack );
}

sub new {
    my ( $pack, @options ) = @_;
    my @Handlers = _handlers($pack);
    my @Roots    = _roots($pack);
    push @options, twig_roots    => {@Roots}    if @Roots;
    push @options, twig_handlers => {@Handlers} if @Handlers;
    return $pack->SUPER::new(@options);
}

1;
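To make the CAVEAT concrete, here is a small sketch that runs a list of sub names through the same two regexes and the same `split /_/, $_, 2` the module uses. `handler_key` is a hypothetical helper written only for this illustration; it needs nothing but core Perl, not XML::Twig.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the module's matching logic:
# returns ( kind, key ) for a sub name, or an empty list if it is ignored.
sub handler_key {
    my ($name) = @_;
    my ($key) = split /_/, $name, 2;    # text before the first '_'
    return ( handlers => $key ) if $name =~ /^[^_]+_[hH]/;
    return ( roots    => $key ) if $name =~ /^[^_]+_[rR]/;
    return;                             # neither a handler nor a root
}

my %seen;
for my $name (qw( blah_handler blah_h blah_H blah_root blah_rest _private new )) {
    my ( $kind, $key ) = handler_key($name);
    if ( !defined $kind ) {
        printf "%-12s ignored\n", $name;
        next;
    }
    push @{ $seen{$kind}{$key} }, $name;
    printf "%-12s -> %s => %s\n", $name, $kind, $key;
}

# Several subs can collapse onto the same key; the module itself does not warn.
for my $kind ( sort keys %seen ) {
    for my $key ( sort keys %{ $seen{$kind} } ) {
        my @subs = @{ $seen{$kind}{$key} };
        print "conflict in $kind for '$key': @subs\n" if @subs > 1;
    }
}
```

Note that `blah_rest` lands in twig_roots even if it was meant as an ordinary method. Because `new` builds `{ @Handlers }` from a stash listing that comes back in hash order, which of several colliding subs ends up registered is effectively arbitrary.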
This is the example, and it'll be run automatically if you downloaded this module using the "d/l code" link and then ran the file through perl.
################################
## The Example
package MyTwiggy;
use vars qw( @ISA );
# Load the module if it is installed as a .pm; the eval ignores the failure
# when XML::Twig::Handlers is only defined earlier in this same file.
eval q{require XML::Twig::Handlers;};
@ISA = qw( XML::Twig::Handlers );

# Doc_root  =>  twig_roots => { Doc => \&Doc_root }
sub Doc_root {
    my ( $t, $doc ) = @_;
    $doc->print;
    print "\n", 'x' x 69, "\n";
}

# foo_h  =>  twig_handlers => { foo => \&foo_h }
sub foo_h {
    my ( $t, $foo ) = @_;
    print "\n\t\t## saw foo ##\n";
}

1;

package main;

my $t = MyTwiggy->new();
print "$t\n";
$t->parse( \*DATA );

__END__
<Stream>
  <Doc>
    <foo>hey man</foo>
    <foo>hey man2</foo>
  </Doc>
  <Doc>
    <bar>hey man, how's it goin'?</bar>
  </Doc>
  <Doc>
    <baz>pretty right on.</baz>
  </Doc>
</Stream>
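What `MyTwiggy->new` does behind the scenes can be reproduced without XML::Twig installed. The sketch below assumes a hypothetical package `Demo` with the same `Doc_root` and `foo_h` subs as the example, scans its stash the way `_symbols` does, and returns the option list that `new` would hand on to `XML::Twig->new`. `code_symbols` and `twig_options_for` are illustrative names, not part of the module.

```perl
use strict;
use warnings;

# A hypothetical package standing in for MyTwiggy.
package Demo;
sub Doc_root { 'root handler' }
sub foo_h    { 'foo handler' }
sub helper   { 'not a handler' }

package main;

# Same idea as _symbols: names in a package's stash that have a CODE slot.
sub code_symbols {
    my ($package) = @_;
    no strict 'refs';    # symbolic references into the stash
    return grep { defined *{"${package}::$_"}{CODE} } keys %{"${package}::"};
}

# Build the option list that new() would pass on to XML::Twig->new.
sub twig_options_for {
    my ($package) = @_;
    my ( %handlers, %roots );
    no strict 'refs';
    for my $name ( code_symbols($package) ) {
        my ($key) = split /_/, $name, 2;
        if    ( $name =~ /^[^_]+_[hH]/ ) { $handlers{$key} = \&{"${package}::$name"} }
        elsif ( $name =~ /^[^_]+_[rR]/ ) { $roots{$key}    = \&{"${package}::$name"} }
    }
    my @options;
    push @options, twig_roots    => \%roots    if %roots;
    push @options, twig_handlers => \%handlers if %handlers;
    return @options;
}

my %opt = twig_options_for('Demo');
for my $kind ( sort keys %opt ) {
    for my $key ( sort keys %{ $opt{$kind} } ) {
        print "$kind: $key => ", $opt{$kind}{$key}->(), "\n";
    }
}
```

Running it lists `Doc` under twig_roots and `foo` under twig_handlers, while `helper` is skipped because its name matches neither regex.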

____________________________________________________
** The Third rule of perl club is a statement of fact: pod is sexy.

Re: XML::Twig::Handlers - promoting laziness through magic
by mirod (Canon) on Nov 15, 2002 at 17:04 UTC

    You are indeed a very lazy programmer Mr PodMaster... which can be construed as a compliment around here I guess ;--)

    This is a neat trick though, more or less the equivalent of the subs style for XML::Parser.

    Note though that using this subclass will limit you to using handlers on element names, while there are _many_ other options. From the docs:

    twig_handlers
    This argument replaces the corresponding XML::Parser argument. It consists of a hash { expression => \&handler } where expression is a generic_attribute_condition, a string_condition, an attribute_condition, a full_path, a partial_path, a gi, _default_ or _all_.

    The idea is to support a useful but efficient (thus limited) subset of XPATH. A fuller expression set will be supported in the future, as users ask for more and as I manage to implement it efficiently. This will never encompass all of XPATH due to the streaming nature of parsing (no lookahead after the element end tag).

    A generic_attribute_condition is a condition on an attribute, in the form *[@att='val'] or *[@att]. Single quotes can be used instead of double quotes, and the leading '*' is actually optional. No matter what the gi of the element is, the handler will be triggered if the attribute has the specified value or, in the *[@att] form, if the attribute simply exists.

    A string_condition is a condition on the content of an element, in the form gi[string()='foo']. Single quotes can be used instead of double quotes; at the moment you cannot escape the quotes (this will be added as soon as I dig out my copy of Mastering Regular Expressions from its storage box). The text returned is, as per what I (and Matt Sergeant!) understood from the XPATH spec, the concatenation of all the text in the element, excluding all markup. Thus to call a handler on the element <p>text <b>bold</b></p> the appropriate condition is p[string()='text bold']. Note that this is not exactly conformant to the XPATH spec; it just tries to mimic it while still being quite concise.

    An extension of that notation is gi[string(child_gi)='foo'] where the handler will be called if a child of a gi element has a text value of foo. At the moment only direct children of the gi element are checked. If you need to test on descendants of the element let me know. The fix is trivial but would slow down the checks, so I'd like to keep it the way it is.

    A regexp_condition is a condition on the content of an element, in the form gi[string()=~ /foo/]. This is the same as a string_condition except that the text of the element is matched against the regexp. The i, m, s and o modifiers can be used on the regexp.

    The gi[string(child_gi)=~ /foo/] extension is also supported.
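The string() value that string_condition and regexp_condition compare against is just the element's text with the markup removed. As a toy illustration only (a regex is not an XML parser, and real code would ask the element for its text, for instance through XML::Twig::Elt's text method), `string_value` below is a hypothetical helper that strips the tags from the example fragment:

```perl
use strict;
use warnings;

# Toy sketch: strip tags from a tiny, well-formed fragment to get the
# concatenated text a string() condition is compared against.
sub string_value {
    my ($fragment) = @_;
    ( my $text = $fragment ) =~ s/<[^>]*>//g;    # drop all markup
    return $text;
}

my $value = string_value('<p>text <b>bold</b></p>');
print "string() = '$value'\n";    # prints: string() = 'text bold'
print "p[string()='text bold'] matches\n" if $value eq 'text bold';
print "p[string()=~ /bold\$/] matches\n"  if $value =~ /bold$/;
```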

    An attribute_condition is a simple condition on an attribute of the current element, in the form gi[@att='val'] (single quotes can be used instead of double quotes; you cannot escape quotes either). If several attribute_conditions are true for the same element, all the handlers can be called in turn (in the order in which they were first defined). If the ='val' part is omitted (the condition is then gi[@att]), the handler is triggered if the attribute actually exists for the element, no matter what its value is.
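To pin down the semantics of generic_attribute_condition and attribute_condition, here is a hypothetical matcher, `attr_condition_matches`, that accepts gi[@att='val'], gi[@att] and the '*'-prefixed or bare [@att...] forms and tests them against a tag name and an attribute hash. It is a sketch of the rules as written above, not XML::Twig's own parsing code, and like the docs it does not handle escaped quotes.

```perl
use strict;
use warnings;

# Hypothetical sketch of the attribute-condition rules quoted above.
sub attr_condition_matches {
    my ( $cond, $gi, $atts ) = @_;
    return 0
      unless $cond =~ m{^(\*|[\w.:-]+)?\[\@([\w.:-]+)(?:=(?:'([^']*)'|"([^"]*)"))?\]\z};
    my ( $want_gi, $att, $sq, $dq ) = ( $1, $2, $3, $4 );
    my $want_val = defined $sq ? $sq : $dq;    # undef when there is no ='val' part

    # A gi other than '*' (or none at all) must match the element's tag name.
    return 0 if defined $want_gi && $want_gi ne '*' && $want_gi ne $gi;
    return 0 unless exists $atts->{$att};
    return 1 unless defined $want_val;         # [@att]: existence is enough
    return $atts->{$att} eq $want_val ? 1 : 0;
}

my %atts = ( type => 'note', id => 'n1' );
for my $cond ( q{para[@type='note']}, q{para[@type="warn"]}, q{*[@id]},
               q{[@id='n1']}, q{title[@type]} ) {
    printf "%-20s %s\n", $cond,
      attr_condition_matches( $cond, 'para', \%atts ) ? 'match' : 'no match';
}
```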

    A full_path looks like '/doc/section/chapter/title': it starts with a / and then lists the gi of every element on the way down to the element. The handler will be called if the path to the current element (in the input document) is exactly as defined by the full_path.

    A partial_path is like a full_path except it does not start with a /: 'chapter/title' for example. The handler will be called if the path to the element (in the input document) ends as defined in the partial_path.
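The difference between a full_path and a partial_path comes down to an exact comparison versus a suffix match that has to start at a / boundary. `path_triggers` is a hypothetical helper illustrating that rule for an element whose path is /doc/section/chapter/title:

```perl
use strict;
use warnings;

# Hypothetical helper: does a handler path trigger for an element whose
# path in the input document is $path (always starting with '/')?
sub path_triggers {
    my ( $cond, $path ) = @_;
    return $path eq $cond ? 1 : 0 if $cond =~ m{^/};    # full_path: exact match
    return $path =~ m{/\Q$cond\E\z} ? 1 : 0;            # partial_path: suffix at a '/'
}

my $path = '/doc/section/chapter/title';
for my $cond ( '/doc/section/chapter/title', '/section/chapter/title',
               'chapter/title', 'title', 'apter/title' ) {
    printf "%-28s %s\n", $cond, path_triggers( $cond, $path ) ? 'yes' : 'no';
}
```

In this sketch '/section/chapter/title' does not trigger because a full_path must start at the root, and 'apter/title' does not trigger because the suffix must begin at a step boundary; a bare 'title' matches, just as a gi handler would fire for any title element.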

    WARNING: (hopefully temporary) at the moment string_condition, regexp_condition and attribute_condition are only supported on a simple gi, not on a path.

    A gi (generic identifier) is just a tag name.

    #CDATA can be used to call a handler for a CDATA section.

    A special gi _all_ is used to call a function for each element. The special gi _default_ is used to call a handler for each element that does NOT have a specific handler.

    The order of precedence to trigger a handler is: generic_attribute_condition, string_condition, regexp_condition, attribute_condition, full_path, longer partial_path, shorter partial_path, gi, _default_.

Approved by broquaint