XML::Twig and namespaces

by DJpumps (Novice)
on Jul 04, 2007 at 06:47 UTC (#624830, perlquestion)
DJpumps has asked for the wisdom of the Perl Monks concerning the following question:

I'm using XML::Twig to traverse XML documents.

When I take a copy of an XML::Twig::Elt object and call the namespace method on it, I get nothing back, even though ns_prefix returns a prefix for the element and that prefix is indeed bound in the XML document.

The copy was obtained using:

        my $copy_of_twig = $twig->copy;
where $twig was an XML::Twig object that was used to traverse some XML document and at some point needed to be copied.

Apparently, the copy operation strips off all the namespace awareness that the original $twig had. I don't know why this would make sense, or how to avoid it.

My current workaround is to save a reference to the original twig by:

        my $ref_to_twig = \$twig;
but this is bound to come back and bite me later: as soon as $twig changes its state again, $ref_to_twig will not "remember" the state from the moment it was taken. It will do what a reference is expected to do and follow $twig as it changes...
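To make the concern concrete, here is a minimal sketch in core Perl only. It uses a plain hash as a stand-in for the twig's state (the names $state, current, and $snapshot are hypothetical and have nothing to do with XML::Twig internals). It only illustrates that a reference follows later changes while a copy of the data does not.

```perl
use strict;
use warnings;

# Hypothetical stand-in for some mutable state; not an XML::Twig object.
my $state = { current => 'b' };

my $ref_to_state = \$state;       # reference to the variable itself
my $snapshot     = { %$state };   # shallow copy of the data at this moment

$state->{current} = 'c';          # the original moves on

# Read the value back through the reference and through the snapshot.
my $seen_via_ref      = ${$ref_to_state}->{current};
my $seen_via_snapshot = $snapshot->{current};

print "via reference: $seen_via_ref\n";        # follows the change
print "via snapshot:  $seen_via_snapshot\n";   # keeps the old value
```

The shallow copy is only enough here because the stand-in hash is flat. A nested structure would need a deep copy to be a real snapshot.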

Any ideas how to solve this?

Thank you.

-- DJpumps

Re: XML::Twig and namespaces
by GrandFather (Cardinal) on Jul 04, 2007 at 07:45 UTC

    Can you provide a short stand-alone sample that demonstrates the issue? "I know what I mean. Why don't you?" may help with tips for putting such a sample together.


    DWIM is Perl's answer to Gödel
Re: XML::Twig and namespaces
by mirod (Canon) on Jul 04, 2007 at 08:35 UTC

    A test case would be helpful indeed, but I can try guessing.

    ns_prefix returns a prefix for the element and that prefix is indeed bound in the XML document

    Indeed, the prefix is bound in the XML document. But the newly created element is not part of a document. It's just a single detached element as far as XML::Twig is concerned. So it can't get the namespace information from its parent elements.
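    As a rough illustration of why a detached element loses this information, here is a toy model in core Perl. The node hashes, the decls and parent keys, and resolve_prefix are all hypothetical and are not how XML::Twig is implemented. The sketch only shows the general idea of resolving a prefix by walking up the ancestors, and what happens when there is no ancestor to walk up to.

```perl
use strict;
use warnings;

# Hypothetical toy nodes, not XML::Twig internals: each node records the
# namespace declarations made on it and a link to its parent.
my $root = { name => 'a', decls => { '' => 'default_ns_top' }, parent => undef };
my $d    = { name => 'd', decls => { baz => 'baz' },           parent => $root };
my $e    = { name => 'e', decls => {},                         parent => $d };

# Walk up the ancestor chain until some element declares the prefix.
sub resolve_prefix {
    my ($node, $prefix) = @_;
    for (my $n = $node; $n; $n = $n->{parent}) {
        return $n->{decls}{$prefix} if exists $n->{decls}{$prefix};
    }
    return;    # prefix not bound anywhere we can see
}

# A detached copy keeps its own data but has no parent to consult.
my $detached = { %$e, parent => undef };

my $in_tree = resolve_prefix($e, 'baz');          # found on the parent <d>
my $in_copy = resolve_prefix($detached, 'baz');   # no ancestors to search

print "in tree: ", $in_tree // '(unbound)', "\n";
print "in copy: ", $in_copy // '(unbound)', "\n";
```

    In this model the copy could only answer correctly if it carried its ancestors' declarations with it, or kept a link back to the original element.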

    I have to think about it a bit, and see what other libraries do, because possible requirements are a bit hard to all satisfy:

    • get the namespace method to work on a copied (and maybe cut) element (it doesn't right now as you pointed out)
    • get namespaces to work properly when pasting a copied or cut element in a new tree, or a different part of the tree (at the moment the prefixes are untouched, so it doesn't work in all cases, but probably does in a lot of real cases)
    • not have to output all namespace declarations every time a copied or cut element is pasted somewhere else
    • not slow down cut/copy/paste in cases where namespaces are not used

    At the moment you could probably get the namespace information by keeping a link to the original element in the copied document (stick it in an invisible attribute), and get the namespace info from it:

    $copied->set_att( '#elt', $elt);
    my $namespace= $copied->att( '#elt')->namespace();

    In fact I might well use something like that to solve the problem.

      Here's a test case:

      XML file:

      <?xml version="1.1"?>
      <a xmlns="default_ns_top">
              <b xmlns:foo="foo"/>
              <c xmlns:bar="bar"/>
              <d xmlns:baz="baz">
                      <e xmlns:lala="lala" xmlns:baz="baz2" xmlns="default_ns"/>
              </d>
      </a>
      
      

      a test perl script:

      
      use strict;
      use warnings;
      use Data::Dumper;
      use XML::Twig;
      
      my $twig=XML::Twig->new();
      $twig->xparse(shift);
      
      traverse($twig->root);
      
      
      sub traverse {
              my ($t) = @_;
              print "gi=|",$t->gi,"|\tprefix=|",$t->ns_prefix,"|\tnamespace=|",$t->namespace,"|\n";
              foreach my $c ($t->children) {
                      if (@ARGV) {
                              my $copy = $c->copy;
                              traverse($copy);
                      } else {
                              traverse($c);
                      }
              }
      }
      
      

      Notice the output when you run:

      $ perl test.pl test.xml # this is behaving OK
      

      But when you run it with a copy:

      $ perl test.pl test.xml copy # this is not OK
      

      You don't see a lot of namespace information.

      I'm going to try and use your proposed workaround.

      Thanks for your help.

      -- DJpumps
      OK, the workaround works well for me:

      use strict;
      use warnings;
      use Data::Dumper;
      use XML::Twig;

      my $twig=XML::Twig->new();
      $twig->xparse(shift);

      traverse($twig->root);

      sub traverse {
              my ($t) = @_;
              my $elt=$t;
              $elt=$t->att('#elt') if $t->att('#elt');
              print "gi=|",$t->gi,"|\tprefix=|",$elt->ns_prefix,"|\tnamespace=|",$elt->namespace,"|\n";
              foreach my $c ($t->children) {
                      if (@ARGV) {
                              my $copy = $c->copy;
                              $copy->set_att( '#elt', $c);
                              traverse($copy);
                      } else {
                              traverse($c);
                      }
              }
      }

      I do think that the expected behavior in the case that I raised is that namespace information is kept. For those who don't want it -- it would be nice to have a method that will cause the elt to "forget" its namespace information.

      What do you think about this?

      Thanks again for your help and rapid response.

      By the way, while we're talking about namespace support in XML::Twig: please notice that unexpected #default namespaces are attached to attributes when the map_xmlns option is passed to the new method. I think this is caused by XML::Parser::Expat. How can one avoid getting this #default and get the namespace (the URN) instead? I think that getting the URN for the default namespace makes more sense than getting #default. Moreover, #default is returned for attributes, which have no default namespace according to the W3C recommendation. Why is that?

      Thanks.

      -- DJpumps
