http://www.perlmonks.org?node_id=837821

Gizmo has asked for the wisdom of the Perl Monks concerning the following question:

I've written two scripts, with XML::Simple and XML::DOM, to do some replacements. They work, but with XML::Simple the output is sorted alphabetically, which I've read is a limitation. I'm now trying to do the same with XML::Twig but can't figure out how. I've included the XML and the XML::Simple code so you can get a better understanding of what I want to achieve: I want to change the Id attribute from Id="/Local/App/App1" to Id="/App1".

XML snip:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Profile xmlns="xxxxxxxxx" name="" version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Application Name="App1" Id="/Local/App/App1" Services="1" policy="" StartApp="" Bal="5" sessInt="500" WaterMark="1.0"/>
  <AppProfileGuid>586e3456dt</AppProfileGuid>
</Profile>
XML::Simple code:

use strict;
use warnings;
use XML::Simple;

my $xmlfile = shift;    # path to the XML file, taken from the command line
my $xml  = XML::Simple->new(ForceArray => 1, KeepRoot => 1, KeyAttr => []);
my $data = $xml->XMLin($xmlfile);
my $Id   = $data->{Profile}->[0]->{Application}->[0]->{Id};
my $CsID = (split(/\//, $Id))[-1];    # keep only the last path component
$data->{Profile}->[0]->{Application}->[0]->{Id} = $CsID;
print $xml->XMLout($data);

Re: XML::Twig Text replacement
by ikegami (Patriarch) on Apr 30, 2010 at 18:28 UTC

    XML::Twig:

    use strict;
    use warnings;
    use XML::Twig qw( );

    binmode STDOUT;

    my $t = XML::Twig->new(
        twig_handlers => {
            '/Profile/Application' => sub {
                my $Id   = $_->att('Id');
                my $CsID = (split(/\//, $Id))[-1];
                $_->set_att(Id => $CsID);
            },
        },
    );
    $t->parsefile($ARGV[0]);
    $t->flush();

    XML::LibXML:

    use strict;
    use warnings;
    use XML::LibXML qw( );
    use XML::LibXML::XPathContext qw( );

    my $doc  = XML::LibXML->new()->parse_file($ARGV[0]);
    my $root = $doc->documentElement();

    my $xpc = XML::LibXML::XPathContext->new();
    $xpc->registerNs(x => 'xxxxxxxxx');

    for ($xpc->findnodes('/x:Profile/x:Application', $root)) {
        my $Id   = $_->getAttribute('Id');
        my $CsID = (split(/\//, $Id))[-1];
        $_->setAttribute(Id => $CsID);
    }

    binmode STDOUT;
    print $doc->toString();

    XML::LibXML is a bit wordier than XML::Twig (the two $xpc lines) in order to handle namespaces correctly. (XML::Twig doesn't.)
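    To see why those two $xpc lines are needed, here is a small sketch. It assumes a trimmed-down copy of the OP's document and uses 'urn:example:profile' as a placeholder for the elided namespace URI. In XPath 1.0 an unprefixed name only matches elements in no namespace, so the plain path finds nothing once a default namespace is declared; binding a prefix to the URI fixes that.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed-down version of the OP's document.
# 'urn:example:profile' is a placeholder namespace URI (assumption).
my $xml = <<'XML';
<Profile xmlns="urn:example:profile">
  <Application Name="App1" Id="/Local/App/App1"/>
</Profile>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Unprefixed names in XPath 1.0 only match elements in *no* namespace,
# so this returns an empty list: Application is in the default namespace.
my @plain = $doc->findnodes('/Profile/Application');

# Bind a prefix to the namespace URI, then use it in the path.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'urn:example:profile');
my @ns = $xpc->findnodes('/x:Profile/x:Application');

printf "without prefix: %d node(s)\n", scalar @plain;
printf "with prefix:    %d node(s)\n", scalar @ns;
```

    The prefix name (x) is arbitrary; only the URI has to match the document.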

      Actually, you can handle namespaces in XML::Twig using the map_xmlns option. I'm not sure it's worth doing in this case, though (and it might be a good example of why I dislike seemingly gratuitous default namespaces: they just make processing harder while providing exactly zero added value).

      Also, if you pass the id => 'Id' option to new(), you can then write $Id = $_->id and $_->set_id($CsID), which I think is slightly clearer. It has the added benefit, if need be, of letting you access an element directly through its id, using the elt_id method.
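      A minimal sketch of that suggestion, assuming a trimmed-down copy of the OP's document without the namespace declarations. It tells XML::Twig that Id is the ID attribute, looks the element up with elt_id, and rewrites the value with id and set_id instead of att and set_att.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed-down version of the OP's document (namespaces omitted).
my $xml = <<'XML';
<Profile>
  <Application Name="App1" Id="/Local/App/App1"/>
</Profile>
XML

# Declare "Id" (instead of the default "id") as the ID attribute.
my $t = XML::Twig->new( id => 'Id' );
$t->parse($xml);

# Fetch the element directly by its ID value.
my $app = $t->elt_id('/Local/App/App1');

# id() reads the ID attribute; set_id() writes it.
my $CsID = ( split m{/}, $app->id )[-1];
$app->set_id($CsID);

my $out = $t->sprint;
print "$out\n";
```

      Here the lookup happens after parsing; inside a twig_handler you would use $_->id and $_->set_id as described above.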

        Ah good. I don't use XML::Twig, so I don't have a deep knowledge of it.

        If it consistently ignored namespaces when map_xmlns isn't used, it would be a great shortcut despite being non-standard, since there is rarely a need to deal with namespace conflicts. (The module never claimed its paths were real XPath.) Unfortunately, it doesn't consistently ignore namespaces.

        On the plus side, it works according to standard when map_xmlns is used. (Well, I'm not sure how namespaces interact with attributes, so I'm simply commenting on elements.)

        use strict;
        use warnings;
        use XML::Twig qw( );

        my $xml = <<'__EOI__';
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<root xmlns:foo="uri:foo">
   <ele id="a" />
   <ele id="b" xmlns="uri:foo"/>
   <foo:ele id="c" />
</root>
__EOI__

        {
            my $seen = '';
            my $t = XML::Twig->new(
                twig_handlers => {
                    'ele' => sub { $seen .= $_->att('id') },
                },
            );
            $t->parsestring($xml);
            print("$seen\n");
            print($seen eq 'a' ? "Standard\n" : "Not standard\n");
            print($seen eq 'a' || $seen eq 'abc' ? "Consistent\n" : "Not consistent\n");
        }

        print("\n");

        {
            my $seen_null = '';
            my $seen_foo  = '';
            my $t = XML::Twig->new(
                map_xmlns => {
                    'uri:foo' => 'f',
                },
                twig_handlers => {
                    'ele'   => sub { $seen_null .= $_->att('id') || $_->att('f:id') },
                    'f:ele' => sub { $seen_foo  .= $_->att('id') || $_->att('f:id') },
                },
            );
            $t->parsestring($xml);
            print("$seen_null:$seen_foo\n");
            print($seen_null eq 'a' ? "Standard\n" : "Not standard\n");
            print($seen_null eq 'a' || $seen_null eq 'abc' ? "Consistent\n" : "Not consistent\n");
            print($seen_foo eq 'bc' ? "NS working\n" : "NS broken\n");
        }
        Output:

        ab
        Not standard
        Not consistent

        a:bc
        Standard
        Consistent
        NS working
      Thanks a lot, that makes sense now. I'll try out XML::LibXML too.
Re: XML::Twig Text replacement
by toolic (Bishop) on Apr 30, 2010 at 18:29 UTC
    use strict;
    use warnings;
    use XML::Twig;

    my $x = <<EOF;
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<Profile xmlns="xxxxxxxxx" name="" version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Application Name="App1" Id="/Local/App/App1" Services="1" policy="" StartApp="" Bal="5" sessInt="500" WaterMark="1.0"/>
  <AppProfileGuid>586e3456dt</AppProfileGuid>
</Profile>
EOF

    my $t = XML::Twig->new(twig_handlers => {Application => \&app});
    $t->parse($x);
    $t->print();

    sub app {
        my ($twig, $app) = @_;
        my $id = $app->att('Id');
        $id =~ s{^/Local/App}{};
        $app->set_att('Id', $id);
    }
      That would match Application elements anywhere, whereas the OP's code only touches Application elements directly under the root element. It might not make a difference here, though.
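      A small sketch of the difference, using a made-up document in which a second Application is nested inside a hypothetical <Archive> element (not part of the OP's XML). Each trigger gets its own twig so the two counts can't interfere with each other.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document: one Application directly under the root,
# one nested inside an invented <Archive> element.
my $xml = <<'XML';
<Profile>
  <Application Id="/Local/App/App1"/>
  <Archive>
    <Application Id="/Local/App/Old"/>
  </Archive>
</Profile>
XML

# Parse the document with a single handler and count how often it fires.
sub count_matches {
    my ($trigger) = @_;
    my $count = 0;
    my $t = XML::Twig->new(
        twig_handlers => { $trigger => sub { $count++; return 1 } },
    );
    $t->parse($xml);
    return $count;
}

my $anywhere  = count_matches('Application');           # tag name: any depth
my $root_only = count_matches('/Profile/Application');  # absolute path: root's children only

print "Application:          $anywhere\n";
print "/Profile/Application: $root_only\n";
```

      With the OP's document, which has a single Application under the root, both triggers fire exactly once.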