PerlMonks  

Modify XML tags

by elgato (Novice)
on Nov 18, 2011 at 15:02 UTC ( #938852=perlquestion )

elgato has asked for the wisdom of the Perl Monks concerning the following question:

Hi. Is there a way to change all tags and attributes in XML to lower case? (Faced XPath feature being case-insensitive) Just tags and attributes, not values. I don't know tag names (there can be many variants). Tried to search for console sed/awk commands, but without any luck. Thanks in advance.

Replies are listed 'Best First'.
Re: Modify XML tags
by Anonymous Monk on Nov 18, 2011 at 15:22 UTC

    Hi. Is there a way to change all tags and attributes in XML to lower case?

    Sure, here is a start

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use XML::Twig;

    my $str = <<'EOF';
    <NoTe>
    <To>
    <Person>Satan</Person>
    </To>
    <Beef><SaUsAGe>is Tasty</SaUsAGe></Beef>
    </NoTe>
    EOF

    {
        my $t = XML::Twig->new(
            pretty_print                 => 'indented',
            force_end_tag_handlers_usage => 1,
            start_tag_handlers           => {
                _all_ => sub { $_->set_tag( lc $_->tag ); return },
            },
            end_tag_handlers => {
                _all_ => sub { $_->set_tag( lc $_->tag ); return },
            },
        );
        $t->parse($str);
        $t->flush();
    }
    __END__

    (Faced XPath feature being case-insensitive)

    Why is this a problem?

      Nice! One of the few cases where it makes sense to use force_end_tag_handlers_usage. Bravo!

      I don't think you need the end_tag_handlers handler though, the start one should be enough. You could also flush at the end of the handler to save memory, if that's an issue (untested).

        Yup, start_tag_handlers is enough, but the flushing has to be done from end_tag_handlers.

        #!/usr/bin/perl --
        use strict;
        use warnings;
        use XML::Twig;

        my $str = <<'EOF';
        <NoTe KunG="FoO" ChOp="SuEy">
        <To KunG="FoO">
        <Person KunG="FoO">Satan</Person>
        </To>
        <Beef KunG="FoO"><SaUsAGe KunG="FoO">is Tasty</SaUsAGe></Beef>
        </NoTe>
        EOF

        {
            my $t = XML::Twig->new(
                pretty_print                 => 'indented',
                force_end_tag_handlers_usage => 1,
                start_tag_handlers           => {
                    _all_ => sub {
                        $_->set_tag( lc $_->tag );
                        if( $_->has_atts ){
                            my $atts = $_->atts ;
                            $_->set_atts ({ map { lc( $_ ) => $atts->{$_} } keys %{ $atts } });
                        }
                        return
                    },
                },
                end_tag_handlers => {
                    _all_ => sub { $_->flush; return },
                },
            );
            $t->parse($str);
            $t->flush();
        }
        __END__
        <note chop="SuEy" kung="FoO">
          <to kung="FoO">
            <person kung="FoO">Satan</person>
          </to>
          <beef kung="FoO">
            <sausage kung="FoO">is Tasty</sausage>
          </beef>
        </note>

        I don't think you need the end_tag_handlers handler though, the start one should be enough. You could also flush at the end of the handler to save memory, if that's an issue (untested).

        Flushing in the start_tag handler doubles the output:

            <note chop="SuEy" kung="FoO"></note>

        but the end_tag_handlers => { _all_ } handler doesn't get called at all, so nothing gets flushed until the whole tree is parsed.

        Is this by design of end_tag_handlers?

      The problem is that each file is 10-50 MB, and there are CDATA sections as well. And I need to transform the XML very fast.
Re: Modify XML tags
by choroba (Archbishop) on Nov 18, 2011 at 15:36 UTC
    What do you mean by "XPath feature being case-insensitive"? XPath is case sensitive, AFAIK.
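
    That case sensitivity is easy to check. The following is only a minimal sketch, assuming XML::LibXML is installed (nobody in this thread has used it so far). The helper names count_nodes and ci_name_test are made up for illustration. It also shows one way to match names case-insensitively with the XPath 1.0 translate() function instead of rewriting the file. The translation table covers ASCII letters only.

```perl
use strict;
use warnings;
use XML::LibXML;    # assumption: not part of the original thread

my $doc = XML::LibXML->load_xml(
    string => '<NoTe><To KunG="FoO">Satan</To></NoTe>',
);

my $UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
my $LOWER = 'abcdefghijklmnopqrstuvwxyz';

# Hypothetical helper: number of nodes an XPath expression selects.
sub count_nodes {
    my ($xpath) = @_;
    my @nodes = $doc->findnodes($xpath);
    return scalar @nodes;
}

# Hypothetical helper: XPath predicate comparing a node name
# case-insensitively (ASCII only) against $name.
sub ci_name_test {
    my ($name) = @_;
    return sprintf q{translate(local-name(), '%s', '%s') = '%s'},
        $UPPER, $LOWER, lc $name;
}

# Exact spelling matches, the lowercase spelling does not.
printf "//NoTe matches %d node(s)\n", count_nodes('//NoTe');
printf "//note matches %d node(s)\n", count_nodes('//note');

# Case-insensitive element and attribute lookups via translate().
printf "ci element 'note' matches %d node(s)\n",
    count_nodes( '//*[' . ci_name_test('note') . ']' );
printf "ci attribute 'kung' matches %d node(s)\n",
    count_nodes( '//@*[' . ci_name_test('kung') . ']' );
```

    Whether this beats lowercasing the whole file depends on how many queries there are. The predicate is evaluated on every candidate node, so it may be slow on 10-50 MB inputs.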

    I usually use XML::XSH2 for XML manipulation. To lowercase all element and attribute names, you can use:

    rename xsh:lc(name(.)) (//* | //@*)

    Be careful if the lowercase attribute already exists!
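
    The same caveat applies to the hash-based attribute rewrite in the XML::Twig reply above: when two attribute names differ only in case, building a new hash with lowercased keys silently keeps just one of them. A minimal core-Perl sketch of how such clashes could be detected follows. The helper name lc_atts and the "first spelling in sorted order wins" policy are assumptions made for illustration, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase attribute names and report clashes.
# Takes a hashref of attributes; returns a hashref with lowercased
# names plus an arrayref of original names that were skipped because
# their lowercase form was already taken.
sub lc_atts {
    my ($atts) = @_;
    my ( %out, @clashes );

    # Sort so the result does not depend on hash ordering.
    for my $name ( sort keys %{$atts} ) {
        my $lower = lc $name;
        if ( exists $out{$lower} ) {
            push @clashes, $name;    # keep the earlier value, note the clash
            next;
        }
        $out{$lower} = $atts->{$name};
    }
    return ( \%out, \@clashes );
}

my ( $atts, $clashes ) =
    lc_atts( { KunG => 'FoO', kung => 'bar', ChOp => 'SuEy' } );

print "kept:  $_=$atts->{$_}\n" for sort keys %{$atts};
print "clash: $_\n"             for @{$clashes};
```

    In a twig handler, the returned hashref could be passed to set_atts and the clash list logged, so that dropped attributes are at least visible.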
Node Type: perlquestion [id://938852]
Approved by Corion