Replacing things in XML files

by John M. Dlugosz (Monsignor)
on Apr 26, 2006 at 19:10 UTC ( [id://545849]=perlquestion )

John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

In general, suppose I wanted to find and replace some things in an XML file. I could make a lot of assumptions about the formatting and context, and do ordinary pattern replacements, and print out each line possibly with changes.

But it would be far better to actually parse the XML correctly, look up the item in the document model, change it, and then re-save the file.

So, what about the original formatting of the file? I want to keep the comments, indents and line breaks from the original, making a minimal change.

What's the normal idiom for that?

—John

Replies are listed 'Best First'.
Re: Replacing things in XML files
by borisz (Canon) on Apr 26, 2006 at 19:23 UTC

      #!/usr/bin/perl
      use strict;
      use warnings;
      use XML::LibXML;

      # Slurp the sample document from the DATA section.
      local $/;
      my $xml = <DATA>;
      print $xml, "\n";

      my $parser = XML::LibXML->new();
      my $doc    = $parser->parse_string( $xml );

      # Select the text node under <msg> and edit its contents in place.
      my $nodes = $doc->findnodes( '/msg/text()' );
      my $node  = $nodes->pop();

      my $string = $node->data;
      $string =~ s/^As a last resort //;
      $string =~ s/is also/is extremely/;
      $node->setData( $string );
      $node->appendData( " ... and XML::LibXSLT too!" );

      print $doc->toString();

      __DATA__
      <?xml version="1.0" encoding="iso-8859-1"?>
      <msg>As a last resort XML::LibXML is also useful.</msg>

      -derby
        use strict;
        use XML::Twig;

        # Slurp the sample document from the DATA section.
        local $/;

        # The handler fires for each /root/msg element; $_[1] is that element.
        XML::Twig->new(
            TwigHandlers => {
                '/root/msg' => sub { $_[1]->set_text('Twig is easy') },
            }
        )->parse(<DATA>)->print;

        __DATA__
        <?xml version="1.0" encoding="iso-8859-1"?>
        <root><msg>As a last resort XML::LibXML is also useful.</msg></root>
        Boris
Re: Replacing things in XML files
by davido (Cardinal) on Apr 26, 2006 at 20:26 UTC

    Yes, XML is 'human readable' (i.e., plain old text), but I don't think it's very robust to change an XML document simply by running an s/// operation through the file. You have to weigh the risk of messing it up with an under-engineered solution against the difficulty and value of developing a properly engineered one.

    The good news is that there are some excellent XML parsing tools that virtually eliminate the "difficulty" of implementing a well-engineered solution. XML::Simple is one, but I prefer XML::Twig, because it handles some peculiar XML cases that make 'Simple choke, and because it still provides a very simple interface, similar to that of XML::Simple. With XML::Twig you can, in very few easy lines of code, grab the entire XML document (or portions thereof) and have it automagically parsed into a Perl data structure. Then, after modifying the specific elements within that data structure that need to be changed, one or two more lines of code will dump it back into a properly formed XML document.

    Because the XML::Twig approach is so easy to use, and because it represents a "properly engineered" solution, there is virtually no reason not to favor it over a less robust "regexp" solution.

    The only problem I see is with regard to keeping indents and line breaks from the original file in the same exact places. XML::Twig will output clean XML, but it might not follow the existing formatting 100%. ...nor should it matter; XML, while human readable, is ultimately there for the machines to use. You can guarantee that an XML document output by XML::Twig will retain just as much human readability as the original, but you cannot guarantee that it will look 100% identical.


    Dave
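
The formatting concern above can be narrowed somewhat with XML::Twig's `keep_spaces` and `comments` options. What follows is only a minimal sketch, assuming a made-up sample document; it is not code from the reply, and it does not promise byte-for-byte identical output (details such as the XML declaration or spacing inside tags may still be rewritten).

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input with a comment and hand-made indentation.
my $xml = <<'XML';
<?xml version="1.0"?>
<root>
  <!-- greeting shown at startup -->
  <msg>As a last resort XML::LibXML is also useful.</msg>
  <other>untouched</other>
</root>
XML

my $twig = XML::Twig->new(
    keep_spaces   => 1,         # keep whitespace as found in the input
    comments      => 'keep',    # carry comments through to the output
    twig_handlers => {
        # $_ is the matched element inside a handler
        '/root/msg' => sub { $_->set_text('Twig is easy') },
    },
);

$twig->parse($xml);

# Serialize the modified document to a string instead of printing directly.
my $out = $twig->sprint;
print $out;
```

Comparing `$out` against the input with a diff tool is a quick way to see how close to a minimal change this gets for a given file.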

Re: Replacing things in XML files
by osunderdog (Deacon) on Apr 26, 2006 at 19:18 UTC
    XML::Simple

    Just for the record, it is simple for simple XML things; however, the parameters and options available in this package have complex implications for how the XML information is read in and written out.

    Hazah! I'm Employed!

      It looks like XML::Simple loses the distinction between attributes and child elements, when building the returned data structure. How do you preserve the actual content when writing it back out?

        This is definitely one of the difficulties I've had with XML::Simple. XMLin and XMLout aren't the inverse of each other. In other words, it's difficult to write out the same thing that you read in, and vice versa.

        If you need that, you should probably look at XML::Twig.

        Hazah! I'm Employed!
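
The attribute-versus-child-element point in this sub-thread can be seen with a small experiment. This is a sketch under XML::Simple's default options, using two invented one-line documents: one stores `name` as a child element, the other as an attribute, and both are read into Perl and compared.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin XMLout);
use Data::Dumper;

# Sort hash keys so the two dumps can be compared as strings.
$Data::Dumper::Sortkeys = 1;

# Hypothetical inputs: same data, different XML shape.
my $as_child = XMLin('<item id="1"><name>widget</name></item>');
my $as_attr  = XMLin('<item id="1" name="widget"/>');

# With default options both documents collapse to the same hash.
my $same = Dumper($as_child) eq Dumper($as_attr);
print "identical structures: ", ( $same ? "yes" : "no" ), "\n";

# Writing the structure back cannot know which shape it came from.
print XMLout($as_child);
```

Options such as `ForceArray`, `KeyAttr` and `NoAttr` change how the structure is built and written, which is where the "complex implications" mentioned above come from.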

Re: Replacing things in XML files
by jhourcle (Prior) on Apr 27, 2006 at 19:06 UTC

    I'm not sure about keeping the actual formatting (indents, line breaks, comments), but what you're basically looking for is XSLT.

    There are a few different XSLT modules on CPAN (XML::XSLT, XML::LibXSLT, XML::ApplyXSLT), but I've never used any of them.

    (I do enough XML work that I know of XSLT, yet never used it directly)
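
For readers who want to try the XSLT route, a common pattern is an identity transform plus one overriding template. The sketch below uses XML::LibXSLT, one of the modules named above; the sample document, the replacement text and the match pattern are all invented for illustration, and the exact serialization of the result may differ from the input.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical source document.
my $source = XML::LibXML->load_xml( string => <<'XML' );
<?xml version="1.0"?>
<root>
  <!-- keep me -->
  <msg>As a last resort XML::LibXML is also useful.</msg>
</root>
XML

# Identity rule copies everything; the second template overrides one text node.
my $style_doc = XML::LibXML->load_xml( string => <<'XSL' );
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="/root/msg/text()">XSLT did this</xsl:template>
</xsl:stylesheet>
XSL

my $xslt       = XML::LibXSLT->new();
my $stylesheet = $xslt->parse_stylesheet($style_doc);
my $result     = $stylesheet->transform($source);

# Serialize the result document according to the stylesheet's output rules.
my $out = $stylesheet->output_as_chars($result);
print $out;
```

The identity rule matches comments and whitespace-only text nodes as well, so they are copied through rather than dropped.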

Re: Replacing things in XML files
by grantm (Parson) on Apr 28, 2006 at 21:10 UTC

    Given your desire to preserve indentation, comments etc, XML::Simple is definitely not what you want. I'd recommend XML::LibXML. This node compares the code required to complete various tasks using XML::Simple and XML::LibXML.
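
As a rough illustration of why XML::LibXML suits this goal: by default it keeps comments and whitespace-only text nodes in the tree, so they come back out when the document is serialized. The sketch below assumes an invented config-style document and changes a single text node; details inside tags (attribute quoting, spacing between attributes) and the XML declaration may still be normalized on output.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document with a comment and indentation.
my $xml = <<'XML';
<?xml version="1.0"?>
<config>
  <!-- connection settings -->
  <server name="alpha">
    <port>8080</port>
  </server>
</config>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# Look the item up through the document model and change only that node.
my ($port) = $doc->findnodes('/config/server/port/text()');
$port->setData('9090');

# Comments and whitespace nodes are part of the tree and are written back.
my $out = $doc->toString();
print $out;
```

To re-save a real file, `$doc->toFile($filename)` can be used in place of `toString`.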

Node Type: perlquestion [id://545849]
Approved by Corion