PerlMonks  

Replacing XML content

by doran (Deacon)
on Aug 18, 2000 at 01:44 UTC (#28388=perlquestion)

doran has asked for the wisdom of the Perl Monks concerning the following question:

I have some XML files stored. Occasionally, I'll receive "update" XML files from other machines. I need to replace the content of existing nodes (and their children) with the incoming update. I don't need to keep any existing data in the replaced nodes, but I do need to keep the data in any other nodes. Also, I need to keep the order of the tags in the updated file the same as in the original file.

Here's an example of the results I'm looking for. In it, I want to update everything in the <jobs> node and below.

Given this "old" XML:

    <root>
      <info>
        <name>Joe Bleugh</name>
        <badge_id>1234</badge_id>
        <phone_ext>987</phone_ext>
      </info>
      <jobs>
        <chore>
          <desc>Wash Dishes</desc>
          <due>8-17-2000</due>
        </chore>
        <chore>
          <desc>Paint House</desc>
          <due>9-20-2000</due>
        </chore>
      </jobs>
    </root>

And this Update:

    <root>
      <info>
        <badge_id>1234</badge_id>
      </info>
      <jobs>
        <chore>
          <desc>Mow Lawn</desc>
          <due>8-20-2000</due>
        </chore>
        <chore>
          <desc>Buy Car</desc>
          <due>10-1-2000</due>
        </chore>
      </jobs>
    </root>

I want to produce:

    <root>
      <info>
        <name>Joe Bleugh</name>
        <badge_id>1234</badge_id>
        <phone_ext>987</phone_ext>
      </info>
      <jobs>
        <chore>
          <desc>Mow Lawn</desc>
          <due>8-20-2000</due>
        </chore>
        <chore>
          <desc>Buy Car</desc>
          <due>10-1-2000</due>
        </chore>
      </jobs>
    </root>

Where none of the data outside the <jobs> node gets altered, but everything within it does get replaced.

So my question isn't whether it can be done (of course it can) but rather what the best, easiest way of doing it is. XML::Simple doesn't seem quite robust enough, whereas XML::Parser doesn't make it too easy. XML::Twig looks like it can do this, but my eyes started to cross after a couple of pages' worth of docs. Before I went too much further reading anything at CPAN with the letters XML attached, I thought I'd ask here and see what y'all thought. Is there a module that's particularly well suited for this, or do I just need to sit down with a Big Brew and write a bunch of subroutines for XML::Parser to go through?

Thanks, of course, for any insights.
db

Replies are listed 'Best First'.
RE: Replacing XML content
by Mushy (Scribe) on Aug 18, 2000 at 03:10 UTC
    Sounds like a job for DOM-style parsing (XML::DOM) rather than the stream-oriented, event-driven model that XML::Parser (and XML::Twig, which is built on top of it) uses. Basically, build in-memory trees for both files, then iterate through the new tree's nodes, replace the matching node in the first tree with the new one, and finally dump the result out.
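
The tree-based approach described above could look roughly like the sketch below. It is only an illustration: it uses XML::LibXML (a DOM-style module that postdates this thread) in place of XML::DOM, the helper name `replace_subtree` is hypothetical, and it assumes each document contains exactly one element with the given name. It also skips the badge_id comparison, so it replaces the subtree unconditionally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: replace the first <$tag> subtree of the old
# document with the first <$tag> subtree of the update document.
sub replace_subtree {
    my ($old_xml, $upd_xml, $tag) = @_;

    # Build an in-memory tree for each document.
    my $doc = XML::LibXML->load_xml(string => $old_xml);
    my $upd = XML::LibXML->load_xml(string => $upd_xml);

    # Assumption: one <$tag> element per document, so take the first match.
    my ($old_node) = $doc->findnodes("//$tag");
    my ($new_node) = $upd->findnodes("//$tag");
    die "no <$tag> element in original\n" unless $old_node;
    die "no <$tag> element in update\n"   unless $new_node;

    # importNode makes a copy owned by the target document;
    # replaceNode swaps it in at the old node's position,
    # so the order of sibling elements is preserved.
    $old_node->replaceNode($doc->importNode($new_node));

    return $doc->toString;
}

my $old = '<root><info><name>Joe Bleugh</name><badge_id>1234</badge_id></info>'
        . '<jobs><chore><desc>Wash Dishes</desc></chore></jobs></root>';
my $upd = '<root><info><badge_id>1234</badge_id></info>'
        . '<jobs><chore><desc>Mow Lawn</desc></chore></jobs></root>';

print replace_subtree($old, $upd, 'jobs');
```

Because everything outside the replaced node is left untouched in the original tree, the `<info>` data and the element order survive as they were.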
Re: Replacing XML content
by mirod (Canon) on Aug 30, 2000 at 21:12 UTC

    OK, it's really late for an answer here, but here is the XML::Twig script that does it. It's a little simpler than the regexp one and certainly a lot more robust.

    #!/bin/perl -w
    use strict;
    use XML::Twig;
    
    my( $main_file, $upd_file)= @ARGV;
    
    # get the info we need by loading the update file
    my $t_upd= new XML::Twig();
    $t_upd->parsefile( $upd_file);
    
    my $upd_badge_id = $t_upd->root->next_elt( 'badge_id')->text;
    my $upd_chore    = $t_upd->root->next_elt( 'jobs');
    
    # now process the main file
    my $t= new XML::Twig( TwigHandlers => { jobs => \&jobs, },
    		      PrettyPrint => 'indented',
    		    );
    $t->parsefile( $main_file);
    $t->flush;           # don't forget or the last closing tags won't be printed
    
    sub jobs
      { my( $t, $jobs)= @_;
        # just replace jobs if the previous badge_id is the right one
        if( $jobs->prev_elt( 'badge_id')->text eq $upd_badge_id)
          { $upd_chore->replace( $jobs); }
        $t->flush;    # print and flush memory so only one job is in there at once
      }
    
Re: Replacing XML content
by doran (Deacon) on Aug 18, 2000 at 03:12 UTC
    Okay, after posting the question above, I asked myself "Self, what about using a regex instead of some new-fangled module?"

    So I came up with the following:

    #!/usr/bin/perl -w
    use strict;

    my $oldfile ='./existing.xml';
    my $updatefile ='./update.xml';
    my $newfile ='./new.xml';
    my ($old,$update);

    { # read in the two files
      local $/;
      open OLD, $oldfile or die "$!";
      $old=<OLD>;
      close OLD;
      open UP, $updatefile or die "$!";
      $update=<UP>;
      close UP;
    }

    $update =~ s/^.*(\<dbf>.*\<\/dbf>).*$/$1/sgi;
    my $up=$1;
    $old =~ s/\<dbf>.*\<\/dbf>/$up/sgi;

    open NEW, ">$newfile" or die "$!";
    print NEW "$old";
    close NEW;
    exit();

    Which seems to work fine for what I'm doing (replacing everything between nodes).
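
For comparison, here is a slightly more defensive sketch of the same regex idea. It is an illustration rather than a drop-in replacement: the helper name `replace_block` is hypothetical, it works on strings instead of files, and it assumes the tag has no attributes and is not nested inside itself. The two changes that matter are the non-greedy `.*?` (a greedy `.*` with `/s` would run from the first opening tag to the last closing tag if the block appeared more than once) and the explicit check that a match was found before the captured text is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: swap the first <$tag>...</$tag> block in $old
# for the first such block found in $update.
# Assumes the tag carries no attributes and is never nested in itself.
sub replace_block {
    my ($old, $update, $tag) = @_;

    # Non-greedy .*? stops at the first closing tag, not the last one.
    # The match is case-sensitive, as XML element names are.
    my ($block) = $update =~ m{(<\Q$tag\E>.*?</\Q$tag\E>)}s
        or die "no <$tag> block in update\n";

    # s/// returns the number of substitutions made, so a missing
    # block in the original is reported instead of silently ignored.
    my $count = ($old =~ s{<\Q$tag\E>.*?</\Q$tag\E>}{$block}s);
    die "no <$tag> block in original\n" unless $count;

    return $old;
}

my $old    = '<root><info><name>Joe Bleugh</name></info>'
           . '<jobs><chore>Wash Dishes</chore></jobs></root>';
my $update = '<root><jobs><chore>Mow Lawn</chore></jobs></root>';

print replace_block($old, $update, 'jobs'), "\n";
```

Anything beyond this simple shape (attributes on the tag, nesting, comments or CDATA containing the tag text) is where a real parser starts to pay for itself.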

    Let me know if I did anything stupid here. Otherwise thanks for thinking good thoughts.

    db
