onegative has asked for the wisdom of the Perl Monks concerning the following question:
I am having trouble with the XML::DifferenceMarkup module. I have been using it because of its speed, but I have now found a problem with the make_diff method, and I am unsure whether it is due to the format of my XML or a genuine bug in the module. A completely new element does not show up as a dm:insert but as an update, and the diff also contains dm:delete and dm:insert child elements where no real difference exists.
I receive an XML file from each server daily and store it in a relational database. When the next XML file arrives for that server, I run make_diff on old_xml and new_xml, and the resulting insert/update/delete XML drives the CRUD operations on my database. I use this module because it is extremely fast compared with all the other XML difference modules I have investigated. But speed is of little use if the diff results are not correct.
I have a simple snippet of code to perform the make_diff as follows:
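    use strict;
    use warnings;
    use XML::DifferenceMarkup qw(make_diff);
    use XML::LibXML;
    # Paths to the old and new XML files come from the command line.
    my $old_xml = $ARGV[0];
    my $new_xml = $ARGV[1];
    my $parser = XML::LibXML->new();
    $parser->keep_blanks(0);    # drop whitespace-only text nodes
    my $d1 = $parser->parse_file($old_xml);
    my $d2 = $parser->parse_file($new_xml);
    # make_diff returns a document describing the changes from $d1 to $d2.
    my $dom = make_diff($d1, $d2);
    print $dom->toString(1);    # 1 = indent the output for readability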
I have the sample XML (old and new) that shows the issue and can email it if someone is interested in helping me figure out what is going wrong. I can also provide specifics about the particular element I have identified as a problem. Please let me know and I will send the files and specifics ASAP. Thanks, Danny
So I have had a startling revelation: I now see why I am getting these results. The module appears to compare the documents element by element, by position. When a new element is added somewhere other than the end of its sibling list, for example (element 1, element new, element 2) instead of (element 1, element 2, element new), the module assumes the element at that position has been modified. It reports the new element as an update to the old element at that position, every following element is likewise reported as an update, cascading down the tree, and the previously last element is reported as an insert.
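To make the cascade concrete, here is a small pure-Perl toy model of a slot-by-slot comparison. It is only an illustration of the behaviour described above, not the module's actual algorithm; `positional_diff` and the element names are made up for the example.

```perl
use strict;
use warnings;

# Toy model (hypothetical helper): compare two sibling lists slot by slot,
# the way the behaviour described above suggests.
sub positional_diff {
    my ($old, $new) = @_;
    my @ops;
    my $max = @$old > @$new ? @$old : @$new;
    for my $i (0 .. $max - 1) {
        my ($o, $n) = ($old->[$i], $new->[$i]);
        if    (!defined $o) { push @ops, "insert $n" }
        elsif (!defined $n) { push @ops, "delete $o" }
        elsif ($o ne $n)    { push @ops, "update $o -> $n" }
    }
    return @ops;
}

my @old = qw(alpha beta delta echo);
my @new = qw(alpha beta charlie delta echo);   # 'charlie' sorts into the middle

print "$_\n" for positional_diff(\@old, \@new);
# update delta -> charlie
# update echo -> delta
# insert echo
```

A single real insertion ('charlie') is reported as two updates plus an insert of the element that merely moved to the end.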
Because the source data builds the XML in alphabetical order of element name, I cannot control this from my processing side. How to resolve it? Back to the drawing board.
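One possible direction, sketched here under assumptions rather than offered as a fix for the module: compare siblings by an identity key instead of by position, so that alphabetical reordering does not matter. The helper `keyed_diff` is hypothetical, and it assumes each child can be reduced to a key that is unique among its siblings (the element name in this toy data) plus a string for its content.

```perl
use strict;
use warnings;

# Hypothetical helper: diff two sets of children keyed by identity.
# Each argument is a hash ref of key => serialized content.
sub keyed_diff {
    my ($old, $new) = @_;
    my @ops;
    for my $k (sort keys %$new) {
        if    (!exists $old->{$k})       { push @ops, "insert $k" }
        elsif ($old->{$k} ne $new->{$k}) { push @ops, "update $k" }
    }
    for my $k (sort keys %$old) {
        push @ops, "delete $k" unless exists $new->{$k};
    }
    return @ops;
}

my %old = (alpha => 'a1', beta => 'b1', delta => 'd1');
my %new = (alpha => 'a1', beta => 'b2', charlie => 'c1', delta => 'd1');

print "$_\n" for keyed_diff(\%old, \%new);
# update beta
# insert charlie
```

With XML::LibXML, such hashes could presumably be built from each child's `nodeName` and `toString` output, but that only works where sibling names (or some attribute used as the key) are unique.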