PerlMonks
Re: Multiple XML files from Directory to One XML file using perl.
by graff (Chancellor) on Nov 19, 2011 at 03:22 UTC (#938934)
I suppose that if you were to make up a tag name to use as the one single container for all your existing xml files, it would be a pretty simple matter, and probably wouldn't even involve xml parsing at all. You just need to make sure that the new tag name that you make up does not already occur as a tag in any of the existing xml files.
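A minimal sketch of that idea, assuming the input file names are passed on the command line and that all inputs share one encoding (UTF-8 is assumed here). The tag name `combined_xml_set` and the subs `strip_xml_declaration` and `combine_xml` are made up for illustration; this is not the OP's code. It removes a leading `<?xml ...?>` declaration from each input when one is present, because a declaration is only allowed at the very start of a document, and it dies if the container tag already occurs in an input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical container tag: any name that occurs in none of the inputs.
my $container = 'combined_xml_set';

# Remove a leading <?xml ...?> declaration, if the text starts with one.
sub strip_xml_declaration {
    my ($text) = @_;
    $text =~ s/\A\s*<\?xml\s.*?\?>\s*//s;
    return $text;
}

# Write every input file, minus its declaration, inside one <$tag> element.
sub combine_xml {
    my ($out_fh, $tag, @files) = @_;
    print {$out_fh} qq{<?xml version="1.0" encoding="UTF-8"?>\n<$tag>\n};
    for my $file (@files) {
        open my $in, '<', $file or die "Cannot read $file: $!";
        my $text = do { local $/; <$in> };    # slurp the whole file
        close $in;
        $text = strip_xml_declaration($text);
        # Plain text match, no parsing: refuse a tag name that is already in use.
        die "Tag <$tag> already occurs in $file\n"
            if $text =~ m{</?\Q$tag\E[\s/>]};
        $text .= "\n" unless $text =~ /\n\z/;
        print {$out_fh} $text;
    }
    print {$out_fh} "</$tag>\n";
}

# Usage: perl combine_xml.pl one.xml two.xml ... > all_combined.xml
combine_xml(\*STDOUT, $container, @ARGV) if @ARGV;
```

The tag check is a simple text match, in keeping with the "no xml parsing" approach, so it would also trip on the tag name appearing inside a comment or CDATA section; that errs on the safe side.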
It's good that you already solved the part about finding all the files -- I'll use the OP code as a starting point (thanks for that), and reduce it down to just the essentials:
The point is that, since each input xml file is a fully self-contained element, and you probably don't want to disrupt that structure, all you need is to create a novel tag that won't get confused with any existing content, and use that as the one element that will contain everything else being put into the new file. Just drop the initial <?xml...?> line from each input file. (I've seen a lot of "xml" files that don't start with that, so I think it's worthwhile to check.)
Other things I changed in the code were:
If your duplication problem is really just a matter of the (exact) same xml content showing up in multiple files (e.g. "foo1.xml" is a copy of "foo2.xml", or "blah1/foo.xml" is a copy of "blah2/foo.xml"), you can simply get md5 signatures of all the files first, sort by md5 values, and look for duplicates that way (files with identical content will have identical md5 values). But if the duplication problem involves elements that make up parts of files, then a parser is the only way to go, and you'll need to know enough about the data to figure out which elements need to be checked for duplicate content. If you know which tags to look at, running a parser on the "all-combined" xml will make it easy to find and remove the duplicates.
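For the whole-file case, the md5 comparison might be sketched as follows, using the core Digest::MD5 module. The sub names `group_by_md5` and `duplicate_groups` are made up for illustration, and the file names are assumed to come from the command line. Instead of sorting a list of md5 values, it collects file names in a hash keyed by md5 value, which finds the same duplicates.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Build a hash: md5 hex digest => list of files whose content has that digest.
sub group_by_md5 {
    my (@files) = @_;
    my %by_md5;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot read $file: $!";
        binmode $fh;    # digest the raw bytes
        my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        push @{ $by_md5{$md5} }, $file;
    }
    return \%by_md5;
}

# Return only the groups with more than one file, i.e. identical content.
sub duplicate_groups {
    my ($by_md5) = @_;
    my @groups;
    for my $md5 (sort keys %$by_md5) {
        my @files = sort @{ $by_md5->{$md5} };
        push @groups, \@files if @files > 1;
    }
    return @groups;
}

# Usage: perl find_dups.pl foo1.xml foo2.xml blah1/foo.xml ...
# Prints one line per set of identical files, names separated by tabs.
if (@ARGV) {
    print join("\t", @$_), "\n" for duplicate_groups(group_by_md5(@ARGV));
}
```

This only catches files that are byte-for-byte identical; two files that differ in whitespace or attribute order will get different md5 values, which is where the parser approach comes in.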