http://www.perlmonks.org?node_id=685016

mattr has asked for the wisdom of the Perl Monks concerning the following question:

Cherished Monks,

I have an XML file describing hundreds of companies that I'll pull down by FTP, and the provider may add new tags. The provider says to be sure to filter out new tags so the app doesn't crash. Okay, so I need to pull each company subtree off the feed and then filter it in some way before putting the data into a database.

I don't have tons of memory, so it seems I'd use XML::Twig or a SAX module to pull a single company off the file at a time. Then it would be nice if I could filter a company with a single command that I hand a brief template, and then perhaps validate it against a DTD before putting the data in the database.
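For comparison, here is a minimal sketch of the one-company-at-a-time, bounded-memory pattern described above, written with Python's standard library (`xml.etree.ElementTree.iterparse`) rather than XML::Twig. The element name `company` and the assumption that each `<company>` is a direct child of the root are hypothetical; adjust both to the real feed.

```python
import io
import xml.etree.ElementTree as ET

def iter_companies(source, tag="company"):
    """Yield one <company> element at a time, freeing it afterwards.

    Assumes (hypothetically) that <company> elements are direct
    children of the document root.
    """
    # Ask for "start" events too, so the first event hands us the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem        # the caller must finish with elem before resuming
            elem.clear()      # free the subtree's children and text
            root.clear()      # detach the emptied shell from the root

feed = io.BytesIO(b"<companies><company><name>Acme</name></company>"
                  b"<company><name>Globex</name></company></companies>")
names = [c.findtext("name") for c in iter_companies(feed)]
print(names)  # ['Acme', 'Globex']
```

Because each element is cleared once the loop moves on, the caller has to process or copy a company inside the loop body; collecting the raw elements into a list would leave only empty shells.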

To do the filtering, should I use SAX, or maybe XSLT / XPathScript? Maybe I could validate against a DTD and just skip the errors? Would PYX be easiest? I can't tell whether these tools can do what I want, and the documentation doesn't really cover my task.

I'd like to simply list the tags I am expecting and have everything else filtered out. Conceptually, I thought I ought to be able to specify a template XML file and use that as a filter. I'd rather not write tons of filter code, since this seems like a common enough task; undoubtedly libXML can do it, but I can't find enough documentation about that either.
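The "template as filter" idea above can be sketched as a whitelist: collect every element path that appears in a template document, then delete any element in a record whose path is not on that list. This is a Python stdlib sketch of that technique, not a description of any libXML feature; the template and record contents are hypothetical, and attributes are not filtered.

```python
import xml.etree.ElementTree as ET

def allowed_paths(template, prefix=""):
    """Collect every element path (e.g. 'company/address/city') in the template."""
    path = f"{prefix}/{template.tag}" if prefix else template.tag
    paths = {path}
    for child in template:
        paths |= allowed_paths(child, path)
    return paths

def prune(elem, allowed, prefix=""):
    """Remove, in place, every descendant whose path is not in `allowed`."""
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    for child in list(elem):  # iterate over a copy because we mutate elem
        if f"{path}/{child.tag}" in allowed:
            prune(child, allowed, path)
        else:
            elem.remove(child)  # unknown tag: drop it with its whole subtree
    return elem

# Hypothetical template listing only the expected tags.
template = ET.fromstring(
    "<company><name/><address><city/></address></company>")
# Hypothetical record containing two tags the template does not know about.
record = ET.fromstring(
    "<company><name>Acme</name><ceo>Wile E.</ceo>"
    "<address><city>Tokyo</city><geo>35,139</geo></address></company>")
prune(record, allowed_paths(template))
print(ET.tostring(record, encoding="unicode"))
# <company><name>Acme</name><address><city>Tokyo</city></address></company>
```

Matching on full paths rather than bare tag names means a tag that is expected in one place (say `city` under `address`) is still removed if the provider starts emitting it somewhere else.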

Then I need to dump the company twig into a DBIx::Class-based database whose tables and foreign keys are set up to match the structure of the XML. Again, I wish there were a quick way to do it.
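Whatever the ORM, loading a filtered subtree into tables linked by foreign keys comes down to inserting the parent row first and then using its new key for the child rows. Here is a minimal sketch of that ordering with Python's built-in `sqlite3` standing in for DBIx::Class; the `company` and `office` tables and their columns are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one company has many offices.
SCHEMA = """
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE office  (id INTEGER PRIMARY KEY,
                      company_id INTEGER NOT NULL REFERENCES company(id),
                      city TEXT);
"""

def load_company(conn, elem):
    """Insert one filtered <company> subtree: parent row first, then children."""
    cur = conn.execute("INSERT INTO company (name) VALUES (?)",
                       (elem.findtext("name"),))
    company_id = cur.lastrowid  # new primary key, used as the children's foreign key
    conn.executemany(
        "INSERT INTO office (company_id, city) VALUES (?, ?)",
        [(company_id, o.findtext("city")) for o in elem.findall("office")])
    return company_id

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
with conn:  # commits on success, rolls back if any insert fails
    load_company(conn, ET.fromstring(
        "<company><name>Acme</name>"
        "<office><city>Tokyo</city></office>"
        "<office><city>Osaka</city></office></company>"))
rows = conn.execute(
    "SELECT c.name, o.city FROM office o JOIN company c ON c.id = o.company_id "
    "ORDER BY o.id").fetchall()
print(rows)  # [('Acme', 'Tokyo'), ('Acme', 'Osaka')]
```

Wrapping each company in its own transaction means a malformed record is rolled back as a unit instead of leaving orphaned child rows behind.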

Thanks for your meditation on my task!

Matt R