http://www.perlmonks.org?node_id=662050

mattr has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I would like some advice on loading a daily downloaded file of XML data into a database in a simple but robust manner. FWIW it is Mergent’s Standardized Data Feed called Mergent Global Company Data, which includes for each company its history, summaries of its quarterly financial statements, lists of officers, text sections, etc. and it’s a bit complex. For example the feed will change over time. It can have different sections indicating different classes of executives it lists, etc. Text sections are as <![CDATA[ …html here… ]]> too. I can get the full file or just the new bits delivered, by FTP. (No, this is not RSS here.)

I am planning on using this in a Catalyst app which normally would be using DBIx::Class, and will also have some manually entered data for fields not in the feed. Does anyone have experience with this kind of database updated by an XML data feed? I’ve skimmed some likely sounding modules in CPAN like DBIx::XML::DataLoader (in beta for 6 years now), DBIx::DBStag and its cookbook, etc.

The last guy who tried to write a schema for the feed quit the job, and I want to be a good lazy japh so now I’m even thinking about some way to just load the XML once on startup and search that. Otherwise, I need to rebuild the database from the whole feed automatically.

Use of the data will involve simple display of a company’s data and also printing that to a report.. something that can be edited and sent to PDF like OpenOffice or maybe LyX.

Mainly I want to keep it simple and not require lots of maintenance, so rather than making and maintaining huge db schemas and creating DBIx relations (a company has many statements, a section has many officers, etc.) I wonder if a simpler answer is possible. Otherwise I could just code the existing data and then merge in new data daily that the model understands.

Thanks for your help.

Matt R.

UPDATED: Looks like maybe using XML::Parser to build my table is the answe. Also found DBIx::XML::DataLoader, has anyone used it, is it being maintained? The docs are a little opaque to me.. but not as scary, I think, as XML::RDB.