Dear Monks,

I would like some advice on loading a daily downloaded file of XML data into a database in a simple but robust manner. FWIW, it is Mergent's Standardized Data Feed called Mergent Global Company Data, which includes for each company its history, summaries of its quarterly financial statements, lists of officers, text sections, etc., and it's a bit complex. For example, the feed will change over time, and it can have different sections for the different classes of executives it lists. Text sections also arrive as <![CDATA[ …html here… ]]>. I can have either the full file or just the new bits delivered by FTP. (No, this is not RSS.)

I am planning to use this in a Catalyst app, which would normally use DBIx::Class, and which will also hold some manually entered data for fields not in the feed. Does anyone have experience with this kind of database, updated from an XML data feed? I've skimmed some likely-sounding modules on CPAN, such as DBIx::XML::DataLoader (in beta for 6 years now) and DBIx::DBStag with its cookbook.

The last guy who tried to write a schema for the feed quit the job, and I want to be a good lazy JAPH, so now I'm even thinking about some way to just load the XML once on startup and search that. Otherwise, I need to rebuild the database from the whole feed automatically.

Use of the data will involve simple display of a company's data and also printing it to a report, in a format that can be edited and exported to PDF, such as OpenOffice or maybe LyX.

Mainly I want to keep it simple and low-maintenance. Rather than building and maintaining a huge database schema and DBIx::Class relations (a company has many statements, a section has many officers, etc.), I wonder if a simpler answer is possible. Otherwise I could just write code for the existing data and then merge in, each day, whatever new data the model understands.
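To make the "simpler answer" concrete, here is a rough sketch of one possibility, not a recommendation: keep one row per company holding its raw XML fragment, and overwrite that row whenever the feed delivers a newer copy. The table name, column names, company ids, dates and XML fragments are all made up for illustration, and it assumes DBI with DBD::SQLite is installed.

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database, purely for illustration.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema: one row per company, raw XML fragment kept as-is.
$dbh->do(q{
    CREATE TABLE company (
        company_id TEXT PRIMARY KEY,
        xml        TEXT NOT NULL,
        loaded_on  TEXT NOT NULL
    )
});

# Insert new companies and overwrite existing ones by primary key, so the
# full file and the daily delta can go through the same code path.
sub merge_feed {
    my ($dbh, $date, $records) = @_;
    my $sth = $dbh->prepare(
        'INSERT OR REPLACE INTO company (company_id, xml, loaded_on)
         VALUES (?, ?, ?)'
    );
    $dbh->begin_work;
    $sth->execute($_, $records->{$_}, $date) for sort keys %$records;
    $dbh->commit;
    return;
}

# Made-up "full file" followed by a made-up "daily delta".
merge_feed($dbh, '2020-01-25', {
    C1 => '<company id="C1"><name>Example Corp</name></company>',
    C2 => '<company id="C2"><name>Sample Ltd</name></company>',
});
merge_feed($dbh, '2020-01-26', {
    C2 => '<company id="C2"><name>Sample Holdings</name></company>',
    C3 => '<company id="C3"><name>Third Inc</name></company>',
});

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM company');
print "companies stored: $count\n";
```

With this shape the schema does not need to change when the feed grows new sections, at the cost of parsing the fragment at display time. The manually entered fields would presumably live in a separate table keyed on the same company id, so that a reload never overwrites them.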

Thanks for your help.

Matt R.

UPDATED: Looks like maybe using XML::Parser to build my table is the answer. As for DBIx::XML::DataLoader, has anyone used it, and is it being maintained? The docs are a little opaque to me, but not as scary, I think, as XML::RDB.
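For what it's worth, this is roughly what I mean by using XML::Parser to build a table: a minimal sketch that turns each company element into a flat hash, one per future table row. The element and attribute names (companies, company, id, name, history) and the sample data are invented, since the real feed's layout is different, and the sketch only copes with one level of child elements.

```perl
use strict;
use warnings;
use XML::Parser;

# Made-up sample; the real feed's element names will differ.
my $xml = <<'XML';
<companies>
  <company id="C1">
    <name>Example Corp</name>
    <history><![CDATA[<p>Founded in 1990.</p>]]></history>
  </company>
  <company id="C2">
    <name>Sample Ltd</name>
    <history><![CDATA[<p>Spun off in 2001.</p>]]></history>
  </company>
</companies>
XML

my (@rows, $row, $field);

my $parser = XML::Parser->new(
    Handlers => {
        # A <company> start tag opens a new row; any other start tag
        # inside a company names the field currently being filled.
        Start => sub {
            my ($expat, $tag, %attr) = @_;
            if ($tag eq 'company') {
                $row = { id => $attr{id} };
            }
            elsif ($row) {
                $field = $tag;
                $row->{$field} = '';
            }
        },
        # Character data, including CDATA content, is appended to the
        # current field; whitespace between elements is skipped because
        # no field is open at that point.
        Char => sub {
            my ($expat, $text) = @_;
            $row->{$field} .= $text if $row && defined $field;
        },
        # Closing a field ends it; closing </company> finishes the row.
        End => sub {
            my ($expat, $tag) = @_;
            if ($tag eq 'company') {
                push @rows, $row;
                undef $row;
            }
            undef $field;
        },
    },
);

$parser->parse($xml);

printf "%s | %s | %s\n", @{$_}{qw(id name history)} for @rows;
```

Each hash in @rows could then be handed to DBI or DBIx::Class for insertion. For the real file, parsefile would replace parse, and nested sections such as officer lists would need a stack of open elements rather than the single $field used here.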

In reply to Building a database from XML data feed by mattr
