PerlMonks  

Yes, I'm throwing out the old work. It requires building a new table for every company every year, is undocumented... I'll stop there.

The spec is understandable. The open question is whether to build a facility for manually merging specific partial feeds, or just rebuild the whole thing daily from a full feed. I'd also want a rollback, and maybe a way to lock fields against being updated.
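A minimal sketch of the partial-merge option with field locking and rollback, in Python rather than Perl purely for illustration. Everything here is an assumption: the store is a plain dict keyed by EntityReference, and `apply_partial_feed`, `rollback`, `locked` and `history` are hypothetical names, not part of any feed spec.

```python
import copy

def apply_partial_feed(store, update, locked, history):
    """Merge one partial update into the store, skipping locked fields."""
    history.append(copy.deepcopy(store))  # snapshot taken before the merge, used by rollback()
    for entity_ref, fields in update.items():
        record = store.setdefault(entity_ref, {})
        for name, value in fields.items():
            if (entity_ref, name) in locked:
                continue  # manually maintained field: the feed may not overwrite it
            record[name] = value
    return store

def rollback(store, history):
    """Restore the store to its state before the most recent merge."""
    if history:
        previous = history.pop()
        store.clear()
        store.update(previous)
    return store

# Usage: the locked WebSite survives the update, Status is added, rollback undoes it.
store = {"0000127509": {"ShortName": "21st Century Holding Co.",
                        "WebSite": "www.fedfirst.com"}}
locked = {("0000127509", "WebSite")}
history = []
apply_partial_feed(store, {"0000127509": {"WebSite": "changed", "Status": "Active"}},
                   locked, history)
rollback(store, history)
```

A daily rebuild from the full feed would not need the history list at all, but it would still need the lock set so manual edits are reapplied after each rebuild.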

I've also been pondering a model built around the full XML feed loaded right into memory at server startup, which might make it more robust and configurable. Also, yes, they say the schema will change but not how. I figure the most important parts won't change, but I would like to make it configurable by the admin so I do not have to support it forever. Certainly an update will be issued when an executive of a company is hired or retires, and new types of officers could be added, etc.
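One way the in-memory, admin-configurable idea could look, sketched in Python with the standard library's ElementTree. The `FIELD_PATHS` mapping and the `load_feed` function are hypothetical; the element and attribute names are taken from the sample feed in this post, trimmed to a few fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical admin-editable mapping: output field -> path relative to <ENTITY>.
# If the schema changes, only this table needs editing.
FIELD_PATHS = {
    "name": "COMPANY/Identity/OfficialName",
    "ticker": "COMPANY/Identity/CommonTicker",
    "state": "COMPANY/Identity/State",
}

OFFICERS_PATH = "COMPANY/Executives/Section[@Title='Officers']/Executive"

def load_feed(xml_text, field_paths=FIELD_PATHS):
    """Index every ENTITY in the feed by its EntityReference attribute."""
    root = ET.fromstring(xml_text)
    companies = {}
    for entity in root.iter("ENTITY"):
        record = {field: entity.findtext(path, default="")
                  for field, path in field_paths.items()}
        # Keep each officer as a dict of its XML attributes (all strings).
        record["officers"] = [dict(ex.attrib) for ex in entity.findall(OFFICERS_PATH)]
        companies[entity.get("EntityReference")] = record
    return companies

# Trimmed copy of the sample entity, for illustration only.
SAMPLE = """<Feed ExtractDate="08/08/2006" ExtractTime="11:30:41">
  <ENTITY EntityReference="0000127509" LegalName="21st Century Holding Co." Status="A">
    <COMPANY>
      <Identity>
        <OfficialName>21st Century Holding Co.</OfficialName>
        <CommonTicker>TCHC</CommonTicker>
        <State>FL</State>
      </Identity>
      <Executives>
        <Section Title="Officers">
          <Executive FirstName="Richard" MiddleName="A." LastName="Widdicombe" Title="C.E.O."/>
        </Section>
      </Executives>
    </COMPANY>
  </ENTITY>
</Feed>"""

companies = load_feed(SAMPLE)
print(companies["0000127509"]["ticker"])  # prints TCHC
```

Because officers are read as whatever attributes each Executive element carries, a new officer type or an extra attribute would show up without a code change.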

So yes, I can see a way to model the largest features of the XML structure in DBIx, but I am intrigued by the possibility of keeping most of that structure rather than greatly reducing it. Somewhere, though, I'll have to do some degree of linking feed data to manually entered data, or importing them into the same database. It can all just be string data. Maybe JSON and YAML could be useful.
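A rough sketch of keeping feed data and manually entered data side by side as plain strings, linked by EntityReference. It uses Python's built-in sqlite3 only to keep the example self-contained; the table names `feed_field` and `manual_field` and the helper functions are assumptions, and the same layout could sit behind DBIx or any other database layer.

```python
import sqlite3

def open_store():
    """Create an in-memory database with one table per data source."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE feed_field (
            entity_ref TEXT, name TEXT, value TEXT,
            PRIMARY KEY (entity_ref, name));
        CREATE TABLE manual_field (
            entity_ref TEXT, name TEXT, value TEXT,
            PRIMARY KEY (entity_ref, name));
    """)
    return db

def set_field(db, table, entity_ref, name, value):
    """Insert or overwrite one string field in the named table."""
    if table not in ("feed_field", "manual_field"):
        raise ValueError(table)  # table name is interpolated, so whitelist it
    db.execute(f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?)",
               (entity_ref, name, value))

def company_view(db, entity_ref):
    """Return feed values for one company, overlaid with manual values."""
    view = dict(db.execute(
        "SELECT name, value FROM feed_field WHERE entity_ref = ?", (entity_ref,)))
    view.update(db.execute(
        "SELECT name, value FROM manual_field WHERE entity_ref = ?", (entity_ref,)))
    return view

db = open_store()
set_field(db, "feed_field", "0000127509", "WebSite", "www.fedfirst.com")
set_field(db, "feed_field", "0000127509", "State", "FL")
set_field(db, "manual_field", "0000127509", "WebSite", "manually entered value")
print(company_view(db, "0000127509"))
```

Rebuilding from a full feed then only touches `feed_field`, so manual entries are never clobbered, which is one way to get the field-locking behaviour without a separate lock list.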

The data looks like this. The feed probably covers thousands of companies; here's just one. I think storing about 500 companies is closer to what we need for now, though.

<?xml version="1.0" encoding="ISO-8859-1"?>
<Feed ExtractDate="08/08/2006" ExtractTime="11:30:41">
  <ENTITY EntityReference="0000127509" LegalName="21st Century Holding Co." Status="A">
    <COMPANY>
      <Identity>
        <OfficialName>21st Century Holding Co.</OfficialName>
        <ShortName>21st Century Holding Co.</ShortName>
        <Status>Active</Status>
        <CountryCode>USA</CountryCode>
        <Region>South Atlantic</Region>
        <CompNumber>00096995</CompNumber>
        <CIK>0001069996</CIK>
        <MergentIndustryCode>8.2</MergentIndustryCode>
        <CommonTicker>TCHC</CommonTicker>
        <CommonExchange>NMS</CommonExchange>
        <CommonCusip>90136Q100</CommonCusip>
        <Street1>4161 N.W. 5th Street</Street1>
        <City>Plantation</City>
        <State>FL</State>
        <Country>USA</Country>
        <Zipcode>33317</Zipcode>
        <PhoneNumber>954 581 9993</PhoneNumber>
        <Email>fedinfo@fedusa.com</Email>
        <WebSite>www.fedfirst.com</WebSite>
        <FYE>12/31/2005</FYE>
      </Identity>
      <BusinessActivities>
        <SIC Primary="6331" Secondary="6719"/>
        <NAIC Primary="524126" Secondary="551112"/>
        <TextSection Title="Business Summary" Date="06/01/2006">
<![CDATA[
<p>21st Century Holding is an insurance holding company, which, through its subsidiaries, controls the insurance underwriting, distribution and claims process. Co. underwrites personal automobile insurance and homeowners and mobile home property and casualty insurance in the State of Florida through its subsidiary, Federated National Insurance Company. Co. has underwriting authority for third-party insurance companies which it represents through a managing general agent. Co. also offers financing to its own and third-party insureds through its subsidiary, Federated Premium Finance, Inc., and pays advances through Fed First Corp.</p>
]]>
        </TextSection>
      </BusinessActivities>
      <Executives>
        <Section Title="Officers">
          <Executive FirstName="Edward" MiddleName="J." LastName="Lawson" Title="Chmn., Pres."/>
          <Executive FirstName="Richard" MiddleName="A." LastName="Widdicombe" Title="C.E.O."/>
          <Executive FirstName="Michele" MiddleName="V." LastName="Lawson" Title="V.P., Agency Oper., Treas."/>
          <Executive FirstName="James" MiddleName="G." LastName="Jennings" Suffix="III" Title="C.F.O."/>
          <Executive FirstName="Keith" MiddleName="M." LastName="Linder" Title="C.O.O."/>
          <Executive FirstName="James" MiddleName="A." LastName="Epstein" Title="Sec."/>
        </Section>
        <Section Title="Directors">
          <Executive FirstName="Edward" MiddleName="J." LastName="Lawson" Title="Chmn."/>
          <Executive FirstName="Carl" MiddleName="" LastName="Dorf"/>
          <Executive FirstName="Bruce" MiddleName="" LastName="Simberg"/>
          <Executive FirstName="Charles" MiddleName="B." LastName="Hart" Suffix="Jr."/>
          <Executive FirstName="Richard" MiddleName="W." LastName="Wilcox" Suffix="Jr."/>
          <Executive FirstName="Peter" MiddleName="" LastName="Prygelski"/>
        </Section>
      </Executives>
      <FinData_Generated>
        <Report>
          <ReportDate>03/31/2006</ReportDate>
          <ReportType>Q1</ReportType>
          <Auditor>U</Auditor>
          <Currency>USA</Currency>
          <Consolidated>True</Consolidated>
          <fi Mapcode="-402" Amount="23001737"/>
          <fi Mapcode="-384" Amount="0.83"/>
          <fi Mapcode="-379" Amount="53213270"/>
          <fi Mapcode="-365" Amount="8599042"/>
          <fi Mapcode="-364" Amount="40167125"/>
          <fi Mapcode="-356" Amount="227079885"/>
          <fi Mapcode="-344" Amount="93988871"/>
          <fi Mapcode="-337" Amount="28367811"/>
          <fi Mapcode="-333" Amount="6013312"/>
          <fi Mapcode="-310" Amount="25114709"/>
          <fi Mapcode="-249" Amount="36.8577792400461"/>
        </Report>
        ... 20 more reports here ...
        <Report>
          <ReportDate>03/31/2002</ReportDate>
          <ReportType>Q1</ReportType>
          <Auditor>U</Auditor>
          <Currency>USA</Currency>
          <Consolidated>True</Consolidated>
          <fi Mapcode="-402" Amount="6086503"/>
          <fi Mapcode="-384" Amount="0.22"/>
          <fi Mapcode="-379" Amount="14592615"/>
          <fi Mapcode="-365" Amount="6165671"/>
          <fi Mapcode="-364" Amount="5822488"/>
          <fi Mapcode="-356" Amount="59264371"/>
          <fi Mapcode="-344" Amount="17710206"/>
          <fi Mapcode="-337" Amount="549056"/>
          <fi Mapcode="-333" Amount="991370"/>
          <fi Mapcode="-310" Amount="9507000"/>
          <fi Mapcode="-249" Amount="16.2431471547281"/>
        </Report>
      </FinData_Generated>
      <Miscellaneous>
        <Employee Description="AppoximateFullTime" Count="135" AsOf="12/31/2005"/>
        <Shareholders Count="3000" AsOf="03/29/2006"/>
        <ShareHolderRelations Name="Becky Campillo" PhoneNumber="954-581-9993 x1257"/>
        <Incorporation Country="USA" State="FL" Month="3" Year="1991"/>
        <Provider ServiceType="Auditor" Name="McKean, Paul, Chrycy, Fletcher &amp; Co."/>
        <Provider ServiceType="Counsel" Name="Broad &amp; Cassel"/>
      </Miscellaneous>
      <StockSummary>
        <StockIssue Type="Common" Description="common">
          <StockOutstanding Amount="6048842.00" Units="SHR" Date="12/31/2004"/>
          <Par Amount="0.01" Units="USA"/>
          <Authorized Amount="37500000.00" Units="SHR" Unlimited="No"/>
          <Treasury Amount="696849.00" Units="SHR"/>
          <StockIdentity Ticker="TCHC" Exchange="Nasdaq National Market"/>
          <TextSection Title="Stock Splits" Date="06/01/2006">
<![CDATA[
<p><font color="black">$0.01 par shares split in the form of a 50% stock dividend on Sept. 7, 2004.</font></p>
]]>
          </TextSection>
          <TextSection Title="Ownership" Date="06/01/2006">
<![CDATA[
<p><font color="black">As of April 15, 2005, Edward J. Lawson and all directors and executive officers as a group held 25.1% and 33.1%, respectively of Co.'s outstanding common stock.</font></p>
]]>
          </TextSection>
          <TextSection Title="Voting Rights" Date="06/01/2006">
<![CDATA[
<p><font color="black">Entitled to one vote per share.</font></p>
]]>
          </TextSection>
          <TextSection Title="Dividends Paid" Date="06/01/2006">
<![CDATA[
<table border="1">
  <tr>
    <td> <p><font color="teal"><twocolumn>2001</twocolumn></font></p> </td>
    <td> <p><font color="teal"><twocolumn>0.08</twocolumn></font></p> </td>
    <td> <p><font color="teal"><twocolumn>2002</twocolumn></font></p> </td>
    <td> <p><font color="teal"><twocolumn>0.11</twocolumn></font></p> </td>
    <td> <p><font color="teal"><twocolumn>2003</twocolumn></font></p> </td>
    <td> <p><font color="teal"><twocolumn>0.32</twocolumn></font></p> </td>
  </tr>
</table><p/>
<p><font color="red"><footnote>&#65407;</footnote></font><font color="black">Adjusted for 3-for-2 split:</font></p>
<table border="1">
  <tr>
    <td> <p><font color="teal"><twocolumn>2004</twocolumn></font></p> </td>
    <td> <p><font color="teal"><twocolumn>0.32</twocolumn></font></p> </td>
    <td> <p><font color="teal"><twocolumn>[1]2005</twocolumn></font></p> </td>
    <td> <p><font color="teal"><twocolumn>0.32</twocolumn></font></p> </td>
    <td>&#65407;</td>
    <td>&#65407;</td>
  </tr>
</table><p/>
<p><font color="red"><footnote>[1]To Dec. 1</footnote></font></p>
]]>
          </TextSection>
          <TextSection Title="Options" Date="06/01/2006">
<![CDATA[
<p><font color="black">Dec. 31, 2004, authorized for issuance, 3,688,500 shares; options outstanding, 1,119,575 shares. </font></p>
]]>
          </TextSection>
          <TextSection Title="Transfer Agent &amp; Registrar" Date="06/01/2006">
<![CDATA[
<p><font color="black">Global Securities Transfer, Inc., Denver, CO</font></p>
]]>
          </TextSection>
          <TextSection Title="Price Range" Date="06/01/2006">
<![CDATA[
<table border="1">
  <tr>
    <td>&#65407;</td>
    <td> <p><font color="green"><pricerange>2004</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>2003</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>2002</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>2001</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>2000</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>1999</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>1998</pricerange></font></p> </td>
  </tr>
  <tr>
    <td> <p><font color="green"><pricerange>High</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>24.50</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>23.59</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>13.75</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>3.88</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>7 15/16</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>7 3/4</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>8 1/4</pricerange></font></p> </td>
  </tr>
  <tr>
    <td> <p><font color="green"><pricerange>Low</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>9.17</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>9</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>3</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>0.98</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>2 7/16</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>2 7/8</pricerange></font></p> </td>
    <td> <p><font color="green"><pricerange>5 3/4</pricerange></font></p> </td>
  </tr>
</table><p/>
]]>
          </TextSection>
          <TextSection Title="Offered" Date="06/01/2006">
<![CDATA[
<p><font color="black">(1,250,000 shares) at $7.50 per share (proceeds to Co., $6.90 per share) on Nov. 10, 1998 through Gilford Securities Incorporated; and associates. Offering contained over-allotment options to cover 187,500 shares. Proceeds used for contribution to Federated National's capital to increase its underwriting capacity, repayment of a portion of the outstanding balance under Co.'s revolving line of credit agreement, financing of acquisitions and working capital and general corporate purposes.</font></p>
]]>
          </TextSection>
        </StockIssue>
      </StockSummary>
    </COMPANY>
  </ENTITY>
  ... more entities here ...
</Feed>

In reply to Re^2: Building a database from XML data feed by mattr
in thread Building a database from XML data feed by mattr
