PerlMonks  

Re^2: Building a database from XML data feed

by mattr (Curate)
on Jan 13, 2008 at 09:22 UTC (#662171)


in reply to Re: Building a database from XML data feed
in thread Building a database from XML data feed

Yes, I'm throwing out the old work. It requires building a new table for every company every year, is undocumented, ... I'll stop there.

The spec is understandable. The open question is whether to build a facility for manually merging specific partial feeds, or to just rebuild the whole thing daily from a full feed. I would also want a rollback, and maybe a way to lock fields against being updated.
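To make the merge, rollback, and field-locking ideas concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the function name `merge_partial`, the flat dictionary record, and the `locked` set are hypothetical and not part of the feed spec.

```python
import copy

def merge_partial(current, update, locked=frozenset()):
    """Apply a partial update to one company record.

    Returns (merged, previous). `previous` is a snapshot of the record
    before the merge, which can be restored as a simple rollback.
    All names here are hypothetical, for illustration only.
    """
    previous = copy.deepcopy(current)
    merged = dict(current)
    for field, value in update.items():
        if field in locked:
            continue  # field was locked by hand: the feed may not overwrite it
        merged[field] = value
    return merged, previous

company = {
    "OfficialName": "21st Century Holding Co.",
    "WebSite": "www.fedfirst.com",
    "Status": "Active",
}
# A made-up partial update that touches two fields, one of them locked.
update = {"WebSite": "www.example.com", "Status": "Inactive"}

merged, snapshot = merge_partial(company, update, locked={"WebSite"})
```

Rebuilding daily from a full feed would make the merge step unnecessary, but the locked-field check would still be needed wherever manual edits must survive a reload.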

I've also been pondering a model built around loading the full XML feed right into memory at server startup, which might make it more robust and configurable. Also, yes, they say the schema will change, but not how. I figure the most important parts won't change, but I would like to make it configurable by the admin so I do not have to support it forever. Certainly an update will be issued when an executive of a company is hired or retires; new types of officers could also be added, etc.
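A rough sketch of the load-into-memory idea, in Python with the standard library's `xml.etree.ElementTree`. The sample is a trimmed-down stand-in for the real feed, and `load_feed` is a hypothetical name. The child tags of `Identity` are copied generically rather than named one by one, so a new tag in the feed would not need a code change.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the real feed (assumption: same element names).
SAMPLE = """<Feed ExtractDate="08/08/2006" ExtractTime="11:30:41">
  <ENTITY EntityReference="0000127509" LegalName="21st Century Holding Co." Status="A">
    <COMPANY>
      <Identity>
        <OfficialName>21st Century Holding Co.</OfficialName>
        <CommonTicker>TCHC</CommonTicker>
      </Identity>
    </COMPANY>
  </ENTITY>
</Feed>"""

def load_feed(xml_text):
    """Parse the whole feed once and index companies by EntityReference."""
    root = ET.fromstring(xml_text)
    companies = {}
    for entity in root.iter("ENTITY"):
        identity = entity.find("COMPANY/Identity")
        fields = {}
        if identity is not None:
            # Copy every child tag as a string, whatever it is called.
            fields = {child.tag: (child.text or "") for child in identity}
        companies[entity.get("EntityReference")] = fields
    return companies

companies = load_feed(SAMPLE)
```

At server startup this would run once, and lookups afterwards are plain dictionary accesses such as `companies["0000127509"]["CommonTicker"]`.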

So yes, I can see a way to model the largest features of the XML structure in DBIx, but I am intrigued by the possibility of not reducing that structure very much. Somewhere, though, I'll have to do some degree of linking feed data to manually entered data, or import both into the same database. It can all just be string data. Maybe JSON and YAML could be useful.
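One way to keep feed data and manually entered data in the same database while storing everything as strings is a generic name/value layout. The sketch below uses Python's `sqlite3` purely for illustration; the table names, column names, and the `add_company` helper are assumptions, and the manually entered company is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (
    id         INTEGER PRIMARY KEY,
    entity_ref TEXT UNIQUE,      -- NULL for manually entered companies
    source     TEXT NOT NULL     -- 'feed' or 'manual'
);
CREATE TABLE company_field (
    company_id INTEGER NOT NULL REFERENCES company(id),
    name       TEXT NOT NULL,    -- tag name from the feed, or a manual label
    value      TEXT NOT NULL,    -- everything is kept as string data
    PRIMARY KEY (company_id, name)
);
""")

def add_company(conn, source, fields, entity_ref=None):
    """Insert one company and its fields; returns the new row id."""
    cur = conn.execute(
        "INSERT INTO company (entity_ref, source) VALUES (?, ?)",
        (entity_ref, source),
    )
    company_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO company_field (company_id, name, value) VALUES (?, ?, ?)",
        [(company_id, name, value) for name, value in fields.items()],
    )
    return company_id

feed_id = add_company(conn, "feed", {"CommonTicker": "TCHC"}, entity_ref="0000127509")
manual_id = add_company(conn, "manual", {"OfficialName": "Example Co."})
```

The trade-off is that a name/value table survives schema changes in the feed without new columns, but queries across many fields need more joins than a fixed-column table would.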

The data looks like this. The feed probably has thousands of companies; here's just one. I think storing 500 companies is closer to what we need for now, though.

<?xml version="1.0" encoding="ISO-8859-1"?>
<Feed ExtractDate="08/08/2006" ExtractTime="11:30:41">
  <ENTITY EntityReference="0000127509" LegalName="21st Century Holding Co." Status="A">
    <COMPANY>
      <Identity>
        <OfficialName>21st Century Holding Co.</OfficialName>
        <ShortName>21st Century Holding Co.</ShortName>
        <Status>Active</Status>
        <CountryCode>USA</CountryCode>
        <Region>South Atlantic</Region>
        <CompNumber>00096995</CompNumber>
        <CIK>0001069996</CIK>
        <MergentIndustryCode>8.2</MergentIndustryCode>
        <CommonTicker>TCHC</CommonTicker>
        <CommonExchange>NMS</CommonExchange>
        <CommonCusip>90136Q100</CommonCusip>
        <Street1>4161 N.W. 5th Street</Street1>
        <City>Plantation</City>
        <State>FL</State>
        <Country>USA</Country>
        <Zipcode>33317</Zipcode>
        <PhoneNumber>954 581 9993</PhoneNumber>
        <Email>fedinfo@fedusa.com</Email>
        <WebSite>www.fedfirst.com</WebSite>
        <FYE>12/31/2005</FYE>
      </Identity>
      <BusinessActivities>
        <SIC Primary="6331" Secondary="6719"/>
        <NAIC Primary="524126" Secondary="551112"/>
        <TextSection Title="Business Summary" Date="06/01/2006">
<![CDATA[
<p>21st Century Holding is an insurance holding company, which, through its subsidiaries, controls the insurance underwriting, distribution and claims process. Co. underwrites personal automobile insurance and homeowners and mobile home property and casualty insurance in the State of Florida through its subsidiary, Federated National Insurance Company. Co. has underwriting authority for third-party insurance companies which it represents through a managing general agent. Co. also offers financing to its own and third-party insureds through its subsidiary, Federated Premium Finance, Inc., and pays advances through Fed First Corp.</p>
]]>
        </TextSection>
      </BusinessActivities>
      <Executives>
        <Section Title="Officers">
          <Executive FirstName="Edward" MiddleName="J." LastName="Lawson" Title="Chmn., Pres."/>
          <Executive FirstName="Richard" MiddleName="A." LastName="Widdicombe" Title="C.E.O."/>
          <Executive FirstName="Michele" MiddleName="V." LastName="Lawson" Title="V.P., Agency Oper., Treas."/>
          <Executive FirstName="James" MiddleName="G." LastName="Jennings" Suffix="III" Title="C.F.O."/>
          <Executive FirstName="Keith" MiddleName="M." LastName="Linder" Title="C.O.O."/>
          <Executive FirstName="James" MiddleName="A." LastName="Epstein" Title="Sec."/>
        </Section>
        <Section Title="Directors">
          <Executive FirstName="Edward" MiddleName="J." LastName="Lawson" Title="Chmn."/>
          <Executive FirstName="Carl" MiddleName="" LastName="Dorf"/>
          <Executive FirstName="Bruce" MiddleName="" LastName="Simberg"/>
          <Executive FirstName="Charles" MiddleName="B." LastName="Hart" Suffix="Jr."/>
          <Executive FirstName="Richard" MiddleName="W." LastName="Wilcox" Suffix="Jr."/>
          <Executive FirstName="Peter" MiddleName="" LastName="Prygelski"/>
        </Section>
      </Executives>
      <FinData_Generated>
        <Report>
          <ReportDate>03/31/2006</ReportDate>
          <ReportType>Q1</ReportType>
          <Auditor>U</Auditor>
          <Currency>USA</Currency>
          <Consolidated>True</Consolidated>
          <fi Mapcode="-402" Amount="23001737"/>
          <fi Mapcode="-384" Amount="0.83"/>
          <fi Mapcode="-379" Amount="53213270"/>
          <fi Mapcode="-365" Amount="8599042"/>
          <fi Mapcode="-364" Amount="40167125"/>
          <fi Mapcode="-356" Amount="227079885"/>
          <fi Mapcode="-344" Amount="93988871"/>
          <fi Mapcode="-337" Amount="28367811"/>
          <fi Mapcode="-333" Amount="6013312"/>
          <fi Mapcode="-310" Amount="25114709"/>
          <fi Mapcode="-249" Amount="36.8577792400461"/>
        </Report>
        ... 20 more reports here ...
          <ReportDate>03/31/2002</ReportDate>
          <ReportType>Q1</ReportType>
          <Auditor>U</Auditor>
          <Currency>USA</Currency>
          <Consolidated>True</Consolidated>
          <fi Mapcode="-402" Amount="6086503"/>
          <fi Mapcode="-384" Amount="0.22"/>
          <fi Mapcode="-379" Amount="14592615"/>
          <fi Mapcode="-365" Amount="6165671"/>
          <fi Mapcode="-364" Amount="5822488"/>
          <fi Mapcode="-356" Amount="59264371"/>
          <fi Mapcode="-344" Amount="17710206"/>
          <fi Mapcode="-337" Amount="549056"/>
          <fi Mapcode="-333" Amount="991370"/>
          <fi Mapcode="-310" Amount="9507000"/>
          <fi Mapcode="-249" Amount="16.2431471547281"/>
        </Report>
      </FinData_Generated>
      <Miscellaneous>
        <Employee Description="AppoximateFullTime" Count="135" AsOf="12/31/2005"/>
        <Shareholders Count="3000" AsOf="03/29/2006"/>
        <ShareHolderRelations Name="Becky Campillo" PhoneNumber="954-581-9993 x1257"/>
        <Incorporation Country="USA" State="FL" Month="3" Year="1991"/>
        <Provider ServiceType="Auditor" Name="McKean, Paul, Chrycy, Fletcher &amp; Co."/>
        <Provider ServiceType="Counsel" Name="Broad &amp; Cassel"/>
      </Miscellaneous>
      <StockSummary>
        <StockIssue Type="Common" Description="common">
          <StockOutstanding Amount="6048842.00" Units="SHR" Date="12/31/2004"/>
          <Par Amount="0.01" Units="USA"/>
          <Authorized Amount="37500000.00" Units="SHR" Unlimited="No"/>
          <Treasury Amount="696849.00" Units="SHR"/>
          <StockIdentity Ticker="TCHC" Exchange="Nasdaq National Market"/>
          <TextSection Title="Stock Splits" Date="06/01/2006">
<![CDATA[
<p><font color="black">$0.01 par shares split in the form of a 50% stock dividend on Sept. 7, 2004.</font></p>
]]>
          </TextSection>
          <TextSection Title="Ownership" Date="06/01/2006">
<![CDATA[
<p><font color="black">As of April 15, 2005, Edward J. Lawson and all directors and executive officers as a group held 25.1% and 33.1%, respectively of Co.'s outstanding common stock.</font></p>
]]>
          </TextSection>
          <TextSection Title="Voting Rights" Date="06/01/2006">
<![CDATA[
<p><font color="black">Entitled to one vote per share.</font></p>
]]>
          </TextSection>
          <TextSection Title="Dividends Paid" Date="06/01/2006">
<![CDATA[
<table border="1">
  <tr>
    <td><p><font color="teal"><twocolumn>2001</twocolumn></font></p></td>
    <td><p><font color="teal"><twocolumn>0.08</twocolumn></font></p></td>
    <td><p><font color="teal"><twocolumn>2002</twocolumn></font></p></td>
    <td><p><font color="teal"><twocolumn>0.11</twocolumn></font></p></td>
    <td><p><font color="teal"><twocolumn>2003</twocolumn></font></p></td>
    <td><p><font color="teal"><twocolumn>0.32</twocolumn></font></p></td>
  </tr>
</table><p/>
<p><font color="red"><footnote>&#65407;</footnote></font><font color="black">Adjusted for 3-for-2 split:</font></p>
<table border="1">
  <tr>
    <td><p><font color="teal"><twocolumn>2004</twocolumn></font></p></td>
    <td><p><font color="teal"><twocolumn>0.32</twocolumn></font></p></td>
    <td><p><font color="teal"><twocolumn>[1]2005</twocolumn></font></p></td>
    <td><p><font color="teal"><twocolumn>0.32</twocolumn></font></p></td>
    <td>&#65407;</td>
    <td>&#65407;</td>
  </tr>
</table><p/>
<p><font color="red"><footnote>[1]To Dec. 1</footnote></font></p>
]]>
          </TextSection>
          <TextSection Title="Options" Date="06/01/2006">
<![CDATA[
<p><font color="black">Dec. 31, 2004, authorized for issuance, 3,688,500 shares; options outstanding, 1,119,575 shares.</font></p>
]]>
          </TextSection>
          <TextSection Title="Transfer Agent &amp; Registrar" Date="06/01/2006">
<![CDATA[
<p><font color="black">Global Securities Transfer, Inc., Denver, CO</font></p>
]]>
          </TextSection>
          <TextSection Title="Price Range" Date="06/01/2006">
<![CDATA[
<table border="1">
  <tr>
    <td>&#65407;</td>
    <td><p><font color="green"><pricerange>2004</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>2003</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>2002</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>2001</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>2000</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>1999</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>1998</pricerange></font></p></td>
  </tr>
  <tr>
    <td><p><font color="green"><pricerange>High</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>24.50</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>23.59</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>13.75</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>3.88</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>7 15/16</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>7 3/4</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>8 1/4</pricerange></font></p></td>
  </tr>
  <tr>
    <td><p><font color="green"><pricerange>Low</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>9.17</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>9</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>3</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>0.98</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>2 7/16</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>2 7/8</pricerange></font></p></td>
    <td><p><font color="green"><pricerange>5 3/4</pricerange></font></p></td>
  </tr>
</table><p/>
]]>
          </TextSection>
          <TextSection Title="Offered" Date="06/01/2006">
<![CDATA[
<p><font color="black">(1,250,000 shares) at $7.50 per share (proceeds to Co., $6.90 per share) on Nov. 10, 1998 through Gilford Securities Incorporated; and associates. Offering contained over-allotment options to cover 187,500 shares. Proceeds used for contribution to Federated National's capital to increase its underwriting capacity, repayment of a portion of the outstanding balance under Co.'s revolving line of credit agreement, financing of acquisitions and working capital and general corporate purposes.</font></p>
]]>
          </TextSection>
        </StockIssue>
      </StockSummary>
    </COMPANY>
  </ENTITY>
  ... more entities here ...
</Feed>


Re^3: Building a database from XML data feed
by mattr (Curate) on Jan 14, 2008 at 05:03 UTC
    Replying to myself here. I just found that I'll have to allow companies to be added manually, not just from the feed, so it seems I will have to use a database. There will also probably be 1000 companies at first, maybe growing to 10,000 over some years. Thanks for your help.
