PerlMonks
OT?: Partition the tables? (Re: DBIx::Class with two kinds of otherwise identical table types)

by roboticus (Chancellor)
on Sep 15, 2010 at 04:49 UTC


in reply to DBIx::Class with two kinds of otherwise identical table types

dwm042:

I can't answer your DBIx::Class question, as I've never used it. But I'm wondering why you want to split your residents table into separate tables? It seems to me that it would simply make it harder to do useful queries. If you're trying to speed up access to the residents of a particular town, an index on the town column would do that without splitting the table.
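For instance, a minimal sketch of that single-table approach (the Residents table and its Town, ResidentID, and Name columns are hypothetical names based on the thread, not the OP's actual schema):

```sql
-- One residents table for all towns; the index makes per-town
-- lookups fast without splitting the table.
create index IX_Residents_Town on Residents (Town);

-- A typical per-town query that the index can serve:
select ResidentID, Name
from Residents
where Town = 'Springfield';
```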

If you're using a database that supports partitioned tables, you might consider partitioning your residents table. That way, typical queries could use the residents table and see all residents for all towns, but you could still access the individual partitions if need be. In MS SQL Server, you have to build a partitioning function (to tell it which partition a particular resident goes to), and there are other criteria (key monotonicity, etc.). I think you could do it with a key of (Town, ResidentID) or some such. I've not built a partitioned table in a while, so I'd have to hit the books again to be sure. But seeing as I'm going to bed (it's nearly 1:00AM here), I'll just cough up a couple of links Google gave me, and you can review 'em to see if it looks interesting: Partitioned Tables and Indexes in SQL Server 2005 and SQL SERVER – 2005 – Database Table Partitioning Tutorial.
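To make the SQL Server approach concrete, here's a hedged sketch of a range partition function and scheme on the town column (all names, types, and boundary values here are made up for illustration; the real DDL would depend on the actual schema and SQL Server version):

```sql
-- Partition function: maps each Town value to a partition.
create partition function pfTown (varchar(50))
    as range left for values ('Anytown', 'Midville');   -- yields 3 partitions

-- Partition scheme: maps those partitions to filegroups
-- (all on PRIMARY here, but they could be spread across drives).
create partition scheme psTown
    as partition pfTown all to ([PRIMARY]);

-- The table, with a key leading on the partitioning column,
-- is then created on the scheme.
create table Residents (
    Town       varchar(50)  not null,
    ResidentID int          not null,
    Name       varchar(100) null,
    constraint PK_Residents primary key (Town, ResidentID)
) on psTown (Town);
```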

...roboticus


Re: OT?: Partition the tables? (Re: DBIx::Class with two kinds of otherwise identical table types)
by dwm042 (Priest) on Sep 15, 2010 at 11:55 UTC

    Roboticus,

    Right now partitioning is hypothetical: a way to handle a few huge towns in a mass of small towns. But now that you've given me a name for the technique, I can search for it, and I found a URL which, though elusive, gives me a couple of new pointers in Yann's comment.

    In another case, a Dr. Chris Cole was using partitioned tables because of the sheer size of his data sets (~90M rows/month). Link is here.

    Thanks.

    David.

      dwm042:

      The reason I used partitioned tables is that I worked at a financial institution that dealt with a *huge* number of transactions each day. We had to keep different levels of transaction information for different lengths of time, so we used partitioning to help manage the volume of data. A brief description follows, to illustrate how and why to use partitioned tables:

      Requirements:

      • We need complete transaction details for 35 days. (The actual requirement was 30 days; we kept five additional days to simplify the monthly summary and to leave a bit of elbow room for error recovery.)
      • We need transaction summaries for 18 months.
      • The database needs to be online 24/7.
      • We had several indexes on the data that take a good amount of time to rebuild.
      • ...others not mentioned here...

      Because of these requirements, we had two tables, TxnDtls and TxnSumHist, for the details and summaries. We partitioned both tables based on the date: TxnDtls by day (YYYYMMDD) and TxnSumHist by month (YYYYMM). (In the remainder of the post, think of YYYYMM and YYYYMMDD as stand-ins for the actual dates.) Our daily process was roughly:

      1. Create table TxnDtls_YYYYMMDD
        select top 0 * into TxnDtls_YYYYMMDD from TxnDtls
      2. Bulk load the transaction details into the table (using BCP)
      3. Build the indexes.
      4. Update the partition function to eliminate the 35-day-old TxnDtls_YYYYMMDD table and add the new table.
      5. If it's the first day of the month, then:
        1. Create the new TxnSumHist_YYYYMM table, summarizing the data
          select
              -- Key fields
              D.Merchant_ID,
              'YYYYMM' as Billing_Period,
              D.TxnType,
              ...
              -- Statistics fields
              sum(D.Amount) as TxnTotal,
              count(*) as TxnCount,
              ...
          into TxnSumHist_YYYYMM
          from TxnDtls D
          where D.TxnDate between ... and ...
          group by D.Merchant_ID, D.TxnType, ...
        2. Build the indexes
        3. Update the partition function to eliminate the 19-month-old TxnSumHist_YYYYMM table and add the new one.
        4. Drop the old TxnSumHist_YYYYMM table
      6. Drop the old TxnDtls_YYYYMMDD table
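      In SQL Server 2005 terms, steps 4 and 6 above are the classic "sliding window": switch the freshly built table in as a new partition and switch the oldest one out before dropping it. A rough sketch, with illustrative names, dates, and partition numbers only (the original system's exact mechanism may have differed):

      ```sql
      -- Add a boundary for the new day, then switch the staged, fully
      -- indexed table in as that partition. The staging table's schema,
      -- indexes, and filegroup must match, and the scheme needs a
      -- NEXT USED filegroup before the SPLIT.
      alter partition function pfTxnDate() split range ('20100915');
      alter table TxnDtls_20100915
          switch to TxnDtls partition 36;  -- number obtainable via $PARTITION.pfTxnDate('20100915')

      -- Retire the 35-day-old day: switch it out to a holding table,
      -- remove its boundary, and drop the holding table.
      alter table TxnDtls
          switch partition 1 to TxnDtls_old;
      alter partition function pfTxnDate() merge range ('20100811');
      drop table TxnDtls_old;
      ```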

      By carefully distributing the table and index partitions among your storage systems, you can get surprisingly good performance (assuming you have the I/O capacity and enough storage devices to spread the load over). Towards the end of the project, we used a couple of fiber-optic cards to connect to a massive storage system that distributed 1.6TB of data over numerous 20GB drives. The performance was stunning!

      If anyone has any specific questions about the system, just ask, and I'll answer as best as I can. But the system was decommissioned about six months ago, and I work at a different company now, so some of the finer details are still leaking away from my memory... ;^)

      ...roboticus
