http://www.perlmonks.org?node_id=860217


in reply to Re: OT?: Partition the tables? (Re: DBIx::Class with two kinds of otherwise identical table types)
in thread DBIx::Class with two kinds of otherwise identical table types

dwm042:

The reason I used partitioned tables is that I worked at a financial institution that dealt with a *huge* number of transactions each day. We had to keep different levels of transaction information for different lengths of time, so we used partitioning to help manage the volume of data. A brief description follows, to illustrate how and why to use partitioned tables:

Requirements:

  * Keep full transaction details for 35 days (see step 4 below).
  * Keep monthly transaction summaries for 19 months (see step 5 below).

Because of these requirements, we had two tables, TxnDtls and TxnSumHist, for the details and summaries. We partitioned both tables by date: for TxnDtls we used the day (YYYYMMDD), and for TxnSumHist we used the month (YYYYMM). (In the remainder of the post, think of YYYYMM and YYYYMMDD as stand-ins for the actual dates.) Our process was roughly:

  1. Create table TxnDtls_YYYYMMDD
    select top 0 * into TxnDtls_YYYYMMDD from TxnDtls
  2. Bulk load the transaction details into the table (using BCP)
  3. Build the indexes.
  4. Update the partition function to eliminate the 35-day-old TxnDtls_YYYYMMDD table and add the new table.
  5. If it's the first day of the month, then:
    1. Create the new TxnSumHist_YYYYMM table, summarizing the data
      select
          -- Key fields
          D.Merchant_ID,
          'YYYYMM' as Billing_Period,
          D.TxnType,
          ...
          -- Statistics fields
          sum(D.Amount) as TxnTotal,
          count(*) as TxnCount,
          ...
      into TxnSumHist_YYYYMM
      from TxnDtls D
      where D.TxnDate between ... and ...
      group by D.Merchant_ID, D.TxnType, ...
    2. Build the indexes
    3. Update the partition function to eliminate the 19-month-old TxnSumHist_YYYYMM table and add the new one.
    4. Drop the old TxnSumHist_YYYYMM table
  6. Drop the old TxnDtls_YYYYMMDD table
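The post doesn't say how the per-day tables were tied together, but BCP and "select top 0 * into" suggest SQL Server, where one common mechanism is a partitioned view. The sketch below shows the daily cycle under that assumption; the table and column names follow the post, while the concrete dates, the CHECK constraint, and the index columns are made up for illustration:

```sql
-- Hypothetical sketch of the daily cycle (steps 1, 3, 4, and 6 above),
-- assuming TxnDtls is a SQL Server partitioned view over the day tables.

-- Step 1: create the new day's table with the same columns as the others.
select top 0 * into TxnDtls_20101115 from TxnDtls;

-- A CHECK constraint on the partitioning column lets the optimizer skip
-- irrelevant member tables when the view is queried.
alter table TxnDtls_20101115 add constraint CK_TxnDtls_20101115
    check (TxnDate >= '20101115' and TxnDate < '20101116');

-- Step 2 happens outside SQL (BCP bulk load); then step 3, the indexes:
create clustered index IX_TxnDtls_20101115
    on TxnDtls_20101115 (Merchant_ID, TxnDate);

-- Step 4: redefine the view, dropping the 35-day-old member and adding
-- the day just loaded.
alter view TxnDtls as
    select * from TxnDtls_20101012   -- oldest day still retained
    union all
    -- ... one branch per retained day ...
    select * from TxnDtls_20101115;  -- the day just loaded

-- Step 6: nothing references the expired table any more, so it can go.
drop table TxnDtls_20101011;
```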

By carefully distributing the table and index partitions among your storage systems, you can get surprisingly good performance (assuming you have the I/O capacity and enough storage devices to spread the load across). Towards the end of the project, we used a couple of fiber-optic cards to connect to a massive storage system that distributed 1.6TB of data over numerous 20GB drives. The performance was stunning!
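To make the distribution idea concrete (hypothetical DDL -- the post doesn't show how storage was actually laid out), on SQL Server you could spread the load by placing a day's table and its indexes on filegroups backed by different physical devices:

```sql
-- Hypothetical sketch: FG_DATA_A and FG_IDX_B are assumed filegroups whose
-- files live on separate physical devices; column types are guesses.
alter database TxnDB add filegroup FG_DATA_A;
alter database TxnDB add file
    (name = 'fg_data_a', filename = 'E:\mssql\fg_data_a.ndf')
    to filegroup FG_DATA_A;

-- Put the day's data on one device and a heavily used index on another,
-- so table scans and index seeks don't compete for the same spindles.
create table TxnDtls_20101115
    (Merchant_ID int, TxnDate char(8), TxnType char(2), Amount money)
    on FG_DATA_A;

create nonclustered index IX_TxnDtls_20101115_Merchant
    on TxnDtls_20101115 (Merchant_ID) on FG_IDX_B;
```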

If anyone has any specific questions about the system, just ask, and I'll answer as best I can. But the system was decommissioned about six months ago, and I work at a different company now, so some of the finer details are leaking away from my memory... ;^)

...roboticus