Re^2: Strategy for managing a very large database with Perl

moritz> To me, partitioning by year sounds like a good idea.

Thanks. I might try that. That would result in 23 tables, each with 4_850_174_538 rows by however many columns. That is still almost 5 billion rows, but better than 115 billion.

Now, here is where some db chops are needed. Is a db really less efficient at managing 115 billion rows than 5 billion? Well, yes, for certain operations, such as full table scans or rebuilding an index, it can be. But, as far as I understand, a db manages data in pages, so it is analogous to reading a file line by line, except that each "line" is a page. The db knows which page to go to with the help of indexes, so for an indexed lookup it hardly matters how big the table is: the index gets a little deeper, and the db still goes straight to the right page to grab the data. It is like knowing the correct file offset to seek to before reading. Sure, I could implement all that crap myself, but the db has already figured it out.
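To put a rough number on the "right page" argument, here is a back-of-the-envelope sketch of how many index levels a B-tree-style index would need for each table size. The fanout of 256 keys per page is purely an assumption for illustration (real values depend on the db, the page size and the key width), and `btree_height` is a made-up helper, not anything a db exposes. It is in Python only so that it runs standalone.

```python
def btree_height(rows, fanout):
    """Smallest number of levels h such that fanout**h >= rows."""
    height = 1
    capacity = fanout
    while capacity < rows:
        capacity *= fanout
        height += 1
    return height

FANOUT = 256  # assumed keys per index page; not taken from any real db

for label, rows in (("one yearly table", 4_850_174_538),
                    ("single big table", 115_000_000_000)):
    print(f"{label}: {rows:,} rows -> {btree_height(rows, FANOUT)} index levels")
```

Under that assumption both sizes come out at the same number of levels, which is the point: index depth grows with the logarithm of the row count, so a single indexed lookup costs about the same either way.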

Since my data are pretty much read-only, what do I gain by tinkering with the rather simple scheme I have right now?

I can optimize for space, but 10 TB doesn't sound very big, especially when some of my colleagues are deploying 100s of TB for satellite imagery.

I can optimize for rows, and while 115 billion is indeed a very large number, does it affect my operation?

In the end, I want to optimize for ease and speed of retrieval, so that is where I want to concentrate. The partitioning docs indicate that partitioning might help there, because a search confined to one partition gets to use a smaller INDEX.
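For what it is worth, the per-year idea can be sketched by hand with one table and one index per year, and a tiny helper that routes each query to the right table. Everything named here (`obs_<year>`, `cell_id`, `yday`, `value`, `table_for`, `fetch`) is hypothetical, and it uses Python with an in-memory SQLite db only so it runs standalone; a db with built-in partitioning would do the routing itself.

```python
import sqlite3

def table_for(year):
    # Hypothetical naming scheme: one table per year, e.g. obs_1990.
    # int() keeps anything but a number out of the table name.
    return f"obs_{int(year)}"

conn = sqlite3.connect(":memory:")

# One table and one (small) index per year.
for year in (1990, 1991):
    t = table_for(year)
    conn.execute(f"CREATE TABLE {t} (cell_id INTEGER, yday INTEGER, value REAL)")
    conn.execute(f"CREATE INDEX {t}_cell ON {t} (cell_id)")

# A few made-up rows, purely for illustration.
conn.executemany(f"INSERT INTO {table_for(1990)} VALUES (?, ?, ?)",
                 [(1, 1, 0.5), (2, 1, 0.7)])
conn.execute(f"INSERT INTO {table_for(1991)} VALUES (?, ?, ?)", (1, 1, 0.9))

def fetch(conn, year, cell_id):
    # Only the table for that year, and only its index, is touched.
    sql = (f"SELECT yday, value FROM {table_for(year)} "
           "WHERE cell_id = ? ORDER BY yday")
    return conn.execute(sql, (cell_id,)).fetchall()

print(fetch(conn, 1990, 1))
```

The catch is that any search spanning several years has to visit several tables, so this only pays off if most searches name a year.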

Thanks for the suggestion.

--

when small people start casting long shadows, it is time to go to bed
