Re: Strategy for managing a very large database with Perl

in reply to Strategy for managing a very large database with Perl

A typical search would be to retrieve all the values of a <variable> for or <for a set of points> <for a particular year+yday combination>.

If those are the typical search patterns, is a relational database the right hammer to reach for? Why not flat files? Appending to a flat file is a quick operation. If you keep one flat file per day, scanning for the subset of records for that day is trivial. If you have to do a full scan through all the data, one file per day is also easy, and allows the scan to be parallelized if you ever need to throw more machines at the problem.
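To make the one-file-per-day idea concrete, here is a minimal sketch. It is written in Python purely for illustration, and the same shape carries over to Perl. The file naming scheme (`YYYY-DDD.tsv`), the three-column record layout, and the function names are all assumptions of mine, not anything taken from the original post.

```python
import csv
from pathlib import Path


def day_file(root: Path, year: int, yday: int) -> Path:
    """Hypothetical naming scheme: one tab-separated file per year+yday."""
    return root / f"{year:04d}-{yday:03d}.tsv"


def append_record(root: Path, year: int, yday: int,
                  point_id: int, variable: str, value: float) -> None:
    """Append one record to that day's file; existing data is never rewritten."""
    root.mkdir(parents=True, exist_ok=True)
    with day_file(root, year, yday).open("a", newline="") as fh:
        csv.writer(fh, delimiter="\t").writerow([point_id, variable, value])


def values_for_day(root: Path, year: int, yday: int, variable: str):
    """Scan a single day's file and yield (point_id, value) for one variable."""
    path = day_file(root, year, yday)
    if not path.exists():
        return  # no data recorded for that day
    with path.open(newline="") as fh:
        for point_id, var, value in csv.reader(fh, delimiter="\t"):
            if var == variable:
                yield int(point_id), float(value)
```

A year+yday lookup only ever opens one file, and a full scan is just a loop over the files in the directory, which is what makes it easy to split across processes or machines.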

10TB over 20 years is 500GB a year, or very roughly 1.4GB a day. That's not an unreasonable size for a flat file.

Replies are listed 'Best First'.

Re^2: Strategy for managing a very large database with Perl
by punkish (Priest) on Jun 18, 2010 at 12:13 UTC

punkish>> A typical search would be to retrieve all the values
punkish>> of <variable> for  or <for a set of points>
punkish>> <for a particular year+yday combination>.

dws> If those are the typical search patterns,

Those are typical search patterns. There are a few other search patterns as well, so I have to find a format that can accommodate *all* of them.

dws> is a relational database the right hammer to reach for?

Dunno. Hence this thread.

dws> Why not flat files?

Well, the data are already in some kind of flat files. Actually, they are in files, but not flat files. They are in NetCDF format. The format is very compact, and very suitable for storing arrays of arrays; however, it is not very amenable to spatial searches. Hence, my desire to put the data in a database.

As shown in my original post, the db table is very simple, almost like a flat file (with a little overhead per "line" added by the db). However, it allows me to JOIN this simple table with a location table. I can do a spatial search on the location table, retrieve the value of points that fall within my search, and then retrieve the variables from this simple but mongo table.
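Roughly, the kind of query I have in mind looks like the sketch below. It uses Python with an in-memory SQLite database only to keep the example self-contained. The table and column names (`location`, `obs`, `point_id` and so on) are made up for illustration and are not the actual schema, and a plain bounding box stands in for a real spatial search.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE location (point_id INTEGER PRIMARY KEY, lon REAL, lat REAL);
        CREATE TABLE obs (point_id INTEGER, year INTEGER, yday INTEGER,
                          variable TEXT, value REAL);
        CREATE INDEX obs_lookup ON obs (point_id, year, yday, variable);
    """)
    conn.executemany("INSERT INTO location VALUES (?, ?, ?)",
                     [(1, -89.4, 43.1), (2, -89.3, 43.0), (3, -120.0, 35.0)])
    conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?, ?)",
                     [(1, 1990, 32, "tmax", 1.5),
                      (2, 1990, 32, "tmax", 2.5),
                      (3, 1990, 32, "tmax", 18.0),
                      (1, 1990, 33, "tmax", 0.5)])
    return conn


def values_in_box(conn, variable, year, yday, west, south, east, north):
    """Find points inside a bounding box, then fetch one variable for one day."""
    sql = """
        SELECT o.point_id, o.value
        FROM location AS l
        JOIN obs AS o ON o.point_id = l.point_id
        WHERE l.lon BETWEEN ? AND ?
          AND l.lat BETWEEN ? AND ?
          AND o.variable = ? AND o.year = ? AND o.yday = ?
        ORDER BY o.point_id
    """
    params = (west, east, south, north, variable, year, yday)
    return conn.execute(sql, params).fetchall()
```

The location table stays small, so the spatial filter is cheap. The big table is only touched through the index on the point and day columns.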

--

when small people start casting long shadows, it is time to go to bed
