Re^2: How do I measure my bottle ?

Thank you all. Now I have a better direction, since I have no experience with Unix or administrative work. Actually, I've tried Tie::File, a database, and Tie::Hash before; each one has a problem that makes it infeasible for me. Tie::File and Tie::Hash are very slow (20 hours to read those files). However, putting 1,000 GB (1 TB) into a database (I've tried DB2, SQL Server, and MySQL) takes more disk space than I can afford.

Thanks again, and happy weekend!

Re^3: How do I measure my bottle ?
by Tanktalus (Canon) on Mar 25, 2005 at 15:28 UTC

I think you hit upon (and summarily eliminated) the best solution: a database. You have 1TB of data. It's stored in flat files? Properly, this data should never have been put into files, but directly into a database, and you would simply query for the data you want as you want it. Yes, it takes up a fair bit more space than flat files. But you're trading space for speed. Disk space is cheap, CPU speed not so cheap.

You say you're using an AMD64 machine. Are you running a 64-bit OS on it? If not, you may want to try that first - that may help your I/O speed somewhat, and probably will help your ability to use all your memory.

Once you're using a 64-bit OS, it's time to get a 64-bit perl. With a 32-bit perl, you'll run out of address space long before you can load your data in.
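
If you're not sure which kind of perl you have, the core Config module will tell you. This is just a sketch; the variable names are mine, and the only things it relies on are the standard %Config keys ptrsize, use64bitint and archname.

```perl
use strict;
use warnings;
use Config;    # core module: exposes the settings perl was built with

# Pointer size in bytes: 8 on a 64-bit perl, 4 on a 32-bit one.
my $ptr_bytes = $Config{ptrsize};

# 'define' when perl was built with 64-bit integers. A 32-bit perl can
# have this too, so it says nothing about address space by itself.
my $has_64bit_ints =
    defined $Config{use64bitint} && $Config{use64bitint} eq 'define';

printf "archname:     %s\n", $Config{archname};
printf "pointer size: %d bits\n", $ptr_bytes * 8;
printf "64-bit ints:  %s\n", $has_64bit_ints ? 'yes' : 'no';
```

The pointer size is the number that matters for address space. Running perl -V:ptrsize on the command line should give you the same value.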

Finally, you can then get a 64-bit database. I know, I know, I'm harping on this. But, let's face it. You have 1TB of data you're trying to work with, but only 2GB of RAM. The other 998GB of data will simply get swapped out to disk while you're loading it from disk. This is going to be incredibly slow. Use a database - it has algorithms and code that are intended to deal with this type of problem, written in highly-optimised (if you believe the TPC ratings) C code. Load data as you need it, discard it when you're done with it. Put as much logic as you can into your SQL statements, and let the database handle getting your data in the most efficient manner possible.

I really, honestly, think that if you cannot afford the database storage, you can't afford any solution. Storage is relatively cheap, and trying to load everything into memory is simply going to fail. The Tie::* modules are likely your next best bet as they probably also load/discard data as needed, allowing you to live within your 2GB of RAM.
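
To show what "load as you need it, discard when you're done" looks like without a database or Tie::*, here is a minimal sketch using a plain sequential read. I don't know your file format, so the tab-separated layout and the helper name sum_second_column are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: sum the numeric second column of a tab-separated
# file. Only one line is held in memory at a time, so memory use stays
# flat no matter how large the file is.
sub sum_second_column {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";

    my $total = 0;
    while (my $line = <$fh>) {
        chomp $line;
        my (undef, $value) = split /\t/, $line;

        # Skip lines with a missing or non-numeric second field.
        next unless defined $value && $value =~ /^-?\d+(?:\.\d+)?$/;
        $total += $value;
    }

    close $fh;
    return $total;
}
```

You would call it as my $sum = sum_second_column('data.tsv'); where data.tsv is a made-up file name. This only works for jobs that can be done in one pass over the data. Anything that needs random access or joins across files is back in database territory.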
