Re^2: Perl solution for storage of large number of small files

Replies are listed 'Best First'.
Re^3: Perl solution for storage of large number of small files by isync (Hermit) on Apr 30, 2007 at 14:11 UTC
I had a stint on this project with sqlite, and although I like databases, I was (so far) under the impression storing binary data into a database is not the best way to go. For me the blob columns always felt like an add-on... Second, the performance with sqlite (async off) was poor, and it was only munging light text data. This is why I never thought of going completely db with this. And I didn't want to install a full mysql server on the machine, just to find out that it would have the same problems. I think using the fs is ok for ~20-30 mio. files. So, do you opt for using a db when this number grows?	[reply]
Re^4: Perl solution for storage of large number of small files by tedv (Pilgrim) on Apr 30, 2007 at 19:01 UTC
Can you explain more about what's contained in these files? How regular is their size and structure, both internal and external? All of these are factors that influence your decision of file system versus database. I'm just saying that "Millions of small fragments of data" is a vote for database over file system. Answers to the other three questions (regularity of size, internal structure, and external structure) might vote in favor of the file system. -Ted	[reply]
Re^4: SQLite performance by DrHyde (Prior) on May 02, 2007 at 10:04 UTC
SQLite is a "database of last resort". It's good for prototyping, and for bundling with applications cos it requires no setup, but most of the time I'd prefer to use PostgreSQL, MySQL or a "grown-up" database like Oracle. Implementing your own primitive bucketing algorithm by using several DBM files sounds ... perverse. If going down the DBM route, I'd use just one file and let perl handle that. I'd also use GDBM_File in preference to DB_File as it has fewer restrictions on your data. You might want to consider DBM::Deep if you want to fake up a filesystem in a file, or go crazy and create a real filesystem in a file, optimised for your circumstances, and mount that.	[reply]
Re^5: SQLite performance by jbert (Priest) on May 02, 2007 at 10:52 UTC
I disagree. SQLite is a 'database of first resort'. It doesn't scale to the same heights as server-based approaches, but those heights are quite a way off for the majority of systems and it's performance is good when you stay within it's scaling limits. The docs are also up-front about when you don't want to use it. Using sqlite in a solution simplifies the solution compared to server-based RDBMSs (fewer moving parts and fewer deployment/management issues) and so reduces the costs and time spent dealing with it. So unless you: already have existing server-based database infrastructure know in advance your expected load is approaching the limits of sqlite or particularly need a feature one of the server-based RDBMSs provides I'd say that choosing MySQL or postgres over sqlite is premature optimisation.	[reply]


Don't ask to ask, just ask
	PerlMonks