That was my thought as well. File systems aren't optimized to handle 1e6+ files in constant use, and that's exactly what databases ARE optimized for. Why does the problem definition require files, or is that negotiable?
Re^2: Perl solution for storage of large number of small files
Replies are listed 'Best First'.
I had a stint on this project with sqlite, and although I like databases, I was (so far) under the impression storing binary data into a database is not the best way to go. For me the blob columns always felt like an add-on...
Second, the performance with sqlite (async off) was poor, and it was only munging light text data. This is why I never thought of going completely db with this. And I didn't want to install a full mysql server on the machine, just to find out that it would have the same problems.
I think using the fs is ok for ~20-30 mio. files. So, do you opt for using a db when this number grows?
Can you explain more about what's contained in these files? How regular is their size and structure, both internal and external? All of these are factors that influence your decision of file system versus database.
I'm just saying that "Millions of small fragments of data" is a vote for database over file system. Answers to the other three questions (regularity of size, internal structure, and external structure) might vote in favor of the file system.
SQLite is a "database of last resort". It's good for prototyping, and for bundling with applications cos it requires no setup, but most of the time I'd prefer to use PostgreSQL, MySQL or a "grown-up" database like Oracle.
Implementing your own primitive bucketing algorithm by using several DBM files sounds ... perverse. If going down the DBM route, I'd use just one file and let perl handle that. I'd also use GDBM_File in preference to DB_File as it has fewer restrictions on your data. You might want to consider DBM::Deep if you want to fake up a filesystem in a file, or go crazy and create a *real* filesystem in a file, optimised for your circumstances, and mount that.
I disagree. SQLite is a 'database of first resort'. It doesn't scale to the same heights as server-based approaches, but those heights are quite a way off for the majority of systems and it's performance is good when you stay within it's scaling limits. The docs are also up-front about when you *don't* want to use it.
Using sqlite in a solution simplifies the solution compared to server-based RDBMSs (fewer moving parts and fewer deployment/management issues) and so reduces the costs and time spent dealing with it.
So unless you:
already have existing server-based database infrastructure
know in advance your expected load is approaching the limits of sqlite
or particularly need a feature one of the server-based RDBMSs provides
I'd say that choosing MySQL or postgres over sqlite is premature optimisation.