I need to store a huge amount of data having a fixed structure:
Each item has a unique (alphanumeric, 7-bit ASCII) id
A fixed number of "meta" information fields hold numbers or text data of up to 100 bytes (worst case; usually <30 bytes)
The meta information won't change once the item has been created
Each item has two text parts, usually 2-16k in size, sometimes a few MB, but up to 2 GB has to be supported
The text parts are delivered in blocks of up to a predefined size limit (currently about 16 MB, but this may be changed to anything from ~1k upwards if the storage requires it); blocks are currently typically 1900 bytes
The final size of a text part is not known in advance, and neither is the number of blocks
The blocks may not arrive in sequential order, but each one carries a sequence number that starts from zero for each item, and every sequence number is used (there are no gaps)
Up to 10 million items should be stored at the same time, maybe more in the future
About 90% of the items may be deleted some weeks after they were created
Some of the remaining items are deleted later; a few are kept forever
Each item must be accessible quickly by unique item id
Deletion of items is allowed to be really slow
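
To make the block handling concrete, here is a minimal sketch of how blocks arriving out of order could be collected by sequence number. It is written in Python purely for illustration, keeps everything in memory (which would not work for the 2 GB case), and the class and method names (`TextPartBuffer`, `add_block`, `assemble`) are hypothetical, not part of any existing system:

```python
class TextPartBuffer:
    """Collects the blocks of one text part, keyed by sequence number.

    Illustration only: a real store would keep the blocks on disk.
    """

    def __init__(self):
        self.blocks = {}  # sequence number -> block bytes

    def add_block(self, seq, data):
        # Blocks may arrive in any order; a repeated sequence number overwrites.
        self.blocks[seq] = data

    def contiguous_count(self):
        # Number of blocks available without a gap, counting from 0.
        n = 0
        while n in self.blocks:
            n += 1
        return n

    def has_gaps(self):
        # True while at least one earlier block is still missing.
        return self.contiguous_count() != len(self.blocks)

    def assemble(self):
        # Join the gap-free prefix 0..n-1 in sequence order.
        return b"".join(self.blocks[i] for i in range(self.contiguous_count()))


part = TextPartBuffer()
part.add_block(2, b"baz")
part.add_block(0, b"foo")
print(part.assemble())  # b'foo' (block 1 is still missing)
part.add_block(1, b"bar")
print(part.assemble())  # b'foobarbaz'
```

Because neither the final size nor the number of blocks is known, the sketch can only report whether the blocks received so far are gap-free; it cannot tell whether a text part is complete.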
I considered using MongoDB, but it becomes slow at 15+ million items and has a 16 MB limit per item. MySQL can't handle this amount either. I'd like to store the data in files, but I want to avoid one file per item, as that many files are hard for filesystems to handle.
I considered tie with GDBM_File, which is rock solid for reading. I could store many items in one file, delete them, and append/insert text blocks as they arrive. However, GDBM is problematic when more than one process writes to the same file, and I can't be sure that no two processes will ever write to the same file, since new text blocks arrive for different messages.
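
The concurrent-writer concern can be illustrated with a sketch in which every access takes an exclusive advisory lock on a separate lock file before opening the database. This is Python rather than Perl, uses `dbm.dumb` from the standard library only as a stand-in for GDBM, relies on `fcntl.flock` (Unix only), and the helper names and the `item_id:part:seq` key layout are assumptions made for the example:

```python
import dbm.dumb
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an advisory exclusive lock on lock_path for the duration of the block."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # waits until no other process holds it
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def block_key(item_id, part, seq):
    # Hypothetical key layout: one database entry per text block.
    return f"{item_id}:{part}:{seq}"


def store_block(db_path, item_id, part, seq, data):
    """Store one text block; the database is open only while the lock is held."""
    with exclusive_lock(db_path + ".lock"):
        with dbm.dumb.open(db_path, "c") as db:
            db[block_key(item_id, part, seq)] = data


def load_block(db_path, item_id, part, seq):
    """Read one text block; readers take the same lock in this simple sketch."""
    with exclusive_lock(db_path + ".lock"):
        with dbm.dumb.open(db_path, "r") as db:
            return db[block_key(item_id, part, seq)]


db_path = os.path.join(tempfile.mkdtemp(), "items")
store_block(db_path, "A1B2C3", 0, 1, b"second block")
store_block(db_path, "A1B2C3", 0, 0, b"first block")
print(load_block(db_path, "A1B2C3", 0, 0))  # b'first block'
```

The lock only helps if every process that touches the file goes through it, and opening and closing the database for each block costs throughput, so this shows the idea rather than a tuned design.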