http://www.perlmonks.org?node_id=927836


in reply to Re: Storing/Retrieving images as blobs
in thread Storing/Retrieving images as blobs

Why a clever tree structure when you could have a table that contains a file name? I'm not expert enough to know whether a binary store in the table or a file in the file system is the winner, though. Having a file would mean the image is still available should the database be down, and if you use some logical naming scheme it should be locatable and accessible by real people.
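
For what it's worth, here is a minimal sketch of the "path in a table" idea, assuming a MySQL database reached through DBI; the gallery database, the images table, its columns and the /srv/images location are all made up for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical schema: keep metadata and a path in the table, not the image bytes.
    my $dbh = DBI->connect('dbi:mysql:database=gallery', 'user', 'password',
                           { RaiseError => 1, AutoCommit => 1 });

    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS images (
            id        INT AUTO_INCREMENT PRIMARY KEY,
            title     VARCHAR(255) NOT NULL,
            mime_type VARCHAR(64)  NOT NULL,
            path      VARCHAR(512) NOT NULL
        )
    });

    # The bytes stay on the file system; the table only records where they live.
    my $sth = $dbh->prepare(
        'INSERT INTO images (title, mime_type, path) VALUES (?, ?, ?)');
    $sth->execute('sunset', 'image/jpeg', '/srv/images/3a/7f/sunset.jpg');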

Replies are listed 'Best First'.
Re^3: Storing/Retrieving images as blobs
by MidLifeXis (Monsignor) on Sep 26, 2011 at 13:13 UTC

    The OS killer issue alluded to above is having a large number of entries in a single directory. In order to resolve a file name, the OS needs to look up each component of the path in its directory. If a directory holds a significant number of entries, multiple disk reads may be necessary to find a particular entry.

    As an example: assume that you can store 20 directory entries in a block that can be read in a single shot. If you have 100 files, then on average you will need three disk reads to find the location of a particular file (scanning N/2 = 50 entries on average puts you in the third of the five blocks needed to hold 100 entries). This increases to 25 reads for 1000 entries (1000 / 2 => 500 entries, 500 / 20 => 25 blocks).

    If you can break the directory structure into enough subdirectories that no directory contains more than 20 entries, you can find the file location, on average, in two reads (100 / 20 => 5 entries in each of 20 subdirectories). With two levels of no more than 20 entries each, you can store and retrieve 400 files with a guaranteed maximum of two reads: 8000 in three, 160K in four, 3.2M in five, and so on.
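
    A minimal sketch of that kind of fan-out, assuming you are willing to key the layout off a digest of the file name; the two-hex-character levels (a fan-out of 256 per level) and the /srv/images root are my assumptions, not anything from the thread:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Digest::MD5 qw(md5_hex);
        use File::Path  qw(make_path);
        use File::Spec;

        # Map a logical file name to a two-level directory tree, e.g.
        # "sunset.jpg" -> $root/3a/7f/sunset.jpg, so that no single
        # directory grows without bound.
        sub fanout_path {
            my ($root, $name) = @_;
            my $digest = md5_hex($name);
            my $dir = File::Spec->catdir($root,
                                         substr($digest, 0, 2),
                                         substr($digest, 2, 2));
            make_path($dir);    # create the subdirectories if necessary
            return File::Spec->catfile($dir, $name);
        }

        my $target = fanout_path('/srv/images', 'sunset.jpg');
        print "store the image at $target\n";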

    The value of 20 depends on the file system and how directory lookups are done in the OS or the application. A full readdir scan at the application level is less amenable to optimization by the OS than simply asking the OS to open the file for you. Basically, you need a depth of X and an entries-per-directory count of Y such that Y**X is the number of files (N) you can store before you are willing to reorganize your files.
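
    To put that Y**X rule into numbers, a tiny sketch that finds the depth X needed for a given fan-out Y and file count N (the counts are just the ones used above):

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Smallest depth X such that $fanout ** X >= $files.
        sub depth_needed {
            my ($fanout, $files) = @_;
            my ($depth, $capacity) = (0, 1);
            while ($capacity < $files) {
                $capacity *= $fanout;
                $depth++;
            }
            return $depth;
        }

        printf "%9d files at fan-out 20 => depth %d\n",
            $_, depth_needed(20, $_)
            for 400, 8_000, 160_000, 3_200_000;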

    If the directory entries are actually stored in a database (I think that BeOS had that feature), then you may not need to worry about the read sizes and how many directory entries can fit into a single read.

    As an interesting aside, the classical Unix File System used this technique to minimize the number of lookups it had to do to find the data blocks for a given file. The inode pointed to by the directory entry had pointers to data blocks for small files, then some pointers to blocks of pointers to data blocks for larger files, and then some pointers to blocks of pointers to blocks of pointers for even larger files. The larger the number of a file's data blocks, the deeper this "directory of pointers" became.
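
    As a back-of-the-envelope illustration of that layout (the 4 KB block size, 4-byte block pointers and 12 direct pointers below are assumptions; the real classical numbers were different):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $block  = 4096;
        my $ptrs   = $block / 4;           # pointers held by one indirect block
        my $direct = 12 * $block;          # reachable straight from the inode
        my $single = $ptrs * $block;       # via one block of pointers
        my $double = $ptrs ** 2 * $block;  # via pointers to pointer blocks
        my $triple = $ptrs ** 3 * $block;  # one level deeper still

        printf "direct only:          %d bytes\n", $direct;
        printf "plus single indirect: %d bytes\n", $direct + $single;
        printf "plus double indirect: %d bytes\n", $direct + $single + $double;
        printf "plus triple indirect: %d bytes\n",
            $direct + $single + $double + $triple;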

    --MidLifeXis

Re^3: Storing/Retrieving images as blobs
by dHarry (Abbot) on Sep 26, 2011 at 11:49 UTC

    There are several pros and cons to both alternatives, and a detailed trade-off analysis is needed to draw any conclusions. In my experience the database sooner or later grinds to a halt when the tables with BLOBs become too big; after that there is no choice, really. The 3GB the OP mentions doesn't seem too big, but I'm thinking Oracle here. I have no idea when MySQL gives up; I suppose it depends on the type/version of the MySQL database, configuration, hardware, etc.

    Assuming you reach the point where you have to move the images to the file system, it pays off to design a directory structure to prevent the OS from choking and/or to improve access times. You typically don't want millions of small files in one directory.
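
    A rough sketch of such a move, assuming DBI against a hypothetical images table with id/name/data columns, a path column to fill in, and a /srv/images root; names, batching and error handling would need adjusting for real use:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;
        use Digest::MD5 qw(md5_hex);
        use File::Path  qw(make_path);
        use File::Spec;

        # One-off migration: write every BLOB out to a hashed directory
        # tree and replace it in the table with the file's path.
        my $dbh = DBI->connect('dbi:mysql:database=gallery', 'user', 'password',
                               { RaiseError => 1 });

        my $select = $dbh->prepare(
            'SELECT id, name, data FROM images WHERE data IS NOT NULL');
        my $update = $dbh->prepare(
            'UPDATE images SET path = ?, data = NULL WHERE id = ?');

        $select->execute;
        while (my ($id, $name, $blob) = $select->fetchrow_array) {
            my $digest = md5_hex($name);
            my $dir    = File::Spec->catdir('/srv/images',
                                            substr($digest, 0, 2),
                                            substr($digest, 2, 2));
            make_path($dir);

            my $path = File::Spec->catfile($dir, $name);
            open my $fh, '>:raw', $path or die "cannot write $path: $!";
            print {$fh} $blob;
            close $fh or die "close failed for $path: $!";

            $update->execute($path, $id);
        }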

    As I mentioned, an alternative is a dedicated database product to store the images. These databases are optimized for that purpose.