http://www.perlmonks.org?node_id=377123


in reply to Combining Ultra-Dynamic Files to Avoid Clustering (Ideas?)

From what I remember about the "bad old days" when disks were smaller, the block size isn't a fixed 4 KB; it depends on the size of the disk (or partition) that you're using. You could try splitting the disk into two or more smaller partitions. That could reduce the block size on each partition and so reduce the overall waste.
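Before repartitioning, it may be worth checking what block size a filesystem actually reports. A minimal sketch in Python, assuming a Unix-like system where os.statvfs is available; the helper name block_size is made up for illustration:

```python
import os

def block_size(path="."):
    """Return the block size, in bytes, of the filesystem holding `path`."""
    st = os.statvfs(path)
    # f_frsize is the fundamental block size; fall back to f_bsize
    # on systems that report 0 for f_frsize.
    return st.f_frsize or st.f_bsize

if __name__ == "__main__":
    print("block size:", block_size("."), "bytes")
```

Running it against a path on each partition shows whether the sizes really differ.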

thor

Re^2: Combining Ultra-Dynamic Files to Avoid Clustering (Ideas?)
by superfrink (Curate) on Jul 24, 2004 at 17:58 UTC
    I believe that when you create an ext2 filesystem you can select a block size of 1k, 2k, or 4k. Also maybe look into ReiserFS, which I seem to recall will pack multiple small files into the same block to save space.
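    To get a feel for how much the block size matters, here is a rough sketch in Python that adds up the unused space in the last block of each file for the three sizes mentioned. The file sizes are invented for illustration, and the calculation ignores filesystem metadata and any tail packing:

```python
def slack(file_sizes, block_size):
    """Total bytes left unused in the last block of each file."""
    total = 0
    for size in file_sizes:
        remainder = size % block_size
        if remainder:
            total += block_size - remainder
    return total

# Hypothetical small files, sizes in bytes.
sizes = [100, 300, 1500, 5000]

for bs in (1024, 2048, 4096):
    print(bs, slack(sizes, bs))
# prints:
# 1024 2316
# 2048 5388
# 4096 13580
```

    With many small files the waste grows quickly as the block size goes up, which is the effect a smaller block size is meant to limit.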

    Also of note: if you are using ext2 (common on Linux) or FFS (OpenBSD), and probably others, you have to be concerned with the inode ("index node") count. Every file on those systems needs an inode. If you have too many files you won't be able to create any more, even though you have free disk space. Use the command "df -i" to see how many inodes your partitions have free. I've seen it happen more than once: when you run out of inodes and try to create a file, you get a "No space left on device" error message.
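    The same numbers that "df -i" prints can be read from a script. A minimal sketch in Python, assuming a Unix-like system with os.statvfs; the function name inode_usage is made up here, and some filesystems report 0 total inodes because they do not have a fixed count:

```python
import os

def inode_usage(path="."):
    """Return inode counts for the filesystem holding `path`."""
    st = os.statvfs(path)
    return {
        "total": st.f_files,       # inodes on the filesystem
        "free": st.f_ffree,        # free inodes
        "available": st.f_favail,  # free inodes for unprivileged users
    }

if __name__ == "__main__":
    usage = inode_usage(".")
    if usage["total"]:
        used = usage["total"] - usage["free"]
        print("inodes used: %d of %d" % (used, usage["total"]))
    else:
        print("this filesystem does not report a fixed inode count")
```

    Checking this before generating a large number of small files could warn you before the "No space left on device" error shows up.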

    Also note that ReiserFS does not have a fixed inode table. Instead it uses a balanced tree. With a large number of entries, a balanced tree is faster to search than a linear list (which I believe ext2 uses for filenames in a directory).
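    To illustrate the difference in search cost, here is a small Python sketch that counts comparisons. It is not how either filesystem is implemented: a binary search over a sorted list stands in for the balanced tree, and the file names are invented.

```python
def linear_lookup(names, target):
    """Scan a list front to back; return (found, comparisons)."""
    comparisons = 0
    for name in names:
        comparisons += 1
        if name == target:
            return True, comparisons
    return False, comparisons

def sorted_lookup(sorted_names, target):
    """Binary search over a sorted list; return (found, probes)."""
    lo, hi = 0, len(sorted_names)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_names[mid] == target:
            return True, probes
        if sorted_names[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return False, probes

# Hypothetical directory with 100,000 entries (already in sorted order).
names = ["file%06d" % i for i in range(100000)]

print(linear_lookup(names, "file099999"))  # (True, 100000)
print(sorted_lookup(names, "file099999"))  # found in at most 17 probes
```

    The linear scan grows with the number of entries, while the sorted lookup grows with its logarithm, which is why the tree-based layout pays off for very large directories.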

    All that said, I don't really understand how your data will be generated and accessed, so I'd also probably default to suggesting a database, as others above have said. That's just because it's easier to let some already developed and tested code do the work for me. I guess what I mean is: sure, I can write linked lists in C with pointers, but Perl can cut down my development time because lists are a built-in part of the language.

    PS: I don't want to start a language war. I don't want to talk about STL. I was just mentioning that using existing tools can help me finish my work faster.