Cheers for an interesting post. I agree it stinks of personalities but anyway.
in reply to Performance Trap - Opening/Closing Files Inside a Loop
I was agreeing with the idea of doing 10MB segments with an in-memory hash.
But what I didn't quite understand is why they are using the filesystem as the database, sure it is possible but hardly seems useful. Add to that the restriction on filenames and even potential security problems if something gets injected there..
I was thinking one of the tons of db-like options available might be useful, either as the end product or as the intermediate stage. If the java guys can't read a tied hash, mldbm or whatever you could use an sql db, anyway these things ought to be good at dealing with memory and disk write optimization. You can always dump the db to separate files if that's what you want.
Anyway, the point is that you are intentionally not being told what the project is supposed to do, so watch your back! I would personally ask why on earth they are writing thousands of files to the disk, that is so 70s. Don't the java guys know how to use Oracle or whatever they have in the same room? :) And they waste your time too, talk about inefficient use of resources!
Anyway it would be really funny if the answer is just to use the sql LOAD DATA INFILE command on a database you already have to solve the problem. You may be interested in the mysqlimport utility which is an interface to that command.