PerlMonks  

Re: Perl solution for storage of large number of small files

by mattr (Curate)
on Apr 30, 2007 at 13:06 UTC (#612765)


in reply to Perl solution for storage of large number of small files

I'm also curious about why you don't want to use a database. Is it because you want to be absolutely sure of when data is safely stored on the disk, or because it is faster this way? IIRC Oracle was designed, at least originally, to take advantage of physical layout on the disk, though that is perhaps not so important these days. Did you try dumping this into MySQL or PostgreSQL and dislike the solution for some reason? Have you tried Sleepycat's BDB?

Incidentally, the InnoDB performance tuning tips note:

Wrap several modifications into one transaction. InnoDB must flush the log to disk at each transaction commit if that transaction made modifications to the database. The rotation speed of a disk is typically at most 167 revolutions/second, which constrains the number of commits to the same 167th of a second if the disk does not fool the operating system.

So one disk rotation is about 6 ms (1/167 s) minimum right there. Are you spreading your tied files across several disks? Do you require every write to be physically saved to disk immediately, or can you wait a second or so?
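To make the quoted tip concrete, here is a minimal sketch of the difference between committing every write and wrapping many writes in one transaction. It is only an illustration under some assumptions: it uses Python's built-in sqlite3 module instead of InnoDB (or Perl's DBI) so that it runs standalone, and the table name `notes` and all function names are made up for the example.

```python
import sqlite3

def rotation_period_ms(revs_per_second=167):
    """Time for one platter rotation, in milliseconds."""
    return 1000.0 / revs_per_second

def insert_one_commit_each(conn, rows):
    """One transaction per row: on a disk-backed engine, every commit
    has to wait for the log to reach the disk."""
    for key, value in rows:
        conn.execute("INSERT INTO notes (k, v) VALUES (?, ?)", (key, value))
        conn.commit()

def insert_batched(conn, rows):
    """All rows in one transaction, so there is a single commit."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO notes (k, v) VALUES (?, ?)", rows)

if __name__ == "__main__":
    # In-memory database, so nothing is really flushed to disk here;
    # the point is only the shape of the two write patterns.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (k TEXT PRIMARY KEY, v TEXT)")
    insert_batched(conn, [("key%d" % i, "small payload") for i in range(1000)])
    print(conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0])
    print(round(rotation_period_ms(), 2))
```

With per-row commits on a real disk, the rotation period puts a ceiling on commits per second; batching pays that cost once per batch, at the price of losing the whole uncommitted batch if the process dies mid-way.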

Oh, the other thing is that if you have disk to burn you could increase your inode size on XFS, or mirror your disks for speed. But regardless, it seems that moving to a database implementation now, rather than waiting for things to explode, might be a good idea. I don't suppose your system could do locking to handle multiple writers, could it? Perhaps more info about what you are actually trying to do would be useful.

Also, I was thinking about a presentation (at YAPC::Asia, I think) about how a large service was built on Perl: Livedoor or Mixi. Anyway, they split their indices and tables across different servers (using the first characters of user names, IIRC). They built a system capable of easily repartitioning this layout as the number of users increases.
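I don't have the details of what that service actually did, so the following is only a rough sketch of the general idea: route each user to a server by the first character of the user name, and keep the boundaries in a table that can be split later. The class name `PrefixShardMap`, the method names, and the server names `db1`..`db4` are all hypothetical.

```python
import bisect

class PrefixShardMap:
    """Route user names to servers by the first character of the name."""

    def __init__(self, boundaries, servers):
        # boundaries[i] is the first character handled by servers[i + 1].
        # Example: boundaries ["h", "p"] with three servers means
        # names below "h" -> servers[0], "h".."o" -> servers[1], "p" and up -> servers[2].
        if len(servers) != len(boundaries) + 1:
            raise ValueError("need exactly one more server than boundaries")
        if list(boundaries) != sorted(set(boundaries)):
            raise ValueError("boundaries must be sorted and unique")
        self.boundaries = list(boundaries)
        self.servers = list(servers)

    def server_for(self, user_name):
        """Return the server responsible for this user name."""
        first = user_name[:1].lower()
        return self.servers[bisect.bisect_right(self.boundaries, first)]

    def split(self, boundary, new_server):
        """Repartition: names from `boundary` up to the next existing
        boundary move to `new_server`. Moving the stored data is not shown."""
        if boundary in self.boundaries:
            raise ValueError("boundary already exists")
        idx = bisect.bisect_right(self.boundaries, boundary)
        self.boundaries.insert(idx, boundary)
        self.servers.insert(idx + 1, new_server)

if __name__ == "__main__":
    shards = PrefixShardMap(["h", "p"], ["db1", "db2", "db3"])
    print(shards.server_for("alice"), shards.server_for("mattr"))
    shards.split("d", "db4")  # users d..g now live on db4
    print(shards.server_for("diotalevi"))
```

Keeping the layout in one small lookup table is what makes repartitioning cheap to describe: only the users in the split range need to be migrated, and everyone else keeps resolving to the same server.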


Re^2: Perl solution for storage of large number of small files
by diotalevi (Canon) on May 01, 2007 at 00:20 UTC

    DB_File is the old API for BerkeleyDB, the Sleepycat database. It's the same thing.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re^2: Perl solution for storage of large number of small files
by andye (Curate) on May 01, 2007 at 10:29 UTC
    InnoDB must flush the log to disk at each transaction commit if that transaction made modifications to the database. ... constrains the number of commits to the same 167th of a second

    unless you set innodb_flush_log_at_trx_commit to 0, which switches it to flush once a second.
    http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html

    HTH, andye

      Thanks, that is the page I was looking at, which is why I mentioned "a second or so". It seemed he was unwilling to wait that long, but buffering the writes would, it seems, increase efficiency.
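      For reference, a sketch of how the setting discussed above would look in the MySQL server configuration. The file location and the rest of the section are assumptions about a typical setup; only the variable name and value come from this thread.

      ```ini
      [mysqld]
      # 0 = write and flush the InnoDB log about once per second instead of
      # at every commit; up to roughly a second of committed transactions
      # can be lost if mysqld crashes.
      innodb_flush_log_at_trx_commit = 0
      ```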
