Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
The maximum number of files on a filesystem is limited by the number of inodes allocated when you create it (see 'mke2fs' and the output of 'df -i'). You can also tweak various settings on ext2/ext3 with tune2fs.

As you probably already know, written data is buffered in many places between your application and the disk. Firstly, the perlio layer (and/or stdio in the C library) may buffer data - this is controlled by $| or the equivalent methods in the more modern I/O packages.

Flushing will ensure the data is written to the kernel, but it won't ensure the kernel writes it to disk. You need the 'fsync' system call for this (and/or the 'sync' system call). You can get access to these via the File::Sync module.

Note that closing a filehandle only *flushes* it (write userland buffers), it does not *sync* it (write kernel buffers).

(If you're paranoid and/or writing email software, you may also want to note that syncing only guarantees that the kernel has successfully written the data to the disk. Most/all disks these days have a write buffer - there isn't a guarantee that data in that write buffer makes it onto persistent storage in the event of a power failure. You can get around this in various ways, but I'm drifting just a bit out of scope here...)

The above is to suggest an explanation for 'untie' taking a long time (flushing lots of buffered data on close), and it's also something anyone doing performance-related work on disk systems should know about. In particular, it may suggest why sqlite seemed slow on your workload. For robustness, sqlite calls 'fsync' (resulting in real disk I/O) at appropriate times (i.e. when it tells you that an insert has completed).

(Looking at one of your other replies...) If you are writing a lot of data to sqlite, you'll probably want to investigate the use of transactions and/or the 'async' mode. By default, sqlite is slow-but-safe. By default, writing data to a bunch of files is quick-but-unsafe. (But both systems can operate in both modes, you just need to make the right system calls or config options).

If you're going to be doing speed comparisons between storage approaches, you need to be sure of your needs for data integrity and then put each storage system into the mode that suits your needs before comparing. (You may well be doing all this already - apologies for the length response if so).


In reply to Re^3: Perl solution for storage of large number of small files by jbert
in thread Perl solution for storage of large number of small files by isync

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (5)
As of 2024-03-29 13:33 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found