PerlMonks  

Re: Multithreading a large file split to multiple files

by sundialsvc4 (Abbot)
on May 15, 2018 at 15:22 UTC (#1214563=note)


in reply to Multithreading a large file split to multiple files

Clearly, multithreading is not an option in this case. (Nine women can’t make a baby in one month, etc.)

Since your logic appears to consist of reading one enormous file and routing its lines to a few others, with no real processing in between, the limiting factor here appears to be I/O, and in particular file-buffering behavior. You need to cue the operating system to read and write these files in very large chunks, so that the disk handles fewer, larger requests (and, on a spinning drive, makes fewer seeks between the input and output files).
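A minimal sketch of the "large chunks" idea on the write side, assuming a hypothetical routing rule (each line goes to the file named after its first comma-separated field; substitute your own). Each output's lines are collected in memory and handed to print only once roughly 1 MiB has accumulated, so the OS sees fewer, larger writes. The names `split_file`, `route_key` and `$FLUSH_BYTES`, and the 1 MiB threshold, are illustrative assumptions rather than anything from the thread, and it assumes only a handful of distinct output files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed threshold: flush a per-file buffer once it reaches 1 MiB.
my $FLUSH_BYTES = 1 << 20;

# Hypothetical routing rule: the first comma-separated field picks the file.
sub route_key {
    my ($line) = @_;
    my ($key) = split /,/, $line, 2;
    $key = '' unless defined $key;
    $key =~ s/\s+\z//;       # drop the trailing newline/whitespace
    $key =~ s/[^\w-]/_/g;    # keep the key safe to use as a filename
    return length $key ? $key : 'unrouted';
}

sub split_file {
    my ($in_path, $out_dir) = @_;
    open my $in, '<', $in_path or die "Cannot open $in_path: $!";

    my (%out_fh, %pending);    # output handles and unwritten text per key
    my $flush = sub {
        my ($key) = @_;
        return unless length($pending{$key} // '');
        unless ($out_fh{$key}) {    # open each output lazily, once
            open $out_fh{$key}, '>', "$out_dir/$key.txt"
                or die "Cannot open $out_dir/$key.txt: $!";
        }
        print { $out_fh{$key} } $pending{$key} or die "Write failed: $!";
        $pending{$key} = '';
    };

    while (my $line = <$in>) {
        my $key = route_key($line);
        $pending{$key} .= $line;
        $flush->($key) if length($pending{$key}) >= $FLUSH_BYTES;
    }
    close $in;

    # Write whatever is left over, then close every output handle.
    $flush->($_) for keys %pending;
    close $_ or die "Close failed: $!" for values %out_fh;
    return [ sort keys %out_fh ];
}

split_file(@ARGV) if @ARGV == 2;
```

Usage: `perl split.pl big.csv outdir`. Whether this beats plain line-by-line print depends on the OS, filesystem and disk, so time both on your data before committing to it.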

The following post on StackOverflow.com appears to address this issue directly.

From the first response (in 2009):

You can affect the buffering, assuming that you're running on an O/S that supports setvbuf. See the documentation for IO::Handle. You don't have to explicitly create an IO::Handle object as in the documentation if you're using perl 5.10; all handles are implicitly IO::Handles since that release.
Their specific recommendation was:

    use IO::Handle '_IOLBF';
    open my $handle, '<:utf8', 'foo';
    my $buffer;
    $handle->setvbuf($buffer, _IOLBF, 0x10000);
    while ( my $line = <$handle> ) { ...

Having found this, I now turn the question over to other Monks: is this still relevant?

Re^2: Multithreading a large file split to multiple files
by jeffenstein (Pilgrim) on May 16, 2018 at 08:35 UTC

    From the IO::Handle documentation:

        # setvbuf is not available by default on Perls 5.8.0 and later.
        use IO::Handle '_IOLBF';
        $io->setvbuf($buffer_var, _IOLBF, 1024);

    However, the PerlIO::buffersize module (available from CPAN) adds a PerlIO layer that can set the buffer size when opening a file:

    open my $fh, '<:buffersize(65536)', $filename;
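If installing PerlIO::buffersize is not an option, a core-only alternative on the read side is to skip line-at-a-time readline and pull large blocks with sysread, carrying any partial last line over into the next block. This is a sketch: `for_each_line_bigread` is a hypothetical helper name, the 1 MiB default block size is an assumed starting point, and the handle is opened `:raw`, so the callback receives bytes (decode them yourself if the file is UTF-8).

```perl
use strict;
use warnings;

# Read $path in large blocks via sysread and pass each complete line
# (including its "\n") to $callback. Block size defaults to 1 MiB.
sub for_each_line_bigread {
    my ($path, $callback, $block_size) = @_;
    $block_size //= 1 << 20;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    my $carry = '';    # partial line left over from the previous block
    while (1) {
        my $got = sysread($fh, my $block, $block_size);
        die "Read failed: $!" unless defined $got;
        last if $got == 0;    # end of file
        $block = $carry . $block;
        my $last_nl = rindex($block, "\n");
        if ($last_nl < 0) {   # no complete line yet; keep accumulating
            $carry = $block;
            next;
        }
        $carry = substr($block, $last_nl + 1);
        # Split the complete part after each newline, keeping the "\n".
        $callback->($_) for split /(?<=\n)/, substr($block, 0, $last_nl + 1);
    }
    $callback->($carry) if length $carry;    # final line with no "\n"
    close $fh;
}
```

Because sysread bypasses the PerlIO buffer, don't mix it with `<$fh>` or `read` on the same handle.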

Re^2: Multithreading a large file split to multiple files
by Anonymous Monk on May 15, 2018 at 15:35 UTC
    Having found this, I now turn the question over to other Monks: is this still relevant?
    Handing it back to you. You don't know what you're talking about, so you google an answer and then ask whether the answer you found is still current? If you don't know, stop posting. You help nobody.
