
I first thought about this idea when I was thinking about John_M._Dlugosz's related question a couple of days ago. I made some attempts to verify its benefits, but ran into another of the 'features' of the OS on my box. This time, it's that copy file/b con refuses to do anything if the file is over 32k! That's not in and of itself a bad idea if you've ever subjected your office colleagues to half an hour of random Morse code by typing copy *.exe con by mistake on a machine that doesn't have a volume control, but providing no way to override it is unforgivable.

Anyway, the idea. On most OSes, the system utilities should be pretty well tuned for handling file I/O buffering (choosing read sizes etc.). So rather than complicating your Perl scripts by re-inventing the wheel on buffering over and over, why not let the OS utilities take care of it for you? Something like:

...
local $/ = \$nn;    # $nn = record size in bytes: big chunks, small chunks, whatever
open FH, 'copy /b bigbinary con |' or die $!;
binmode(FH);        # set once, before the first read
while (<FH>) {
    # do whatever.
}

In this way, the system utility's knowledge of appropriate buffer sizes etc. is used to handle the I/O efficiently. Additionally, if the action inherently requires large amounts of memory, that memory is returned to the OS when the child process terminates.

There may also be some performance benefits from having the pipe further buffer the data, especially if the Perl program needs to process the input in small chunks.
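For anyone who wants to try the idea on a box where the utility behaves, here is a minimal sketch of reading a command's output in fixed-size binary records. The sub name read_chunks_from_command is made up for illustration, and the command is whatever you pass in; I have not benchmarked this against plain read/sysread, so treat it as a starting point rather than a proven win.

```perl
use strict;
use warnings;

# Read the output of an external command in fixed-size binary records.
# The command (@cmd) is supplied by the caller; nothing here is OS-specific
# beyond needing a utility that streams the file to stdout.
sub read_chunks_from_command {
    my ($chunk_size, @cmd) = @_;

    # List-form pipe open bypasses the shell, so odd file names are safe.
    open(my $fh, '-|', @cmd) or die "Cannot run '@cmd': $!";
    binmode($fh);                 # once, before the first read

    local $/ = \$chunk_size;      # readline now returns up to $chunk_size bytes
    my @chunks;
    while (defined(my $chunk = <$fh>)) {
        push @chunks, $chunk;     # a real program would process $chunk here
    }

    close($fh) or die "'@cmd' failed: " . ($! || "exit status $?");
    return @chunks;
}
```

On a Unix-like box, something like read_chunks_from_command(65536, 'cat', 'bigbinary') would stand in for the copy command above (cat is my assumption, not something I tested in the original experiment). Collecting the chunks in an array is only for demonstration; with a genuinely big file you would process each chunk inside the loop instead.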

In John_M._Dlugosz's case, he could use (forgive my not knowing the exact syntax; this assumes GNU grep) something like open FH, "grep -a -b -o -F -e '$delimiter' bigbinary |" or die $!; to find the byte offsets of his records and then use seek on the file to go get his data?
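A rough sketch of that grep-then-seek approach follows. It assumes GNU grep (-a treats binary as text, -b prints the byte offset, -o prints each match on its own line, -F takes the pattern as a fixed string); the sub names delimiter_offsets and read_at are mine, purely for illustration. It also assumes the delimiter contains no newline or NUL bytes, since it is passed as a command-line argument.

```perl
use strict;
use warnings;

# Ask GNU grep for the byte offset of every literal occurrence of $delimiter.
# Assumption: GNU grep is on the PATH and supports -a -b -o -F -e.
sub delimiter_offsets {
    my ($path, $delimiter) = @_;
    open(my $grep, '-|', 'grep', '-a', '-b', '-o', '-F', '-e', $delimiter, $path)
        or die "Cannot run grep: $!";
    my @offsets;
    while (my $line = <$grep>) {
        # With -b and -o, each output line looks like "1234:match".
        push @offsets, $1 if $line =~ /^(\d+):/;
    }
    close($grep);   # grep exits non-zero when nothing matched; not an error here
    return @offsets;
}

# Fetch $length bytes starting at $offset without reading the rest of the file.
sub read_at {
    my ($path, $offset, $length) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    seek($fh, $offset, 0) or die "Cannot seek to $offset: $!";
    my $got = read($fh, my $buf, $length);
    die "Read failed: $!" unless defined $got;
    close($fh);
    return $buf;
}
```

The offsets mark where each delimiter starts, so a record's data begins at the offset plus the delimiter's length and runs up to the next offset. Whether grep copes well with a huge binary that contains few or no newlines is something I would check before relying on it.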

This would be especially useful if the OS in question does something sensible with file sharing for processes requesting read-only access. He could hold his big file open read-only, whilst spawning separate processes to do the searching.

What's this about a "crooked mitre"? I'm good at woodwork!

In reply to Re: Fastest I/O possible? by BrowserUk
in thread Fastest I/O possible? by Anonymous Monk
