http://www.perlmonks.org?node_id=11132679


in reply to (WIN) Autoflush, Perl, Sleep and Powershell

Post withdrawn for the moment because I found a bug after prematurely posting this. Sorry.

OK, back again. I see what happened. I had put an extra line in the disk example to check that $| was actually causing a flush on the hard disk write. It wasn't, and that fooled me. For $| to take effect on the disk file, the file handle has to be selected with select() first.

OK, I think this is right now. With autoflush enabled on a handle, Perl will flush on every print() to it. I am still not sure about the PowerShell connection.

OK. First up, flush and the command console:

    use strict;
    use warnings;

    $| = (@ARGV > 0);
    print "this is partial line....";
    sleep (20);
    print "the rest of the story\n";

When run with no args, there is a 20 second delay before anything shows up on the console, and then the entire line appears at once.

When run with one or more args, you will see the first print, then a 20 second delay, then the second print. Perl does not have to wait for the \n before flushing what it has to the console screen. Of course, program end causes any data still buffered to be written out.

NOW, second up, writing to a hard disk:

    use strict;
    use warnings;

    open FILE, '>', 'test' or die "cannot open test: $!";
    select FILE;    # this is important: $| applies to the currently selected handle
    $| = (@ARGV > 0);

    print STDERR "partial line going to disk";
    print FILE "this is partial line....";
    sleep (20);
    print FILE "the rest of the story\n";
    print STDERR "line is complete\n";
    print STDERR "starting another sleep\n";
    sleep(20);
    print STDERR "program ended\n";

To test this, I opened another command window and repeatedly ran "type test".
I found, much to my surprise, that each print goes to the disk.
There is no "waiting for the \n" when flush is properly enabled. The extra sleep is there to make sure that the second print reached the file before program end.
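
The select() step above matters because $| only affects the currently selected output handle. As a minimal sketch of an alternative, the autoflush() method from IO::Handle sets the same flag on one handle without leaving it selected. The helper name open_autoflushed and the file name are made up for illustration:

```perl
use strict;
use warnings;
use IO::Handle;   # provides the autoflush() method on file handles

# Hypothetical helper: open a file for writing with autoflush enabled.
sub open_autoflushed {
    my ($path) = @_;
    open(my $fh, '>', $path) or die "Cannot open $path: $!";
    $fh->autoflush(1);   # same effect as: select($fh); $| = 1; select(STDOUT);
    return $fh;
}

my $path = 'autoflush_demo.txt';   # illustrative file name
my $fh   = open_autoflushed($path);
print {$fh} "this is partial line....";   # no newline yet

# Read the file back through a second handle while $fh is still open.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my $seen = do { local $/; <$in> };
close $in;
print "visible so far: '$seen'\n";

print {$fh} "the rest of the story\n";
close $fh;
unlink $path;
```

Because STDOUT stays the default handle, plain print statements elsewhere in the program are not redirected into the file by accident.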

DISCUSSION:

When you are writing a text file to disk, at the driver level the OS is typically working with a 4 KB buffer. This typically corresponds to eight 512-byte sectors on the hard disk. Why the industry settled on these values (4K and 512 bytes) is a long story that involves some history. Other choices are certainly plausible and are sometimes used for special applications.

Without any flushing going on, this 4K buffer is written to the disk when it is full. A line that straddles the end of the buffer (the buffer typically will not end exactly on a newline boundary) spills over into the next "fresh, newly started" 4K buffer.

When autoflushing is going on, the entire 4K buffer will be rewritten to the disk as each new print() is executed. There is no such thing as a "partial" 4K buffer write as far as the hard disk system is concerned. If the first line used, say, 37 bytes of the 4K, then the rest of the buffer is just garbage. 4K is the increment that the file system uses. When the next line comes along, it gets added to this 4K buffer. Note that at the disk drive level, this 4K write could take a while due to rotational delays and the possibility that the drive will have to do a seek between two of the eight sectors. Writing a 4K block to the disk on every flush is just fundamental to how this works.

So watch out if you enable flush to the hard disk: every print() will flush. This means that a loop which uses a separate print for each value on a single line is going to cause the performance to "auger into the ground".
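
One way to sidestep the per-value flush is to build the whole line first and print it once. The following is only a sketch of that idea; the helper name format_row and the sample data are made up for illustration:

```perl
use strict;
use warnings;
use IO::Handle;   # provides the autoflush() method

# Hypothetical helper: turn one row of values into a single string,
# so the whole row costs one print() and therefore one flush.
sub format_row {
    my (@values) = @_;
    return join(' ', @values) . "\n";
}

my @row = (1 .. 10);    # illustrative data
STDOUT->autoflush(1);   # every print() to STDOUT now flushes

# Costly pattern with autoflush on: one flush per value.
#   print "$_ " for @row;
#   print "\n";

# Cheaper pattern: one print(), one flush per line.
print format_row(@row);
```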

I had thought that there was an optimization to flush only on a \n when writing to the disk, but that is not true. This erroneous belief arose because my code wasn't properly enabling flush on the disk file handle. If you want to do line-by-line flushing yourself, it is possible to put in explicit flush() calls. In general, don't autoflush to the hard disk, due to the performance penalty discussed above.
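
As a rough sketch of the explicit approach, autoflush can be left off and flush() from IO::Handle called once per completed line. The helper name write_line and the file name are made up for illustration:

```perl
use strict;
use warnings;
use IO::Handle;   # provides the flush() method on file handles

# Hypothetical helper: write the pieces of one line, then flush once.
sub write_line {
    my ($fh, @parts) = @_;
    print {$fh} $_ for @parts;   # stays in Perl's buffer (unless the buffer fills)
    print {$fh} "\n";
    $fh->flush or die "flush failed: $!";
    return;
}

my $path = 'flush_demo.txt';   # illustrative file name
open(my $fh, '>', $path) or die "Cannot open $path: $!";
write_line($fh, 'this is partial line....', 'the rest of the story');

# Read the file back through a second handle while $fh is still open.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my $contents = do { local $/; <$in> };
close $in;
print "flushed so far: $contents";

close $fh;
unlink $path;
```

This keeps the number of flushes at one per line regardless of how many print() calls went into building that line.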