I'm assuming the change only let the process work through the first 65KB of the file?
No. It did process the whole file, but in 64KB chunks.
The reason it ran more quickly is that, if the file doesn't contain newlines, -p will load the file as one huge single line; reading fixed-size records avoids that.
As pointed out above, the problem with processing the file in chunks is that if the search term straddles a 64KB chunk boundary, say two bytes at the end of one chunk and two bytes at the beginning of the next, then the search term won't match and the substitution won't be made.
The really simple solution to that is to process the file twice, with different buffer sizes chosen to be relatively prime. You might use 1MB for the first pass and 1MB - 3 bytes for the second. This ensures that any overlaps missed by the first pass will not fall on a chunk boundary during the second pass. Up to 1024GB, anyway.
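To see why relatively prime buffer sizes work, note that pass-1 boundaries fall at multiples of the first size and pass-2 boundaries at multiples of the second; they can only coincide at a common multiple of both. A quick check of the arithmetic (in Python for illustration; the sizes are the ones suggested above):

```python
from math import gcd

# The two buffer sizes suggested above: 1MB and 1MB - 3 bytes.
b1 = 1024**2
b2 = 1024**2 - 3

# They are relatively prime, so their boundaries first coincide at
# lcm(b1, b2) = b1 * b2 bytes into the file.
assert gcd(b1, b2) == 1
lcm = b1 * b2 // gcd(b1, b2)
print(lcm)  # 1099508482048 bytes: just under 2**40, i.e. ~1TB
```

So any match missed at a pass-1 boundary is guaranteed to lie wholly inside some pass-2 chunk, provided the file is smaller than about 1TB.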
So,
    perl -e"BEGIN{ $/ = \(1024**2) }"   -pe"s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg"   infile > outfile1
    perl -e"BEGIN{ $/ = \(1024**2-3) }" -pe"s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" outfile1 > outfile2
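The two-pass trick is easier to demonstrate with toy sizes than with 1MB buffers. A minimal sketch in Python (the function name and the tiny chunk sizes are mine, purely for illustration; the byte patterns are the ones from the one-liners above):

```python
def substitute_in_chunks(data: bytes, old: bytes, new: bytes, chunk: int) -> bytes:
    # Mimics perl -p with $/ set to a fixed record size: each chunk is
    # substituted independently, so a match straddling a boundary is missed.
    out = bytearray()
    for i in range(0, len(data), chunk):
        out += data[i:i + chunk].replace(old, new)
    return bytes(out)

old = b"\x00\x42\x00\x11"
new = b"\x00\x42\x00\xf0"

# Toy file: the pattern sits exactly across a 16-byte chunk boundary,
# two bytes on each side.
data = b"A" * 14 + old + b"B" * 14

once = substitute_in_chunks(data, old, new, 16)
assert old in once  # pass 1 misses the straddling match

# Second pass with a relatively prime chunk size (16 and 13 are coprime).
twice = substitute_in_chunks(once, old, new, 13)
assert old not in twice  # pass 2 catches it
```

The second pass shifts the boundaries so the previously straddled match falls wholly inside one chunk.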
Two passes is obviously slower than one, but much faster than loading the whole damn file into RAM on a memory-constrained machine.
This last point is what I assume to be the cause of the performance differential between your Linux and Windows set-ups: if the former has sufficient free RAM to load the whole file in one pass, and the latter does not and starts swapping, that would explain the difference.
Another alternative would be to use a sliding buffer, but that is too complicated for a one-liner, and often doesn't yield enough of a performance gain to beat the two-pass approach.
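For the record, the sliding-buffer idea looks like this (a sketch in Python rather than Perl; the function name is mine, and it assumes the search and replacement strings are the same length, as they are here, so a replacement can never itself straddle the carried-over tail):

```python
import io

def substitute_streaming(src, dst, old: bytes, new: bytes, bufsize: int = 1024**2):
    # Single-pass alternative: carry the last len(old)-1 bytes of each
    # buffer into the next read, so a match straddling a buffer boundary
    # is still seen in full. Patterns whose replacement could re-combine
    # with following bytes into a new match would need extra care.
    carry = b""
    while True:
        block = src.read(bufsize)
        if not block:
            dst.write(carry)  # shorter than old, so it cannot match
            return
        block = (carry + block).replace(old, new)
        # Hold back a tail that might be the start of a straddling match.
        carry = block[-(len(old) - 1):] if len(old) > 1 else b""
        dst.write(block[:len(block) - len(carry)])

# Same toy file as before: the pattern straddles a 16-byte buffer boundary.
old = b"\x00\x42\x00\x11"
new = b"\x00\x42\x00\xf0"
src = io.BytesIO(b"A" * 14 + old + b"B" * 14)
dst = io.BytesIO()
substitute_streaming(src, dst, old, new, bufsize=16)
assert dst.getvalue() == b"A" * 14 + new + b"B" * 14
```

One pass, constant memory; but as said, not something you'd squeeze onto a command line.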
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.