in reply to Why use <$fh> at all?
In reply to Tye's comments...
I appreciate the in-depth commentary. This is where I'm
going to learn some real perl.
To give some background that might clear up why I wrote
the code the way I did: I'm running perl on a Win98 system.
I'm writing a piece of code that accepts a collection of
text files from a legacy database along with a configuration
file and generates a series of reports in MS Excel.
The text files are comma-delimited and in a specific, reliable
format (for example, no '\n' at the end of file), allowing for
some of the assumptions I made. I tested the code above
on several of these files and achieved the correct results each
time.
The reports are extremely time-sensitive, and routinely total
over 40MB of data. I need the routines to be fast -
although I take to heart Tye's comment that fast code is less
important than correct code.
One question I still have: what effect does binmode have
on the data? In WinBlows it looks as though I still end up
with text in my final array, regardless of whether I use binmode.
In that case, switching to binmode and gaining the speed
increase seems reasonable.
Thanks.
(tye)Re2: Why use <$fh> at all?
by tye (Sage) on Oct 05, 2002 at 15:23 UTC
Did you read binmode? (Granted, it is rather inaccurate and shows that the author doesn't understand the point -- click on one of the links to more modern versions of the document for much better text.) It says "Files that are not in binary mode have CR LF sequences translated to LF on input", which is accurate for Win32 systems. Checking for such takes some time. Actually having to fix that requires that all of the text in the buffer after any CRs needs to be "moved up", which takes even more time. Checking the source for the standard Microsoft C run-time library, I see that the standard:
char *in, *out;
in= out= buffer;
...
*out++ = *in++;
method is used to avoid multiple "move" operations, which means that the cost of moving is incurred even if no CRs are found [ and also makes for much simpler code that is easier to "get right" than if we tried to switch to strchr() and memmove() to allow assembly-language constructs to search the string and to move the bytes (: ].
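To make the two-pointer idea concrete, here is a minimal sketch of that kind of loop. It is not the Microsoft run-time source: the function name `crlf_to_lf` and its signature are made up for illustration, and it assumes the whole buffer is already in memory (so a CR that happens to be the last byte of the buffer is simply kept).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT the actual CRT code: collapse every CR LF pair
 * in buf[0..len) to a single LF, in place, and return the new length. */
size_t crlf_to_lf(char *buf, size_t len)
{
    assert(buf != NULL);

    char *in  = buf;        /* next byte to read  */
    char *out = buf;        /* next byte to write */
    char *end = buf + len;

    while (in < end) {
        if (in[0] == '\r' && in + 1 < end && in[1] == '\n') {
            in++;           /* skip the CR; the LF is copied just below */
        }
        *out++ = *in++;     /* runs for every byte, CR found or not */
    }
    return (size_t)(out - buf);
}
```

Note that the copy on the last line of the loop executes for every byte, which is the "cost of moving is incurred even if no CRs are found" point: until the first CR LF is seen, `in` and `out` are equal and each byte is just written back onto itself.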
- tye (or ldad $54796500 if you're in a hurry)
Re: Re: Why use <$fh> at all?
by hsmyers (Canon) on Oct 05, 2002 at 15:15 UTC
The difference binmode makes in DOS and Windows is crucial! Without binmode, every read routine has to check (roughly speaking) two things: 1. did we just read a 0D 0A pair? (if yes, convert it to '\n'); 2. did we just read end-of-file? (if yes, stop). With binmode, all we care about is end-of-file. Even end-of-file can cause a problem if there is an embedded ^Z in the file (the original DOS end-of-file mark, which is ignored in binmode). And since these are implemented in the OS at root (thin wrapper in 'C' library) the distinction is important...
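The ^Z point can be illustrated with a small simulation. This is only a sketch of the behaviour described above, not a call into the real run-time: `text_mode_visible_length` is a hypothetical name, and it deliberately ignores the 0D 0A translation so that only the end-of-file effect shows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CTRL_Z 0x1A  /* the old DOS end-of-file marker */

/* Hypothetical helper: how many bytes of buf[0..len) a text-mode read
 * would deliver if an embedded Ctrl-Z is treated as end of file.
 * A binary-mode read would deliver all len bytes. */
size_t text_mode_visible_length(const char *buf, size_t len)
{
    assert(buf != NULL);

    const char *eof = (const char *)memchr(buf, CTRL_Z, len);
    return eof ? (size_t)(eof - buf) : len;
}
```

For a 7-byte buffer holding "abc", a ^Z, then "def", this returns 3: everything after the ^Z is invisible to a text-mode reader, while the same file read with binmode yields all 7 bytes.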
--hsm
"Never try to teach a pig to sing...it wastes your time and it annoys the pig."