|Keep It Simple, Stupid|
Re: Re: Optimising processing for large data files.by BrowserUk (Pope)
|on Apr 10, 2004 at 17:11 UTC||Need Help??|
But what the most experienced people say is, "Don't prematurely optimize!".
I wasn't addressing those understand Tony Hoare's assertion, just those that don't.
Also improvements are often platform specific.
Of course. Should anyone with access to a.n.other OS and a couple of hours to spare care to run my tests on their OS, I'd be interested to see how much they differ.
For instance you got huge improvements by using sysread rather than read. However in past discussions...
Going by the date of the past discussions, I could well imagine that they were before the addition of the extra IO layers that caused the majority of the slowdown post 5.6.x
Someone who reads your post and begins a habit of always using sysread has taken away the wrong lesson.
Anyone who does that hasn't read my post--at least not properly. But I have to say that I have more faith in people than you seem to have. I'm just an average programmer and if I can work these things out out, most others can too.
In the worst case, that someone will use what I describe wrongly and their program will run more slowly. If this happens, they will:
Now some corrections.
You only make one correction, and it appears to be a matter of terminology rather than substance. Whether you class perl's reference counting/variable destruction as garbage collection or not. I have demonstrated in the past that, under Win32 at least, when large volumes of lexical variables are created and destroyed, some memory is frequently returned to the OS.
Also your slams against databases are unfair to databases.
I didn't make any "slams" against databases. Only against advice that they be used for the wrong purpose, and in the wrong ways.
But even on the performance front you're unfair. Sure, databases would not help with this problem.
I was only discussing this problem. By extension, I guess I was also discussing other problems of this nature, which by definition, is that class of large volume data processing problems that would not benefit from the use of a database.
But databases are often a performance win when they don't reduce how much data you need to fetch because they move processing into the databases query engine.
So what your saying is that instead of moving the data to the program, you move the program to the data. I whole heartedly agree that is the ideal way to utilise databases--for that class of programs that can be expressed in terms of SQL. But this is just an extreme form of subsetting the data. Potentially to just the results of the processing.
I see no conflict between that and what I said.
Examine what is said, not who speaks."Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail