Re: Re: Re: Re: Re: Optimising processing for large data files.
by tilly (Archbishop)
on Apr 11, 2004 at 07:26 UTC
This thread is going nowhere, fast. I'll respond to this and let you enjoy the privilege of the last response after that.
Example 1.
"...nothing to do with whether you are using true garbage collection."
"I never used the phrase 'true garbage collection'."
True. But you did say, "The process consumed less than 2MB of memory total. There was no memory growth and the GC never had to run." In a subthread you gave an example whose behaviour suggested to you that Perl has a garbage collector that can stop the system and run GC. That implies strongly that you thought that Perl had a GC system similar to, say, Java.
Example 2.
"You also included a false assertion about when databases can give a performance improvement."
"Wrong. To quote you: 'Sure, databases would not help with this problem.'"
The fact that you gave a correct statement does not change the fact that another statement was wrong. The wrong statement was, "Databases are never quicker unless you can use some fairly simplistic criteria to make wholesale reductions in the volume of the data that you need to process within your application program." Furthermore you've defended this statement. Repeatedly.
Example 3.
"Consider the case where you have a very large table,..."
"No, I will not consider that case. That case has no relevance to this discussion, nor to any assertions I made."
It has relevance to your statement, "Databases are never quicker unless you can use some fairly simplistic criteria to make wholesale reductions in the volume of the data that you need to process within your application program." But since you refuse to consider the case, you won't see the relevance, and continuing to point it out has become a waste of energy.
"My assertion, in the context of the post (re-read the title!) was:"
And that assertion is wrong. If the nature of the processing is that you need to correlate the data with existing datasets in a manner that can conveniently be done with a join (the existing dataset can even be included at the beginning of the flatfile as a set of different blocks), then moving the join to a database can indeed improve speed. This goes double if the amount of data to be juggled is large enough that you get into memory management issues with Perl.
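As a rough illustration of the join case (a sketch, not code from this thread): the snippet below loads a small "existing dataset" and some new records into a database and lets the database do the correlation. It uses Python's standard sqlite3 module only because that is self-contained; the table names, column names, and sample rows are all hypothetical, and a data set this small says nothing about speed.

```python
import sqlite3

# Hypothetical data: an existing reference dataset plus new records,
# as they might be parsed out of different blocks of a flat file.
customers = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(101, 1, 25.0), (102, 3, 40.0), (103, 1, 5.0), (104, 4, 9.5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# The database performs the join and the aggregation; the application
# only receives the already-correlated result rows.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
conn.close()

for name, total in rows:
    print(name, total)
```

Order 104 refers to a customer that does not exist, so the inner join drops it without any extra application code.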
Another example which comes to mind is having to sort a very large dataset. (As in several GB of data.) A lot of research has gone into efficient sorting algorithms, and a lot of that research has gone into database design. Again, moving data into the database can win.
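A minimal sketch of handing the sort to a database, again using Python's standard sqlite3 module purely for illustration. The table name, columns, and sample records are hypothetical; for a genuinely large dataset an on-disk database file would be used in place of the in-memory one shown here.

```python
import sqlite3

# Hypothetical (key, payload) records, as they might be read from a flat file.
records = [("pear", 3), ("apple", 1), ("fig", 2), ("apple", 0)]

# ":memory:" keeps the sketch self-contained; a file path would be the
# assumption for data that does not fit in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (k TEXT, payload INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?)", records)

# The database engine, not the script, is responsible for ordering the rows.
sorted_rows = list(
    conn.execute("SELECT k, payload FROM records ORDER BY k, payload")
)
conn.close()

for row in sorted_rows:
    print(row)
```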
"That is the assertion I made. That is the only assertion I made with regard to databases."
If the nature of the processing that you need to do closely matches how a database is designed to work, then you can save time. Exactly because the database has been built and tuned to perform the operation that you need.
I've given an example where it happens, and I've pointed you at an area of work where people customarily run into this issue.
"No matter how you cut it, switch it around and mix it up. For any given volume of data that an application needs to process, reading that volume of data from a flat file will always be quicker than retrieving it from a DB. Full stop."
This is obviously true, but does not logically imply your assertion. My assertion here is that there are certain kinds of operations that databases have been designed to do well (in addition to trying to fetch data), and you are not going to be able to code those operations in Perl to run more efficiently than they already do in a database.
Obviously unless the database is a good match for what you are going to do, and Perl is not, you would be insane to add that overhead to your process.
But if it is a match, then the database can win. Sometimes by a lot. Despite its obvious overhead.
"No amount of what-if scenarios will change that nor correct any misassertion I didn't make."
I see that you're confident in your view of reality. I won't try to convince you any further at this point.
Now you can end the thread however you wish to.