PerlMonks  

Re: Re: Re: Re: Optimising processing for large data files.

by BrowserUk (Pope)
on Apr 11, 2004 at 06:48 UTC (#344219)


in reply to Re: Re: Re: Optimising processing for large data files.
in thread Optimising processing for large data files.

Example 1.

...nothing to do with whether you are using true garbage collection.

I never used the phrase "true garbage collection".

Example 2.

You also included a false assertion about when databases can give a performance improvement.

Wrong. To quote you: "Sure, databases would not help with this problem."

Example 3.

Consider the case where you have a very large table,...

No, I will not consider that case. That case has no relevance to this discussion, nor to any assertions I made.

My assertion, in the context of the post (re-read the title!) was:

If you have a large volume of data in a flat file, and you need to process that data in its entirety, then moving that data into a database will never allow you to process it faster.

That is the assertion I made. That is the only assertion I made with regard to databases.

Unless you can use some (fairly simple, so that it can be encapsulated into an SQL query) criteria to reduce the volume of the data that the application needs to process, moving the data into a DB will not help.

No matter how you cut it, switch it around and mix it up. For any given volume of data that an application needs to process, reading that volume of data from a flat file will always be quicker than retrieving it from a DB. Full stop.

No amount of what-if scenarios will change that nor correct any misassertion I didn't make.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail


Re: Re: Re: Re: Re: Optimising processing for large data files.
by tilly (Archbishop) on Apr 11, 2004 at 07:26 UTC
    This thread is going nowhere, fast. I'll respond to this and let you enjoy the privilege of the last response after that.

    Example 1.
    ...nothing to do with whether you are using true garbage collection.
    I never used the phrase "true garbage collection".
    True. But you did say, "The process consumed less than 2MB of memory total. There was no memory growth and the GC never had to run." In a subthread you gave an example whose behaviour suggested to you that Perl has a garbage collector that can stop the system and run GC. That implies strongly that you thought that Perl had a GC system similar to, say, Java.

    Example 2.
    You also included a false assertion about when databases can give a performance improvement.
    Wrong. To quote you: "Sure, databases would not help with this problem."
    The fact that you gave a correct statement does not change the fact that another statement was wrong. The wrong statement was, "Databases are never quicker unless you can use some fairly simplistic criteria to make wholesale reductions in the volume of the data that you need to process within your application program." Furthermore you've defended this statement. Repeatedly.

    Example 3.
    Consider the case where you have a very large table,...
    No, I will not consider that case. That case has no relevance to this discussion, nor to any assertions I made.
    It has relevance to your statement, "Databases are never quicker unless you can use some fairly simplistic criteria to make wholesale reductions in the volume of the data that you need to process within your application program." But since you refuse to consider the case, you won't see the relevance, and continuing to point it out has become a waste of energy.

    My assertion, in the context of the post (re-read the title!) was:

    If you have a large volume of data in a flat file, and you need to process that data in its entirety, then moving that data into a database will never allow you to process it faster.

    And that assertion is wrong. If the nature of the processing is that you need to correlate the data with existing datasets in a manner that can conveniently be done with a join (the existing dataset can even be included at the beginning of the flatfile as a set of different blocks), then moving the join to a database can indeed improve speed. This goes double if the amount of data to be juggled is large enough that you get into memory management issues with Perl.
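
To make the join case concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is not taken from the thread: the table names (`incoming`, `existing`), the column names, and the function `join_in_database` are all hypothetical, and an in-memory SQLite database stands in for whatever database would actually be used. It only shows the shape of the approach (load both datasets, let the database do the correlation) and says nothing about whether it is faster for a given workload.

```python
import sqlite3


def join_in_database(incoming_rows, existing_rows):
    """Correlate two datasets with an SQL join instead of hand-written lookup code.

    incoming_rows: iterable of (item_id, amount) tuples, e.g. parsed from the flat file.
    existing_rows: iterable of (item_id, label) tuples, the dataset to correlate against.
    All names here are illustrative assumptions.
    """
    conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
    conn.execute("CREATE TABLE incoming (item_id TEXT, amount INTEGER)")
    conn.execute("CREATE TABLE existing (item_id TEXT PRIMARY KEY, label TEXT)")
    conn.executemany("INSERT INTO incoming VALUES (?, ?)", incoming_rows)
    conn.executemany("INSERT INTO existing VALUES (?, ?)", existing_rows)

    # The database picks the join strategy; the application sees only matched rows.
    rows = conn.execute(
        "SELECT i.item_id, i.amount, e.label "
        "FROM incoming AS i JOIN existing AS e ON i.item_id = e.item_id "
        "ORDER BY i.item_id, i.amount"
    ).fetchall()
    conn.close()
    return rows
```

Whether this beats an in-memory hash lookup depends on the data volume; the point in the post is about the case where the data no longer fits comfortably in memory.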

    Another example which comes to mind is having to sort a very large dataset. (As in several GB of data.) A lot of research has gone into efficient sorting algorithms, and a lot of that research has gone into database design. Again, moving data into the database can win.
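
For a sense of what sorting a dataset larger than memory involves when done by hand, here is a rough Python sketch of an external merge sort: sort fixed-size chunks, spill each to a temporary file, then merge the sorted runs. It illustrates the general technique only and is not a description of how any particular database sorts. The function name `external_sort` and the `chunk_size` parameter are assumptions made for the example, and input lines are assumed to be newline-terminated.

```python
import heapq
import tempfile


def external_sort(lines, chunk_size=100_000):
    """Yield newline-terminated strings in sorted order using bounded memory.

    At most chunk_size lines are held in memory at once; sorted runs are
    spilled to temporary files and merged lazily with heapq.merge.
    """
    runs = []
    chunk = []

    def flush():
        # Sort the current chunk and write it out as one sorted run.
        chunk.sort()
        run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
        run.writelines(chunk)
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    try:
        # Each run file iterates line by line, already in sorted order.
        yield from heapq.merge(*runs)
    finally:
        for run in runs:
            run.close()
```

A database that is asked for the same result with `ORDER BY` does this kind of bookkeeping internally, which is the work the post argues you would otherwise be reimplementing.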

    That is the assertion I made. That is the only assertion I made with regard to databases.

    Unless you can use some (fairly simple, so that it can be encapsulated into an SQL query) criteria to reduce the volume of the data that the application needs to process, moving the data into a DB will not help.

    If the nature of the processing that you need to do closely matches how a database is designed to work, then you can save. Exactly because the database has been built and tuned to perform the operation that you need.

    I've given an example where it happens, and I've pointed you at an area of work where people customarily run into this issue.

    No matter how you cut it, switch it around and mix it up. For any given volume of data that an application needs to process, reading that volume of data from a flat file will always be quicker than retrieving it from a DB. Full stop.
    This is obviously true, but does not logically imply your assertion. My assertion here is that there are certain kinds of operations that databases have been designed to do well (in addition to trying to fetch data), and you are not going to be able to code those operations in Perl to run more efficiently than they already do in a database.

    Obviously, unless the database is a good match for what you are going to do, and Perl is not, you would be insane to add that overhead to your process.

    But if it is a match, then the database can win. Sometimes by a lot. Despite its obvious overhead.

    No amount of what-if scenarios will change that nor correct any misassertion I didn't make.
    I see that you're confident in your view of reality. I won't try to convince you any further at this point.

    Now you can end the thread however you wish to.
