Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

perl mysql full table scan

by kosie99 (Acolyte)
on Jul 10, 2013 at 07:46 UTC ( #1043419=perlquestion: print w/ replies, xml ) Need Help??
kosie99 has asked for the wisdom of the Perl Monks concerning the following question:

I am wondering what would be the most efficient way to use perl to query a mysql table that will result in a full table scan.

Let's say I have 1 million rows and I need to search on a non-indexed column for a substring.

Which would be more efficient:
1. limiting the query result with mysql "where <column> like '%substring%'"
2. get full table in query result and then loop through result, matching substring using a perlre?


Both will result in a full table scan, right? So, I guess the question really is which of mysql or perl is the most efficient in matching substrings?

Comment on perl mysql full table scan
Download Code
Re: perl mysql full table scan
by moritz (Cardinal) on Jul 10, 2013 at 08:04 UTC

    Doing it in mysql will be more efficient, because if you do it in perl, all the data has to be serialized in mysql, transferred to perl, then deserialized, and finally the matching happens.

    In the other scenario, mysql only needs to transfer the matched rows to perl, so all in all there is less to do.

    Finally you should consider that maybe in future an index will be added in mysql, and then it will be used automatically. If you do the scan in perl, you won't profit form the index.

    The rule of thumb is to let the database do what it's good at, and only do the rest in perl.

Re: perl mysql full table scan
by hdb (Prior) on Jul 10, 2013 at 08:08 UTC

    Assuming you are able to access your database using Perl, this is a simple test, right? The scripts to do 1. or 2. are not much different really.



Re: perl mysql full table scan
by kcott (Abbot) on Jul 10, 2013 at 08:10 UTC

    G'day kosie99,

    The short answer is to write it both ways and then Benchmark.

    I have, in the past, found leaving the matching to Perl to be faster; however, I have not had to do this with MySQL so I can't really provide specific advice. Furthermore, that was quite a long time in the past - things may have changed.

    For matching simple substrings, without any fancy pattern matching requirements, consider index instead of using a regular expression.

    -- Ken

Re: perl mysql full table scan
by Neighbour (Friar) on Jul 10, 2013 at 10:07 UTC
    It depends on what your limiting factor is (any combination):
    • Time
    • Memory
    • Network bandwidth
    • Processing power
    If time is not a limiting factor, then it doesn't matter which method you pick :)
    If Memory is not a limiting factor, move your database to a ramdisk.
    If Network bandwidth is not a limiting factor (but at least one of the others is), you'd have to benchmark both options.
    If Processing power is not a limiting factor, you could just let mysql do it's thing (least effort).

    Also something that might matter...how often do you have to perform a query like this? If the answer is more than 'occasionally', consider letting mysql index the column.
Re: perl mysql full table scan
by sundialsvc4 (Abbot) on Jul 10, 2013 at 12:09 UTC

    You might have to do both.   Be sure, however, that MySQL is doing all the filtering that it can, because you don’t want to send those millions-of-rows across a network-wire between two servers, just to throw away nearly-all of them.   (Whereas doing a local-disk-drive scan of millions of rows is actually not so bad.)

    Also, seriously consider any features that MySQL may offer, including its word-based indexing capabilities for large-text fields.   If you do this stuff “frequently,” it becomes the classic tradeoff of Speed vs. Space, in several ways.   Benchmark your ideas carefully.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1043419]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (18)
As of 2014-10-22 14:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (118 votes), past polls