http://www.perlmonks.org?node_id=484813

nan has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys.

Hope you are well. I've encountered a problem with SQL search efficiency, as I need to read queries one by one from a file (assuming that each line represents a query) and then display the combined results. My original code works well when there is only one query, but now it is very slow, because it has to search the whole database (about 500MB) every time it reads a new query.

So guys, do you have any ideas on how to improve the efficiency? Many many thanks!

Nan


Re: How to improve MYSQL search performance of perl?
by radiantmatrix (Parson) on Aug 18, 2005 at 15:24 UTC

    This is not a Perl question, but a DB-optimization question (unless you want to write Perl code to intelligently combine queries and extract the data back out -- but that would probably *still* be slow).

    Make sure you have normalized your DB for the type of queries you wish to perform, and index the columns you are searching through. For example, if you have a table with columns:

    ID
    Name
    Age
    Details
    SSN
    Record_file
    
    and you commonly search for records by Name and SSN, create indexes on those columns. Intelligent use of indexes, along with proper normalization, yields huge speed gains in many circumstances.
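
    For what it's worth, a minimal sketch of creating such indexes through DBI; the table name, index names, and connection details below are made up for illustration:

    use strict;
    use warnings;
    use DBI;

    # DSN, user and password are placeholders -- substitute your own
    my $dbh = DBI->connect('DBI:mysql:yourdb', 'user', 'password',
                           { RaiseError => 1 });

    # one index per commonly-searched column...
    $dbh->do('CREATE INDEX idx_name ON people (Name)');
    $dbh->do('CREATE INDEX idx_ssn  ON people (SSN)');

    # ...or a composite index if you usually filter on both at once
    $dbh->do('CREATE INDEX idx_name_ssn ON people (Name, SSN)');

    $dbh->disconnect;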

    <-radiant.matrix->
    Larry Wall is Yoda: there is no try{} (ok, except in Perl6; way to ruin a joke, Larry! ;P)
    The Code that can be seen is not the true Code
    "In any sufficiently large group of people, most are idiots" - Kaa's Law
Re: How to improve MYSQL search performance of perl?
by davidrw (Prior) on Aug 18, 2005 at 15:27 UTC
    Initial thought is that you need to examine your queries, your table structure, and your indices. 500MB isn't all that much if the data is indexed properly. Can you provide some sample queries and the schema?

      Hi,

      As I need to read the queries line by line and search for each of them in the database, I used a subroutine to handle all the database work. Below is my code.
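
      (The listing itself did not survive in this archive; judging from the query and the connect-per-search behaviour discussed in the replies below, it presumably looked roughly like this - the user and password are placeholders:)

      use DBI;

      sub search {
          my $q = shift;   # one query line read from the file

          # a new connection is made, and torn down, on every call --
          # this is the cost the replies below point at
          my $dbh = DBI->connect('DBI:mysql:diet', 'user', 'password')
              or die "Failed to connect: $DBI::errstr";

          my $sth = $dbh->prepare("select topic FROM table1 WHERE uri LIKE '$q'");
          $sth->execute();
          while (my $row = $sth->fetchrow_hashref) {
              print "Topic: $row->{topic}\n";
          }
          $sth->finish;
          $dbh->disconnect;
      }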

      thanks again,

      Nan

        I would take trammell's suggestion a step further and also not recreate the statement handle every time, actually taking advantage of the statement handle (and placeholders). I think this will give a decent improvement in performance (the amount of gain is probably db-dependent):
        use strict;
        use warnings;
        use DBI;

        # connect once; user and password go in the second and third
        # arguments (placeholders here), the attribute hash in the fourth
        my $dbh = DBI->connect('DBI:mysql:diet', 'user', 'password',
                               { RaiseError => 1, AutoCommit => 0 })
            or die "Failed to connect: $DBI::errstr";

        # prepare once, with a placeholder, and reuse the handle
        my $sth = $dbh->prepare(qq{select topic FROM table1 WHERE uri LIKE ?});

        search($sth, 'foo');
        search($sth, 'bar');

        $sth->finish();
        $dbh->disconnect();   # disconnect from database

        sub search {
            my $sth = shift;   # statement handle (could be a global var instead if desired)
            my $q   = shift;   # search parameter from the html <form/>

            $sth->execute($q);
            my $rows = $sth->fetchall_arrayref( {} );
            printf "%d rows found for '%s'.\n", scalar(@$rows), $q;
            foreach my $row (@$rows) {
                printf "  Topic: %s\n", topic($row->{topic});   # topic() from the OP's code
            }
        }
        One improvement you can make is to only open your database handle once at the beginning of the script, and reuse that handle instead of recreating it for each query.
Re: How to improve MYSQL search performance of perl?
by Anonymous Monk on Aug 18, 2005 at 15:57 UTC
    500MB is a meaningless measurement if you want a feel for whether your database has many records to search through. It's meaningful when determining disk-space needs, or when doing backups, but not as an indication of how much there is to search through. A 500MB database in a radiology lab probably means it holds only one row - with a small image stored in it.

    The number of rows a query might consider - now, that's an important measurement. The size of a row, both in columns used by the query and in total bytes, matters as well, but much less so.

    Having said that, 500MB is tiny by modern standards. Most desktops, and even many laptops, can keep almost the entire database in core memory. If you have a dedicated machine for your database (and you should), put in 1GB of RAM and you can be sure the entire database fits in memory.

    But even if you have that, your approach can still be "slow". Whether that is significantly improvable depends almost entirely on your database structure (tables, indices) and on the queries performed. If the queries can be almost anything, many of them will be unable to use the available indices, resulting in table scans - and even with the entire database in core memory, many table scans will slow things down. The sketch below shows a common example.
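
    (To illustrate with the query from elsewhere in this thread, and assuming an index exists on the uri column: a LIKE pattern anchored at the start of the string can use that index, while a pattern starting with a wildcard cannot, so every row gets examined. Connection details are placeholders:)

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:diet', 'user', 'password', { RaiseError => 1 });

    # anchored prefix: MySQL can walk the index on uri
    my $fast = $dbh->selectall_arrayref(
        q{select topic FROM table1 WHERE uri LIKE ?}, {}, 'http://www.perlmonks.org%'
    );

    # leading wildcard: the index is useless, so this is a full table scan
    my $slow = $dbh->selectall_arrayref(
        q{select topic FROM table1 WHERE uri LIKE ?}, {}, '%perlmonks%'
    );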

    But as others said, this is mostly a database question. Consult your local database administrator/guru.

Re: How to improve MYSQL search performance of perl?
by trammell (Priest) on Aug 18, 2005 at 16:52 UTC

      Hi,

      Many many thanks for that great article, but a question popped up after reading it. It says that MySQL builds indexes for the whole table when you call "CREATE TABLE ...", so I suppose that means I don't need to build an index myself (my table only has two columns). OK, but even if I do build an index myself, how can it be used?

      Thanks again,

      Nan

        MySQL will choose the appropriate index for the tables involved in your query; in my experience it chooses correctly most of the time. I see from another post in this thread that your query is:
        select topic FROM table1 WHERE uri LIKE '$q'
        You can find out what indexes are used by MySQL in this query by running the command
        EXPLAIN SELECT topic FROM table1 WHERE uri LIKE 'something'
        where "something" is one of your parameters. You can see what indexes are defined on your table by running the command
        SHOW CREATE TABLE table1;
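
        (If you'd rather check from Perl than from the mysql command-line client, a quick sketch - the connection details are placeholders:)

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('DBI:mysql:diet', 'user', 'password', { RaiseError => 1 });

        # EXPLAIN returns one row per table in the plan; the 'key' column
        # names the index MySQL chose (NULL means a full table scan)
        my $q    = $dbh->quote('something');
        my $plan = $dbh->selectall_arrayref(
            "EXPLAIN SELECT topic FROM table1 WHERE uri LIKE $q", { Slice => {} }
        );
        for my $row (@$plan) {
            printf "table=%s key=%s rows=%s\n",
                $row->{table},
                (defined $row->{key} ? $row->{key} : 'NONE'),
                $row->{rows};
        }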
Re: How to improve MYSQL search performance of perl?
by CountZero (Bishop) on Aug 18, 2005 at 19:51 UTC
    Are all these queries similar to each other? I mean is it like:
    SELECT * FROM table WHERE field = 1
    SELECT * FROM table WHERE field = 10
    SELECT * FROM table WHERE field = 75
    SELECT * FROM table WHERE field = 3
    SELECT * FROM table WHERE field = 8
    ...

    If that is the case, you could probably benefit from using placeholders together with $sth = $dbh->prepare($statement) or $sth = $dbh->prepare_cached($statement), as sketched below.
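
    (A minimal sketch of the prepare_cached variant - handy when the prepare is buried inside a subroutine that gets called once per query; table and column names follow the example above, and the connection details are placeholders:)

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:diet', 'user', 'password', { RaiseError => 1 });

    sub lookup {
        my $field_value = shift;
        # compiled on the first call; every later call with the same SQL
        # gets the same statement handle back from the cache
        my $sth = $dbh->prepare_cached('SELECT * FROM table WHERE field = ?');
        $sth->execute($field_value);
        return $sth->fetchall_arrayref;
    }

    # one prepare, many executes
    lookup($_) for (1, 10, 75, 3, 8);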

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Hi CountZero,

      They are all URLs, for example: http://www.permonks.org/. What I did before was to put the database stuff in a subroutine and call it every time a new line was read, as I didn't know how to optimize the code:

      Thanks again,

      Nan

        I see why it is so slow: for every search you are effectively opening a connection, doing the search for one item, and then destroying the connection. All this connecting and disconnecting is very time-consuming.

        You should put your connection stuff in an initialization subroutine, then prepare your SQL statement once, using placeholders, as follows: my $sth = $dbh->prepare('select topic FROM table1 WHERE uri LIKE ?'); (added benefit: you don't have to worry about quoting!). Then hand the $sth variable and the search argument off to your search subroutine, which calls the execute method with the search string as its parameter:

        my ($statement_handle, $search_argument) = @_;
        $statement_handle->execute($search_argument);
        ...

        Do you get the idea?
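
        (Pulled together, that shape might look something like this; the subroutine names and connection details are made up:)

        use strict;
        use warnings;
        use DBI;

        my ($dbh, $sth);

        sub init_db {
            $dbh = DBI->connect('DBI:mysql:diet', 'user', 'password',
                                { RaiseError => 1 })
                or die "Failed to connect: $DBI::errstr";
            # prepared once; the placeholder also takes care of quoting
            $sth = $dbh->prepare('select topic FROM table1 WHERE uri LIKE ?');
        }

        sub search {
            my ($statement_handle, $search_argument) = @_;
            $statement_handle->execute($search_argument);
            while (my ($topic) = $statement_handle->fetchrow_array) {
                print "Topic: $topic\n";
            }
        }

        init_db();
        search($sth, $_) for ('http://www.perlmonks.org/', 'http://example.com/');
        $dbh->disconnect;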

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law