Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re: Searching with MySQL (and REGEXPs)

by perrin (Chancellor)
on Apr 02, 2002 at 02:52 UTC ( [id://155909]=note: print w/replies, xml ) Need Help??


in reply to Searching with MySQL (and REGEXPs)

Is there any reason you didn't use MySQL's built-in full-text search for this? It seems like it would have saved a lot of time and would almost certainly be much faster. There are also nice CPAN modules that do very efficient text searching in a MySQL database, like Search::InvertedIndex or DBIx::FullTextSearch.

Replies are listed 'Best First'.
Re: Re: Searching with MySQL (and REGEXPs)
by gav^ (Curate) on Apr 02, 2002 at 05:30 UTC
    MySQL's fulltext isn't too useful (to me anyway) before version 4 as you can't change the minimum word length to something bigger than 4 without a recompile. It also doesn't like '-' which means for this client you couldn't search for part numbers.

    I didn't go for one of the modules on CPAN as I wanted something simple, the text search is in addition to 4 combo boxes (finish, material, mfg, and size in this case) and most users either leave it blank or just enter 1 or 2 words. I've got keys on the important fields and searching through 5,000 records takes < 0.2 seconds for a complicated query with 5 LIKEs and 2 REGEXPs.

    I don't think this will scale fantastically for larger data sets where using some kind of fulltext index will really pay off.

    Just for kicks I ran a quick test on a table with 550,000 rows, a simple LIKE/REGEXP takes 3.5 seconds while a REGEXP takes 12.

    gav^

      Recompiling MySQL doesn't sound like a big deal to me, and there has to be a way to sneak around the "-" limitation (convert it to a letter sequence?), but you may have found the sweet spot for satisfying this requirement. If it's fast enough then you're done, and you know you can always do one of these things later.

      Inverted index searches do scale very well, since they are basically just one hash lookup per keyword being searched. I've sometimes taken advantage of this speed by encoding fields like mfg and material as special keywords. It scales well for a large data set.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://155909]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (4)
As of 2024-06-21 20:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuli‥ 🛈The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.