PerlMonks  

Re: Large Constant Database in Text File

by mattr (Curate)
on Sep 05, 2006 at 07:47 UTC ( [id://571205]=note )


in reply to Large Constant Database in Text File

15MB is not a lot of data. A response time under one second seems possible (as another poster notes) if the search uses prebuilt indices. Even without an index, a pure C search inside a database may fall in that range. You are losing time on I/O; I'd be surprised if a regex-based search over data that is already in memory really takes as long as you say.

Anyway, it is true that databases have limited full-text search functionality. What you are asking for is a LIKE (or wildcard) search, plus maybe boolean operators. It will be a lot easier to use a db, really.
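To make the LIKE-plus-boolean idea concrete, here is a minimal sketch. It is written in Python with the standard sqlite3 module purely for brevity; the table name, column names, sample rows, and the search() helper are all made up for illustration and are not from the original question. The same SQL would work through any database interface.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO records (body) VALUES (?)",
    [("red apple pie",), ("green apple tart",), ("red cherry pie",)],
)

def search(conn, must_have, must_not_have=None):
    """Return bodies containing must_have and, optionally, lacking must_not_have."""
    sql = "SELECT body FROM records WHERE body LIKE ?"
    params = ["%" + must_have + "%"]
    if must_not_have is not None:
        # Boolean operators are just extra AND / OR / NOT clauses.
        sql += " AND body NOT LIKE ?"
        params.append("%" + must_not_have + "%")
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]

apple_rows = search(conn, "apple")
red_not_cherry = search(conn, "red", must_not_have="cherry")
```

A pattern with a leading % cannot use an ordinary column index, so this is a full table scan inside the database engine. At the 15MB scale discussed here that is probably fine, but it is worth measuring.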

On the other hand, I've searched 1GB of data without a relational database in 0.1 seconds (using the C++-based htdig behind a mod_perl wrapper). I've also searched 10 megabytes of data with a single index and a regex in about 1-2 seconds, and that was on a 133MHz P2 IIRC.

Typically these speeds are achieved without an RDBMS by precompiling inverted indices (hashes) on the columns (keys) you are interested in. For wildcard searches I have seen a technique that builds a hash of all substrings of every word. In practice, though, maintaining these inverted indices is a pain: they have to be rebuilt periodically, you often end up tweaking mysterious parameters to improve performance, and sometimes there is no wildcard support.
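A rough sketch of both ideas, the inverted index and the all-substrings hash, might look like the following. It is in Python for brevity (dicts play the role of Perl hashes); the function names and sample records are hypothetical, and whitespace splitting stands in for whatever tokenizing your data really needs.

```python
from collections import defaultdict

def build_word_index(records):
    """Map each lowercased word to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index

def build_substring_index(word_index):
    """Map every substring of every indexed word to the words containing it."""
    subs = defaultdict(set)
    for word in word_index:
        for start in range(len(word)):
            for end in range(start + 1, len(word) + 1):
                subs[word[start:end]].add(word)
    return subs

def wildcard_search(fragment, word_index, substring_index):
    """Return sorted ids of records that have a word containing fragment."""
    hits = set()
    for word in substring_index.get(fragment.lower(), ()):
        hits |= word_index[word]
    return sorted(hits)

# Hypothetical sample data, for illustration only.
records = {1: "red apple pie", 2: "green apple tart", 3: "red cherry pie"}
word_index = build_word_index(records)
substring_index = build_substring_index(word_index)
```

With this in place, wildcard_search("ppl", word_index, substring_index) is two hash lookups instead of a scan. The cost is size: a word of length n contributes up to n(n+1)/2 substring keys, which is part of why these indices are a pain to maintain.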

So I'd also recommend a database if you can get one. If not, then yes, at the scale you are talking about you ought to get far better performance than you do now by using precompiled (and periodically updated) indices, maybe just with standard Perl data storage modules. But don't step through your current text files a line at a time on every query; that is the job your index generator will do every night.
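As a sketch of that split between a nightly index generator and a fast lookup, assuming a line-oriented text file: the builder scans the file once and records the byte offset of every line containing each word, and the query side seeks straight to those offsets. This is Python with pickle for brevity; in Perl a hash saved with a module such as Storable would play the same role. The file names, sample data, and function names are hypothetical.

```python
import os
import pickle
import tempfile
from collections import defaultdict

def build_index(data_path):
    """Scan the text file once, mapping each word to byte offsets of its lines."""
    index = defaultdict(list)
    with open(data_path, "rb") as fh:
        while True:
            offset = fh.tell()
            line = fh.readline()
            if not line:
                break
            for word in set(line.decode("utf-8").lower().split()):
                index[word].append(offset)
    return dict(index)

def save_index(index, index_path):
    """Persist the index; this is the part a nightly job would run."""
    with open(index_path, "wb") as fh:
        pickle.dump(index, fh)

def lookup(word, data_path, index_path):
    """Load the saved index and seek directly to the matching lines."""
    with open(index_path, "rb") as fh:
        index = pickle.load(fh)
    results = []
    with open(data_path, "rb") as fh:
        for offset in index.get(word.lower(), []):
            fh.seek(offset)
            results.append(fh.readline().decode("utf-8").rstrip("\r\n"))
    return results

# Hypothetical sample file, for illustration only.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "data.txt")
index_path = os.path.join(workdir, "data.idx")
with open(data_path, "w", encoding="utf-8", newline="\n") as fh:
    fh.write("red apple pie\ngreen apple tart\nred cherry pie\n")

save_index(build_index(data_path), index_path)
```

Here lookup() reloads the index on every call to keep the sketch short. In a persistent process (mod_perl, for instance) you would load it once at startup and keep it in memory.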
