PerlMonks  

Re^5: Writing a database lookup tool

by marto (Chancellor)
on Jan 04, 2013 at 15:38 UTC (#1011657)


in reply to Re^4: Writing a database lookup tool
in thread Writing a database lookup tool

"I used the "offline" concept to clarify for you that the data would not live "on a remote server over a slow network"."

However:

"Re: Solr, it has a lot of the features I would want (optimized for text search, regex and sounds-like filters, hit highlighting), but it looks like it's designed to run on a server, not offline."

So you're assuming that Solr can't run on a laptop for some reason? Again, your "offline" concept is wrong, both in the way you use it and in what you seem to think it means.

"My questions are not vague or meaningless."

Your questions are very vague, e.g. (emphasis added by me)

  • "What sort of performance can I expect from whatever database engine I would end up using?" - No database platform specified.
  • "How much time would it take to import 8GB of text into a database format and how much space would it take up?"
  • "Most importantly, how much time would a lookup take on a run-of-the-mill laptop?" - No specification at all.
  • "Could the whole app be packaged up into a reasonably-sized .exe file with PAR::Packer?" - Subjective
  • "How much time would it take for those 1000 hits to be found if the database design and implementation is not particularly well optimized?" - Which database, how poorly optimized?

At no point did I say they were meaningless**. IMHO a "reasonable" person would suggest you actually spend some time trying some of this out using different databases on the system you intend to run it on. I suggested this here. How long it would take you to develop such a system depends on you: how much you understand about the issues involved and how much time you're prepared to spend. Given that you've looked at Solr and think it can't run on your laptop, investigations aren't going well so far. I wouldn't like to speculate how long it'll take you to develop a working system.

Update: ** Ah, perhaps you interpreted me saying "..essentially meaningless it would be to give you a result of a query running on X million records within my tuned environment for a database platform you'll never use." as somehow being a slight against you or your questions. If so, please re-read it and understand that it would be meaningless for me to provide an arbitrary metric.


Re^6: Writing a database lookup tool
by elef (Friar) on Jan 04, 2013 at 16:01 UTC
    Again, a reasonable and helpful person might answer:
    "- I used database module X in this and that manner for a somewhat similar project with a database about the same size and I saw lookup times between 0.1 sec and 0.3 sec on XYZ hardware out of the box. From what you describe, your lookups should be in the same ballpark."
    Or: "- If 20MB including the DB engine and the Tk GUI is reasonable for you, then yes. PAR::Packer should have no problem packing the DB module XXX, I've done it before. It can't pack DB module YYY, though, so don't use that if you need to pack the whole thing into an exe with PAR::Packer."

    BTW "run-of-the-mill laptop" is a meaningful performance spec. It can be reasonably assumed to mean something along the lines of a mid-tier Core i3 or i5 with integrated graphics, 4GB of DDR3 RAM and a 5400 RPM 2.5" platter drive.

      "BTW "run-of-the-mill laptop" is a meaningful performance spec. It can be reasonably assumed to mean something along the lines of a mid-tier Core i3 or i5 with integrated graphics, 4GB of DDR3 RAM and a 5400 RPM 2.5" platter drive."

      Under what circumstances is it safe to assume that the term "run-of-the-mill" in this context equates to a multi-core 64-bit CPU? You seem determined to validate the use of assumptions based on arbitrary phrases.

      "Again, a reasonable and helpful person might answer: "- I used database module X in this and that manner for a somewhat similar project with a database about the same size and I saw lookup times between 0.1 sec and 0.3 sec on XYZ hardware out of the box. From what you describe, your lookups should be in the same ballpark." Or: "- If 20MB including the DB engine and the Tk GUI is reasonable for you, then yes. PAR::Packer should have no problem packing the DB module XXX, I've done it before. It can't pack DB module YYY, though, so don't use that if you need to pack the whole thing into an exe with PAR::Packer.""

      You keep providing pointless responses like this, yet you don't actually address any of the pertinent points raised in response to your posts. You have experience of pp. In the time it's taken you to post this, you could have discovered the generated exe size for a basic script loading Tk and several different database drivers (for comparison, since you have yet to choose a database platform). In short, you could answer this yourself in a matter of minutes if you were actually concerned with the answer rather than with semantic arguments.

      We @ work use PostgreSQL and Oracle via DBI for somewhat different data: lots of rather short strings with many relations among them (around 1 GB in XML). Simple queries take less than 1s, complicated ones might take more than 10 minutes. Go figure.
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        Well, in the post you're replying to, I did give a fairly detailed description of a sample query I had in mind:
        "let's assume there are 15 million records with 100 characters in each (in the field that we're searching). I look up a 10-character string. There are 1000 hits. How much time would it take for those 1000 hits to be found if the database design and implementation is not particularly well optimized? 0.01 second? 1 second? 5 seconds?"

        Finding the records in which a 10-character string occurs in a given field would qualify as a "simple" query, I would think.
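
For anyone who wants a number for that sample query on their own hardware, here is a rough sketch of how it could be measured. Everything in it is an assumption made for illustration, not something proposed in this thread: it uses SQLite through Python's standard-library sqlite3 module (the equivalent in Perl would go through DBI with DBD::SQLite), random 100-character rows, a hypothetical table called segments, and a plain unindexed substring scan with instr(). The helper names build_db and timed_lookup are made up for this example.

```python
import random
import sqlite3
import string
import time


def build_db(n_rows, needle, n_hits, path=":memory:", seed=0):
    """Fill a table with random 100-char strings; exactly n_hits rows contain needle.

    The needle should use characters outside the random alphabet
    (lowercase letters and space) so it cannot occur by accident.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + " "
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE segments (id INTEGER PRIMARY KEY, body TEXT)")
    hit_rows = set(rng.sample(range(n_rows), n_hits))

    def rows():
        for i in range(n_rows):
            text = "".join(rng.choices(alphabet, k=100))
            if i in hit_rows:
                # Overwrite part of the row so its length stays at 100.
                pos = rng.randrange(0, 100 - len(needle) + 1)
                text = text[:pos] + needle + text[pos + len(needle):]
            yield (text,)

    conn.executemany("INSERT INTO segments (body) VALUES (?)", rows())
    conn.commit()
    return conn


def timed_lookup(conn, needle):
    """Run an unindexed substring search; return (hit count, elapsed seconds)."""
    start = time.perf_counter()
    hits = conn.execute(
        "SELECT id FROM segments WHERE instr(body, ?) > 0", (needle,)
    ).fetchall()
    return len(hits), time.perf_counter() - start


if __name__ == "__main__":
    needle = "XQZ-LOOKUP"  # 10 characters, none of them in the random alphabet
    conn = build_db(100_000, needle, 1000)
    count, elapsed = timed_lookup(conn, needle)
    print(f"{count} hits in {elapsed:.3f} s")
```

The demo uses 100,000 rows in memory so it finishes quickly. To get closer to the scenario described, raise n_rows towards 15 million and pass a file name as path so the data sits on the laptop's disk. The result only says something about this one engine and this one query shape.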
