PerlMonks  

Re^4: Writing a database lookup tool

by elef (Friar)
on Jan 04, 2013 at 15:05 UTC


in reply to Re^3: Writing a database lookup tool
in thread Writing a database lookup tool

I used the "offline" concept to clarify for you that the data would not live "on a remote server over a slow network".
My questions are not vague or meaningless. At this point, the whole project is just taking shape, hence it obviously cannot have a full specification. That is why I'm looking for guidance on what direction to take and whether the whole idea is feasible at all. This is clearly stated in the first post. I'm obviously not looking for exact figures on anything.
Sticking with your example, a reasonable and helpful PerlMonks user might answer: "If you want to build a piano with no experience in building a piano, no knowledge of wood and little knowledge of metal work, then be prepared that this project will take several years to complete - if you ever manage to complete it. You would need to learn a lot and there are no 'piano building for dummies' guides to help you along. The idea is best abandoned." Alternatively, a helpful PerlMonks user might answer: "You would need to learn how to use Solr but there's a decent tutorial and documentation at XXX. It can run offline and lookup times in the 0.1 - 1 sec range should be easily attainable. No need to learn other languages, you can put it together in Perl exclusively. It would take me ~5 hours to put a basic working app together... I guess it shouldn't take more than a week even if you're completely new to databases."

Replies are listed 'Best First'.
Re^5: Writing a database lookup tool
by marto (Cardinal) on Jan 04, 2013 at 15:38 UTC

    "I used the "offline" concept to clarify for you that the data would not live "on a remote server over a slow network"."

    However:

    "Re: Solr, it has a lot of the features I would want (optimized for text search, regex and sounds-like filters, hit highlighting), but it looks like it's designed to run on a server, not offline."

    So you're assuming that Solr can't run on a laptop for some reason? Again, your offline concept is wrong, in the way you use it and what you seem to think it means.

    "My questions are not vague or meaningless."

    Your questions are very vague, e.g. (emphasis added by me)

    • "What sort of performance can I expect from whatever database engine I would end up using?" - No database platform specified.
    • "How much time would it take to import 8GB of text into a database format and how much space would it take up?"
    • "Most importantly, how much time would a lookup take on a run-of-the-mill laptop?" - No specification at all.
    • "Could the whole app be packaged up into a reasonably-sized .exe file with PAR::Packer?" - Subjective
    • "How much time would it take for those 1000 hits to be found if the database design and implementation is not particularly well optimized?" - Which database, how poorly optimized?

    At no point did I say they were meaningless**. IMHO a "reasonable" person would suggest you actually spend some time trying some of this out using different databases on the system you intend to run it on. I suggested this here. How long it would take you to develop such a system depends on you: how much you understand about the issues involved and how much time you're prepared to spend. Given that you've looked at Solr and think it can't run on your laptop, investigations aren't going well so far. I wouldn't like to speculate how long it'll take you to develop a working system.

    Update: ** Ah, perhaps you interpreted me saying "..essentially meaningless it would be to give you a result of a query running on X million records within my tuned environment for a database platform you'll never use." as somehow being a slight against you or your questions. If so, please reread it and understand that it would be meaningless for me to provide an arbitrary metric.
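
The suggestion above, to try a lookup on the machine the tool will actually run on, can be sketched in a few lines of Perl. This is only a rough sketch under stated assumptions: the thread has not settled on a database, so SQLite (via the CPAN modules DBI and DBD::SQLite) is picked purely for illustration, and the table name `segments`, the sample rows and the search term are all made up. The `LIKE` query with a leading wildcard cannot use an ordinary index, so it forces a full scan and measures something close to the unoptimized case asked about.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;                       # assumes DBI and DBD::SQLite are installed from CPAN
use Time::HiRes qw(time);      # sub-second timestamps

# Build a throwaway in-memory table holding $rows short text records.
# The table name and the sample text are hypothetical.
sub build_db {
    my ($rows) = @_;
    my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                           { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE segments (id INTEGER PRIMARY KEY, body TEXT)');
    my $sth = $dbh->prepare('INSERT INTO segments (body) VALUES (?)');
    $dbh->begin_work;          # a single transaction keeps the bulk insert fast
    for my $i (1 .. $rows) {
        $sth->execute("sample sentence number $i about pianos and databases");
    }
    $dbh->commit;
    return $dbh;
}

# Run one substring lookup; returns (number of hits, elapsed seconds).
sub timed_lookup {
    my ($dbh, $term) = @_;
    my $start = time;
    my $hits  = $dbh->selectall_arrayref(
        'SELECT id FROM segments WHERE body LIKE ? LIMIT 1000',
        undef, "%$term%");
    return (scalar @$hits, time - $start);
}

my $dbh = build_db(100_000);
my ($count, $secs) = timed_lookup($dbh, 'number 4242');
printf "%d hits in %.3f s\n", $count, $secs;
```

Pointing the connect string at a file on disk and loading a slice of the real data in place of the generated rows would give figures that mean something for the laptop in question; the same two subs could then be repeated against any other DBI driver for comparison.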

      Again, a reasonable and helpful person might answer:
      "- I used database module X in this and that manner for a somewhat similar project with a database about the same size and I saw lookup times between 0.1 sec and 0.3 sec on XYZ hardware out of the box. From what you describe, your lookups should be in the same ballpark."
      Or: "- If 20MB including the DB engine and the TK gui is reasonable for you, then yes. PAR::Packer should have no problem packing the DB module XXX, I've done it before. It can't pack DB module YYY, though, so don't use that if you need to pack the whole thing into an exe with PAR::Packer."

      BTW "run-of-the-mill laptop" is a meaningful performance spec. It can be reasonably assumed to mean something along the lines of a mid-tier Core i3 or i5 with the integrated video card, 4GB of DDR3 RAM and a 5400 RPM 2.5" platter drive.

        "BTW "run-of-the-mill laptop" is a meaningful performance spec. It can be reasonably assumed to mean something along the lines of a mid-tier Core i3 or i5 with the integrated video card, 4GB of DDR3 RAM and a 5400 RPM 2.5" platter drive."

        Under what circumstance is it safe to assume that the term "run-of-the-mill" in this context equates to a multi core 64bit CPU? You seem determined to validate the use of assumptions based on arbitrary phrases.

        "Again, a reasonable and helpful person might answer: "- I used database module X in this and that manner for a somewhat similar project with a database about the same size and I saw lookup times between 0.1 sec and 0.3 sec on XYZ hardware out of the box. From what you describe, your lookups should be in the same ballpark." Or: "- If 20MB including the DB engine and the TK gui is reasonable for you, then yes. PAR::Packer should have no problem packing the DB module XXX, I've done it before. It can't pack DB module YYY, though, so don't use that if you need to pack the whole thing into an exe with PAR::Packer.""

        You keep providing pointless responses like this, yet you are not actually addressing any of the pertinent points raised in response to your posts. You have experience of pp; in the time it's taken you to post this you could have discovered the generated exe size for a basic script loading Tk and several different database drivers (for comparison, since you have yet to choose a database platform). In short, you could answer this yourself in a matter of minutes if you were actually concerned with the answer, rather than with semantic arguments.

        We @ work use PostgreSQL and Oracle via DBI for somewhat different data: lots of rather short strings with many relations among them (around 1 GB in XML). Simple queries take less than 1s, complicated ones might take more than 10 minutes. Go figure.
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
