http://www.perlmonks.org?node_id=1011667


in reply to Re: Writing a database lookup tool
in thread Writing a database lookup tool

Thanks, NetWallah.
I've heard about the firefox squlite manager addon before and took a mental note to try it sometime. I might try that. It looks like it won't be able to handle anything like my full dataset or even a tenth of it, but I could use it to play with databases.
Do you have any input on the pros and cons of sqlite+DBI (or any other perl module) versus Solr? For instance, how about portability? Can either be packed into an exe with par packer? Would every user need to install sqlite or other software separately?

Replies are listed 'Best First'.
Re^3: Writing a database lookup tool
by NetWallah (Canon) on Jan 05, 2013 at 05:04 UTC
    I don't think you will be limited by Sqlite's database size capabilities - see my reply to marto above.

    FYI - current versions include FTS (Full text search) capabilities.

    Sqlite and perl run on every platform I have heard of.

    I'm not familiar with Solr - it certainly looks interesting - and may map better to your problem domain. However, I got the impression you wanted a stand-alone (not web-server based) solution. Sqlite may be better for that option. Yes - you will need to install each stand-alone instance, but installation is trivial - just copy the file in. Also - it is possible for multiple sqlite clients to access a single shared sqlite database file.

                 "By three methods we may learn wisdom: First, by reflection, which is noblest; Second, by imitation, which is easiest; and third by experience, which is the bitterest."           -Confucius

      "FYI - current versions include FTS (Full text search) capabilities."
      Thank you for that info. Full text search looks like it was designed for precisely the type of queries I'd be using. Based on the descriptions I found, it would add a lot of functionality and a lot of speed. Now, the main question is: do I get FTS with the perl database modules? The DBI::DB and DBD::SQLite cpan pages don't mention FTS, but it seems to be a pretty old feature so it should have trickled down, right?

      Edit: I digged around a bit more and found out that FTS is supported by DBD::SQLite: http://blogs.perl.org/users/adam_kennedy/2012/05/next-dbdsqlite-to-be-released-in-early-june.html

      "However, I got the impression you wanted a stand-alone (not web-server based) solution."
      Yes. The database and the lookup tool would be on the "client" machine, and ideally the whole thing should be reasonably compact and self-contained. One not too large (hopefully <50MB without the data) download, one not too complex installation.
        Update: I started playing with DBD::SQLite. I imported 800,000 records into a db, and quickly realized that by default, most of my searches are indeed carried out as sequential searches, which makes them hopelessly slow. With FTS enabled, average lookup times fell from 4 seconds to 0.04 seconds. If this scales in a roughly linear fashion and I get 0.5 sec lookups on 10 million records, I'll be very happy with the speed.
        Thanks again.
Re^3: Writing a database lookup tool
by marto (Cardinal) on Jan 04, 2013 at 16:21 UTC

    "Do you have any input on the pros and cons of sqlite+DBI (or any other perl module) versus Solr"

    Apache Solr isn't comprable to SQLite/DBI. It's not a RDBMS, why are you trying to make comparisons where there are none to be made?

    "Can either be packed into an exe with par packer?"

    DBD::SQLite can be packaged with pp. If you'd actually read what Solr is you wouldn't consider asking if it could be packaged. It simply doesn't make sense to do so.

    "Would every user need to install sqlite or other software separately?"

    No.

      "Apache Solr isn't comprable to SQLite/DBI. It's not a RDBMS, why are you trying to make comparisons where there are none to be made?"
      I don't know, perhaps because the two of them were suggested as alternative means of achieving the same end?

      "DBD::SQLite can be packaged with pp."
      Thanks.

      "If you'd actually read what Solr is you wouldn't consider asking if it could be packaged. It simply doesn't make sense to do so."
      There is a Solr CPAN module. I would have thought that it's reasonable to ask whether it is self-contained or requires that solr be installed separately, and whether it can be packaged or it has un-packable dependencies. I guess that using Solr would mean that every user needs to install Lucene/Solr.

      "Would every user need to install sqlite or other software separately? - No"
      I guess that only applies to DBD::SQLite then.

      If you are annoyed about beginner questions in a thread that explicitly states that it is about beginner questions and feeling around for possible options/feasibility, there is nothing wrong with just not posting in it. If you decide to try and help, there is no reason to get irritated.

        "I don't know, perhaps because the two of them were suggested as alternative means of achieving the same end?"

        I suggested it as a pre built solution for basing a tool on, rather than inventing a tool. At no point did I suggest they did the same thing. Because you can use a product or build your own to achieve a goal does not mean they're the same. You're jumping to conclusions again rather than researching the suggestions already given.

        "There is a Solr CPAN module. I would have thought that it's reasonable to ask whether it is self-contained or requires that solr be installed separately, and whether it can be packaged or it has un-packable dependencies. I guess that using Solr would mean that every user needs to install Lucene/Solr."

        The Solr module documentaion explains what it is. Had you looked there and in the Solr documentation is then you'd know the answer. Is there anything stopping you actually researching these questions before posting?

        " If you are annoyed about beginner questions in a thread that explicitly states that it is about beginner questions and feeling around for possible options/feasibility, there is nothing wrong with just not posting in it. If you decide to try and help, there is no reason to get irritated.

        I'm not in the slightest annoyed. I find it curious however that you don't seem to want to do any research yourself, simply relying on the advice of others while making crazy assumptions about things rather than actually read what they do.

        Update: For example here you spend time, again stating what you expect a reasonable person to provide when it'd take you less time to build test cases and generage actual output for use as a metric.