http://www.perlmonks.org?node_id=752785


in reply to Re^2: PSQL and many queries
in thread PSQL and many queries

That's assuming the key/value pairs are relatively small and unique.

No values are required for lookup purposes. And the length of the keys makes surprisingly little difference to the size of the hash.

For example: For a hash with 11,881,376 keys:
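A hash of that size is easy to build for yourself. Note that 11,881,376 is 26**5, the number of five-letter lowercase strings, so the sketch below assumes keys running from 'aaaaa' to 'zzzzz'. That key set is a guess for illustration only, and `build_lookup` is a made-up helper name. It stores no values at all, which is all a lookup needs.

```perl
use strict;
use warnings;

# Build a lookup hash whose keys are every lowercase string of length $len.
# Assumption: this key set is illustrative; only the key count is given above.
sub build_lookup {
    my ($len) = @_;
    my $first = 'a' x $len;
    my $last  = 'z' x $len;

    my %seen;
    # No values are stored: each key maps to undef, and exists() does the lookup.
    $seen{$_} = undef for $first .. $last;
    return \%seen;
}

# Length 3 gives 26**3 keys and runs instantly; pass 5 for the full-size hash
# (expect that to need a good deal of memory).
my $lookup = build_lookup(3);
printf "%d keys\n", scalar keys %$lookup;
print "found 'abc'\n" if exists $lookup->{abc};
```

To see how key length affects size on your own machine, a module such as Devel::Size from CPAN can report the footprint of the resulting hash.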

Given the OP's description of the application: "I have a list of names in a plain text file, one per line. I also have names in a PSQL database field. I need to filter out name from the file against the database.", I'm not sure where you think "uniqueness" comes in?

And given the OP's select statement, if duplicate names exist in the DB, they will be ignored, as there would be no way to determine which was matched.
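As a rough sketch of the hash approach, assuming the DB names have already been fetched into an array with a single select (the fetch itself is left out, and `partition_names` plus the sample data are invented for illustration):

```perl
use strict;
use warnings;

# Split the names from the file into those present in the DB and those absent.
# Assumption: both lists are already in memory; in a real program @$db_names
# would come from one SELECT of the name column.
sub partition_names {
    my ( $file_names, $db_names ) = @_;

    # Keys only, no values. Duplicate DB names collapse into a single key.
    my %in_db;
    @in_db{@$db_names} = ();

    my ( @matched, @unmatched );
    for my $name (@$file_names) {
        if ( exists $in_db{$name} ) { push @matched,   $name }
        else                        { push @unmatched, $name }
    }
    return ( \@matched, \@unmatched );
}

my @db   = ( 'alice', 'bob', 'bob', 'carol' );    # made-up sample data
my @file = ( 'bob', 'dave', 'alice' );

my ( $matched, $unmatched ) = partition_names( \@file, \@db );
print "matched:   @$matched\n";
print "unmatched: @$unmatched\n";
```

Since the OP doesn't say which set he wants, the sketch returns both; the duplicate 'bob' in the DB list makes no difference to either result.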

(Remember that names can have 'special characters' or even a different character set.)

As far as I am aware, Perl's hashes handle Unicode keys with aplomb.
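A quick way to check that for yourself, as a minimal sketch with made-up names (`make_lookup` and `is_known` are hypothetical helpers):

```perl
use strict;
use warnings;
use utf8;    # this source file contains UTF-8 string literals

binmode STDOUT, ':encoding(UTF-8)';

# Build a keys-only lookup hash from a list of names.
sub make_lookup {
    my %seen;
    @seen{@_} = ();
    return \%seen;
}

# True if $name is one of the keys of the lookup hash.
sub is_known {
    my ( $lookup, $name ) = @_;
    return exists $lookup->{$name} ? 1 : 0;
}

my $lookup = make_lookup( 'Zoë', 'Søren', "O'Brien", '李小龍' );

for my $name ( 'Søren', 'Soren', '李小龍' ) {
    printf "%s: %s\n", $name, is_known( $lookup, $name ) ? 'known' : 'unknown';
}
```

One caveat: this assumes the names from the file and from the DB are decoded the same way before they are used as keys. A decoded character string and its raw UTF-8 bytes are different keys.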

I agree with the other posters about performing a bulk load and using appropriate indexes. The optimizer in PostgreSQL (or any modern DBMS) is better suited for these types of data matches IMHO.

In the time it will take you to create the temporary table and bulk load the data, Perl will have finished.

And that's long before you will have:

  1. built your "appropriate indexes";
  2. done your join and extracted the union or intersection of the two datasets (the OP doesn't say which he is after);
  3. either a) transported the results back to the calling program, or b) output them to a file;
  4. cleaned up (discarded) the temporary tables and indexes you constructed.

Don't believe me? I'll do it my way if you'll do it yours and we can compare. I'll let you write the test set generator.

It might be a different story if, for example, the requirement was for the union of the two datasets to end up in the DB. But in that case, the simplest thing would be to just insert all the records from the file into the DB. There would be little point in individually testing whether they already existed before adding them.

Modern RDBMSs are very good at the things for which they are designed, but this kind of simple lookup isn't one of them.

