I am not actually reading all the data into Perl, just the smaller of two
tables in an SQLite DB (one has 40,000 records, the other 26 million).
Profiling with Devel::NYTProf indicated that around 50% of the time
was spent doing SELECTs on the database, so that looked like a good starting
point for optimisation. As I have access to a cluster with available memory
varying between 18 and 90 GB per node, I thought it might be worthwhile
investigating whether some of the operations could be performed in memory,
especially as I currently only require two such look-up tables.
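(For context, the profiling itself was just the standard Devel::NYTProf run; the script name here is a placeholder:)

    perl -d:NYTProf myscript.pl   # writes nytprof.out
    nytprofhtml                   # turns it into an HTML report under ./nytprof/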
The idea is that I can create the look-up tables once, effectively storing
the results of multiple SELECTs, which can then be accessed via a hash key.
This seemed like it might be faster in Perl, but as you point out, a single
SELECT is what the DB is optimised to do. So I guess my mileage may vary.
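
Something along these lines is what I had in mind for the look-up tables;
the DB file, table, and column names below are placeholders, not my actual schema:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect("dbi:SQLite:dbname=data.db", "", "",
                           { RaiseError => 1, AutoCommit => 1 });

    # Slurp the smaller (~40,000 row) table into a hash once,
    # keyed on its ID column.
    my %lookup;
    my $sth = $dbh->prepare("SELECT id, value FROM small_table");
    $sth->execute;
    while ( my ($id, $value) = $sth->fetchrow_array ) {
        $lookup{$id} = $value;
    }

    # Each of what would otherwise be a separate
    # "SELECT value FROM small_table WHERE id = ?" becomes a hash access:
    my $value = $lookup{12345};

Building %lookup costs one full-table scan up front, after which every look-up
is a hash access instead of a round trip through DBI and SQLite.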