I'm finding that most databases don't cope well with files of this size. I use Java on the server, so I don't know the Perl equivalent, but what works for fast access (in a web application) is to leave the TSV file alone and build binary index files for the fields you query on.
Each column gets a subdirectory, and each value gets a file containing a list of 64-bit byte offsets into the original TSV, one for every record that has that value.
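A minimal sketch of building that layout, assuming a UTF-8 TSV with one record per line. The class name `TsvIndexBuilder`, the `col<N>` directory names and the URL-encoding of values into file names are my own illustrative choices, not necessarily what the poster does. For simplicity it groups the offsets in memory before writing; on a 20 GB file you would flush to the value files in batches instead. It also assumes each value is short enough to be a file name.

```java
import java.io.*;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

public class TsvIndexBuilder {
    /**
     * For each indexed column, creates indexDir/col<N>/ and writes one file per
     * distinct value containing the 64-bit byte offsets of the matching records.
     */
    public static void build(Path tsv, Path indexDir, int[] columns) throws IOException {
        // One value -> offsets map per indexed column (hypothetical in-memory grouping).
        List<Map<String, List<Long>>> maps = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) maps.add(new HashMap<>());

        try (InputStream in = new BufferedInputStream(Files.newInputStream(tsv))) {
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            long pos = 0;        // bytes consumed so far
            long lineStart = 0;  // byte offset where the current record starts
            int b;
            while ((b = in.read()) != -1) {
                pos++;
                if (b == '\n') {
                    addLine(line, lineStart, columns, maps);
                    line.reset();
                    lineStart = pos;
                } else {
                    line.write(b);
                }
            }
            if (line.size() > 0) addLine(line, lineStart, columns, maps); // no trailing newline
        }

        for (int i = 0; i < columns.length; i++) {
            Path colDir = indexDir.resolve("col" + columns[i]);
            Files.createDirectories(colDir);
            for (Map.Entry<String, List<Long>> e : maps.get(i).entrySet()) {
                // URL-encode the value so '/' etc. are safe in a file name; also encode
                // '.' so values such as "." or ".." cannot resolve to a directory.
                String name = URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                                        .replace(".", "%2E");
                try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
                        Files.newOutputStream(colDir.resolve(name))))) {
                    for (long p : e.getValue()) out.writeLong(p); // big-endian 64-bit offset
                }
            }
        }
    }

    private static void addLine(ByteArrayOutputStream line, long start, int[] columns,
                                List<Map<String, List<Long>>> maps) {
        String text = line.toString(StandardCharsets.UTF_8);
        if (text.endsWith("\r")) text = text.substring(0, text.length() - 1); // tolerate CRLF
        String[] fields = text.split("\t", -1); // -1 keeps empty trailing fields
        for (int i = 0; i < columns.length; i++) {
            int c = columns[i];
            if (c >= fields.length || fields[c].isEmpty()) continue; // empty fields not indexed
            maps.get(i).computeIfAbsent(fields[c], k -> new ArrayList<>()).add(start);
        }
    }
}
```

Because records are read in file order, each value file ends up sorted by offset, which makes the AND/OR merging cheap later.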
If your filesystem can handle that many files per directory (ext3?), this works even for columns with lots of unique values. It also handles range searches, and of course you can sort the file names to get the results back in value order.
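A sketch of a range search over one column directory, assuming the value files are named by URL-encoding the value (an illustrative convention, not something the post specifies). Values are compared as strings, so numeric ranges only work if the values are zero-padded to a fixed width; that is an assumption about the data. `RangeLookup` is a hypothetical name.

```java
import java.io.*;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

public class RangeLookup {
    /**
     * Returns the offsets of all records whose value v satisfies lo <= v <= hi
     * (string order), grouped by value in ascending value order.
     */
    public static List<Long> range(Path columnDir, String lo, String hi) throws IOException {
        // Sort the value files by their decoded value name.
        TreeMap<String, Path> byValue = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(columnDir)) {
            for (Path f : files) {
                String value = URLDecoder.decode(f.getFileName().toString(), StandardCharsets.UTF_8);
                byValue.put(value, f);
            }
        }
        List<Long> result = new ArrayList<>();
        // subMap with both ends inclusive selects just the files in the range.
        for (Path f : byValue.subMap(lo, true, hi, true).values()) {
            result.addAll(readOffsets(f));
        }
        return result;
    }

    /** Reads a file of consecutive big-endian 64-bit offsets. */
    static List<Long> readOffsets(Path f) throws IOException {
        long n = Files.size(f) / Long.BYTES;
        List<Long> out = new ArrayList<>((int) n);
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(f)))) {
            for (long i = 0; i < n; i++) out.add(in.readLong());
        }
        return out;
    }
}
```

The result comes back in value order, not file order; if you want to intersect it with another column's list, sort it by offset first.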
Multi-column queries are handled by taking the intersection (AND) or union (OR) of the pointer lists. Putting a little effort into figuring out which column/value has the smallest list, and using it as the starting point, helps with AND.
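A sketch of the AND/OR step, assuming each pointer list is a sorted array of offsets with no duplicates (which holds if the value files were written in file order). It uses a plain two-pointer merge; `PointerSets` is a hypothetical name. For AND it sorts the lists by length so the smallest one drives the intersection, and it stops early once the result is empty.

```java
import java.util.*;

public class PointerSets {
    /** AND: intersect sorted offset lists, starting from the shortest one. */
    public static long[] and(List<long[]> lists) {
        if (lists.isEmpty()) return new long[0];
        List<long[]> byLength = new ArrayList<>(lists);
        byLength.sort(Comparator.comparingInt(a -> a.length));
        long[] acc = byLength.get(0);
        for (int i = 1; i < byLength.size() && acc.length > 0; i++) {
            acc = intersect(acc, byLength.get(i));
        }
        return acc;
    }

    /** OR: merge sorted offset lists, keeping each offset once. */
    public static long[] or(List<long[]> lists) {
        long[] acc = new long[0];
        for (long[] l : lists) acc = union(acc, l);
        return acc;
    }

    static long[] intersect(long[] a, long[] b) {
        long[] out = new long[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { out[n++] = a[i]; i++; j++; } // present in both lists
        }
        return Arrays.copyOf(out, n);
    }

    static long[] union(long[] a, long[] b) {
        long[] out = new long[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length || j < b.length) {
            if (j >= b.length || (i < a.length && a[i] < b[j])) out[n++] = a[i++];
            else if (i >= a.length || b[j] < a[i]) out[n++] = b[j++];
            else { out[n++] = a[i]; i++; j++; } // equal: emit once
        }
        return Arrays.copyOf(out, n);
    }
}
```

Starting from the smallest list matters because the intersection can never be larger than it, so every later merge works on a short array.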
Once you have your final list of pointers, you can use RandomAccessFile to seek to each one, fetch the corresponding records quickly, and add them to the response.
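A sketch of the fetch step with `java.io.RandomAccessFile`, assuming UTF-8 records terminated by `\n`. It reads in chunks rather than calling `RandomAccessFile.readLine()`, because `readLine()` reads one byte at a time and treats each byte as a Latin-1 character. `RecordFetcher` is a hypothetical name.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class RecordFetcher implements Closeable {
    private final RandomAccessFile raf;

    public RecordFetcher(File tsv) throws IOException {
        raf = new RandomAccessFile(tsv, "r");
    }

    /** Seeks to each offset and returns the record (line) that starts there. */
    public List<String> fetch(long[] offsets) throws IOException {
        List<String> records = new ArrayList<>(offsets.length);
        byte[] buf = new byte[8192];
        for (long off : offsets) {
            raf.seek(off);
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            int n;
            outer:
            while ((n = raf.read(buf)) > 0) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {          // end of this record
                        line.write(buf, 0, i);
                        break outer;
                    }
                }
                line.write(buf, 0, n);             // record continues past this chunk
            }
            String s = line.toString(StandardCharsets.UTF_8);
            if (s.endsWith("\r")) s = s.substring(0, s.length() - 1); // tolerate CRLF
            records.add(s);
        }
        return records;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }
}
```

Passing the offsets in ascending order (as the merge step produces them) keeps the seeks moving forward through the file, which is usually friendlier to the disk and page cache.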
I know this sounds like building from scratch, but search engines use a similar technique (inverted indexes). And I have spent far less time doing it the right way than softening my head on various DBMSs and their nuances. It is also very memory-friendly and fast. This works well for query applications: I rebuild the indexes for 7 columns on a 20 GB TSV file and it is good to go, so putting up an updated TSV is fairly trivial too.