Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
Good afternoon, MrCromeDome. It seems you are having the same sort of problem as do a lot of search engines... needing to save cycles on many repeat queries. There are a lot of approaches to it, and it really is going to depend on the result set size, what hardware you have it on, etc. I'm going to assume that you have mySql as a database engine, as these tips will hold true for many many DBMs out there.

  1. First off, consider this, what is the upper limit size on your result set. Are you looking at 100s, 1000s, more? Because at a certain point, these caching mechanisms are not really going to help you. You'll have to do some performance tweaking to see what size category the bulk your results end up.
  2. Secondly, consider how often your data changes? This is essential in determining when your cache should drop dead, and the next search should store new data. If you have rapidly changing data (a good amount entered new each day), consider a cached time of perhaps 4 hours, or do it on off cycles, such as overnight.
In mySQL, whenever you request a large result set that has an "ORDER BY" statement in it, mySQL has to at least analyze the entire result set, then apply the limiter to it, to get the correct results. This can be quite tedious on the server, as you often pass over tons of results that you never pass back to the app. One way of caching is to get and hold all of the results that mySQL will come back to you with, and store them someplace fast to look them up in, such as another table.

A proposed table would take the form:
  • result_id = primary key int not null
  • dropdead = datetime, a future time in which to kill the result
  • searchstring = char(255) of search results.
  • resultcache = blob / text (or largeblob, etc)
Basically, you'd store the "SMITH JOHN" in the search string, and then form your result set in the resultcache, in some way that

Storage caching mechanisms can take two forms.
  • Binary column type, Storable, and freeze / thaw This method compresses the best, uncompresses fast, but is binary, and may be harder to debug.
  • A text data type (see the blob link above), Data::Dumper, and Data::Dumper->Dump / eval to store and then revive the data. This approach will store fewer results, is not as fast other approaches, but is easier to debug.
If you take a result set, and store it as an array, you can thaw (or eval), the entire result set and jump to a certain spot in the set. To look at this in psuedoperl:
my @list_of_cached_results; if($indatabase){ foreach my $result in ($search_result) { push @list_of_cached_results, result; } my $binary_cache = nfreeze(\@list_of_cached_results); #store lists of binary results } else { @list_of_cached_results = thaw($indatabase); } renderPage(@list_of_cached_results, $offset);
Basically, if you use the perl persistance modules, and a database (such as the one you're already reading from), you can pick up a good amount of speed from picking up a pre-packaged set of results (a match to your search string), and then display it to the page, saving you the hell of finding all the results again. As long as you can manage results and not have them go "stale", then you'll be all set.

Hope this made some sense to you. If I don't seem clear, I'd be happy to clarify anything in here.


In reply to Re: Database searches with persistent results via CGI by JayBonci
in thread Database searches with persistent results via CGI by MrCromeDome

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others pondering the Monastery: (3)
    As of 2020-02-17 06:34 GMT
    Find Nodes?
      Voting Booth?
      What numbers are you going to focus on primarily in 2020?

      Results (70 votes). Check out past polls.