Using large result sets from a database

by MrCromeDome (Deacon)
on May 09, 2001 at 18:39 UTC

MrCromeDome has asked for the wisdom of the Perl Monks concerning the following question:

Let me preface this by saying I'm pretty new at Perl - please don't flame me for being stupid :)

I am writing a web-based inquiry module for a Win32 package we developed. I have used DBI and Apache in the past to whip up some pretty slick web pages with Perl, but nothing of this scale. Each search can potentially yield several thousand results, which will be displayed to the user in chunks of about 50. I would think that executing the entire search each time the user moves between pieces of the result set (for example, from results 1-50 to 51-100) would be very slow and inefficient. Is there some way I could keep that result set around, so that clicking "Next" would simply show the next 50 results without executing the query again?

Any help is greatly appreciated :)

Re: Using large result sets from a database
by knobunc (Pilgrim) on May 09, 2001 at 18:52 UTC

    Tough problem.

    You did not say which database you are using. Postgres will let you specify a limit and an offset on the query, so you can reissue the query but get only the next set of results (assuming nothing has changed). I believe MySQL has similar features, and you can do it in Oracle, but there it gets very ugly. So if you want to restrict yourself to standard SQL, you are out of luck on the DB side.
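
    If your database does support it, the DBI side is straightforward. Here's a minimal sketch of offset-based paging, assuming a Postgres database and an illustrative "results" table (the table and column names are made up, and the LIMIT/OFFSET syntax varies slightly between databases):

        use DBI;

        my $dbh = DBI->connect('dbi:Pg:dbname=mydb', $user, $pass,
                               { RaiseError => 1 });

        my $page_size = 50;
        my $page      = 1;                  # zero-based page index from the user
        my $offset    = $page * $page_size;

        # Reissue the query on every request, but let the database skip ahead.
        my $sth = $dbh->prepare(
            'SELECT id, name FROM results ORDER BY id LIMIT ? OFFSET ?');
        $sth->execute($page_size, $offset);

        while (my ($id, $name) = $sth->fetchrow_array) {
            print "$id: $name\n";
        }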

    On the server side you have more control, but it is still pretty ugly. Are you using any particular framework (Mason, mod_perl, ...) on your server side? If you have persistent server-side state, you can get the whole dataset the first time, bung it into your state, and then paginate it for the user when they ask for it. This avoids problems where the data changes underneath you, since you will just page through a static view, but your data can become stale. It is cheaper if you expect your users to look at most of the data; if they are only going to look at a page or two, it gets expensive. Plus, you then have to clean up the state sometime, and deciding when is tricky.
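
    For instance, here's a rough sketch of that "fetch once, paginate later" idea using Storable and a file per session. The $dbh, $query, $session_id, $page, and $page_size variables are assumed to exist, and the cleanup policy (absent here) is the tricky part:

        use Storable qw(store retrieve);

        my $cache_file = "/tmp/results.$session_id";

        unless (-e $cache_file) {
            # First request: slurp the whole result set and stash it.
            my $rows = $dbh->selectall_arrayref($query);
            store($rows, $cache_file);
        }

        # Every request: pull the stash back and slice out one page.
        my $rows  = retrieve($cache_file);
        my $start = $page * $page_size;
        my $end   = $start + $page_size - 1;
        $end = $#$rows if $end > $#$rows;
        my @page_rows = @{$rows}[$start .. $end];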

    What I do in this case is to reissue the query each time and loop through the results only storing the ones I care about:

    my $first_row   = 500;
    my $window_size = 50;
    my $location    = 1;

    while (my ($col1, $col2) = $sth->fetchrow_array()) {
        next if $location < $first_row;                    # before the window
        last if $location >= $first_row + $window_size;    # past the window
        # Do whatever I need to do with the data
    }
    continue {
        $location++;
    }
    This gets pricey if they plan on looking at all of the results, but in some cases I have hundreds of thousands to millions of results to show, so I can't slurp them over to my side, and in most cases the users only want to see a page or two.

    -ben

      I'm sorry for leaving out some of the important info:

      (Unfortunately) my DBMS is MS SQL Server 7.0. I don't like that, but it's what I'm stuck with. It's hard being the UNIX guy (err, the voice of reason) in a Win32 shop... but anyway, I'm using DBI with DBD::ODBC to connect, and I have Apache 1.3.19 with mod_perl 1.25.

      Thanks for the info :)

        Something you can also try, though it's more cumbersome, is to track the key values for your table, if you've got sequential or otherwise orderable keys. Alternatively, add a counter in a subquery and select the range you want from within that subquery.
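
        A hedged sketch of the key-tracking idea: remember the last key the user saw, and ask only for rows beyond it. It assumes an orderable key column named "id" (illustrative), and it works even without LIMIT support:

            # $last_id_shown would be carried in the query string or a cookie.
            my $sth = $dbh->prepare(
                'SELECT id, name FROM results WHERE id > ? ORDER BY id');
            $sth->execute($last_id_shown);

            my @page;
            while (my @row = $sth->fetchrow_array) {
                push @page, \@row;
                last if @page == 50;   # one page is enough; stop fetching
            }
            $sth->finish;              # discard the rest of the result set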

        As you're using MS SQL Server, look into the Sybase documentation, since the two are largely the same product. My guess is that they already have this taken care of with LIMIT or an equivalent.
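
        For what it's worth, on SQL Server 7.0 the rough equivalent of LIMIT is TOP (or SET ROWCOUNT). A speculative sketch combining TOP with the key tracking above; note that TOP in 7.0 takes a literal row count, not a placeholder:

            my $sth = $dbh->prepare(
                'SELECT TOP 50 id, name FROM results WHERE id > ? ORDER BY id');
            $sth->execute($last_id_shown);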

        --hackmare

Re: Using large result sets from a database
by andye (Curate) on May 09, 2001 at 18:55 UTC
    Hi there. This is a very good question, which I came across a few days ago in a book I'm reading: Practical Oracle 8i, by Jonathan Lewis. So far it's an excellent book, although I'm only half-way through.

    Anyway, he says:

    (why this is easy in a normal client-server setup) With an Internet connection, though, the customer comes back to the system and asks, "Remember me? I was here a little while ago and asked for some data. Can I have the next bit please?" And Oracle says, "No, you terminated the session so I closed the cursor. I can't fetch the next set of rows."

    So what's the solution in the Internet environment? Option 1 is for the front end to know how many rows it has received so far and to keep repeating the query, possibly changing it slightly each time. For instance, on the first call it says, "Get me the first 20 rows." The second time it says, "Get me the first 40 rows and discard the first 20 rows." (...snip - you get the idea)

    Option 2 is to write all the results into a permanent table on the first call, and add a spurious line number and a meaningless ID (passed back as a cookie, perhaps) to each row written. Then the second call will say, "Get the stored result lines 21 to 40 for cookie code XXX." (...snip)
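
    A sketch of that second option in DBI terms, with invented table and column names: spool the results into a holding table keyed by a random token, then page through it by line number on later requests.

        # One-time spool on the first request. $search_sth is the executed
        # statement handle for the user's original query (assumed).
        my $token = join '', map { ('a'..'z')[rand 26] } 1 .. 16;

        my $line_no = 0;
        my $ins = $dbh->prepare(
            'INSERT INTO saved_results (token, line_no, id, name)
             VALUES (?, ?, ?, ?)');
        while (my ($id, $name) = $search_sth->fetchrow_array) {
            $ins->execute($token, ++$line_no, $id, $name);
        }

        # Later requests present the token (from the cookie) and ask for a slice.
        my $slice = $dbh->selectall_arrayref(
            'SELECT id, name FROM saved_results
              WHERE token = ? AND line_no BETWEEN ? AND ?
              ORDER BY line_no',
            undef, $token, 21, 40);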

    He points out that both approaches are heavier on the server than a traditional client/server approach.

    If anyone has a good, efficient solution to this problem, I for one would be v. interested!

    andy.

    (I suppose if you didn't mind a long wait before the first row appeared to the user, you could pass the whole result set in one go and use JavaScript to show only part of it to the user at any one time. Pretty impractical, though.)

    update: It's just occurred to me that what you really need is a persistent cursor between requests. I wonder if it would be possible to hold the statement handle between requests; then you could fetchrow_arrayref() say 20 times, show the page, fetch the next 20 for the next request, and so on. merlyn has two columns that might be relevant.
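
    Under mod_perl you could at least experiment with that: a package-level hash survives between requests to the same Apache child, so a statement handle stashed there stays open. This is purely a speculative sketch (the next request may land on a different child, so don't rely on it):

        package My::CursorCache;

        our %sth_for;    # session id => live statement handle

        sub next_page {
            my ($dbh, $session_id, $query, $page_size) = @_;

            # Reuse the open cursor for this session, or start a new one.
            my $sth = $sth_for{$session_id} ||= do {
                my $h = $dbh->prepare($query);
                $h->execute;
                $h;
            };

            my @rows;
            while (my $row = $sth->fetchrow_arrayref) {
                push @rows, [ @$row ];    # copy: DBI reuses this ref
                last if @rows == $page_size;
            }
            # A short page means the cursor is exhausted; drop it.
            delete $sth_for{$session_id} if @rows < $page_size;
            return \@rows;
        }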

Re: Using large result sets from a database
by grinder (Bishop) on May 10, 2001 at 00:41 UTC

    I don't know how feasible it is, but HTML::Pager might be worth exploring. I've never used it myself, but it's something I have kept in the back of my mind for just this sort of problem. My doubt about the feasibility is that I'm not sure how you crack the persistence problem: that is, how you write the callback that the HTML::Pager object requires to fill the page template. This hinges on the problem of keeping the database connection open (assuming you're not planning on using mod_perl, in which case it would be a snap).


    --
    g r i n d e r
Re: Using large result sets from a database
by jeroenes (Priest) on May 10, 2001 at 10:00 UTC
    You could use a cache (like Squid) to keep already-issued pages around for a while. This will keep the load off your webserver.

    Be sure to cache only the pages of the large result set, and not the pages you want to stay interactive; or else make sure there is a state variable in your CGI query string.

    Another way would be to create a view in the database over the results, adding sequential numbering. Use that numbering to fetch a slice of the data.
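
    If the database won't give you a numbered view directly, a correlated subquery can synthesize the numbering (slowly, but in fairly portable SQL). A sketch with invented table and column names:

        my $slice = $dbh->selectall_arrayref(q{
            SELECT r.id, r.name
              FROM results r
             WHERE (SELECT COUNT(*) FROM results r2 WHERE r2.id <= r.id)
                   BETWEEN ? AND ?
             ORDER BY r.id
        }, undef, 51, 100);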

    Hope this helps,

    Jeroen
    "We are not alone"(FZ)

Re: Using large result sets from a database
by hackmare (Pilgrim) on May 10, 2001 at 18:22 UTC
    A simple way to deal with this is to dump the results to a file using CSV.pm and then talk to that file (see the sketch after the list below). We did that (at a banking ASP) with success.


    The appeal of this is threefold:

    1) Sucking data from a file is trivial and much quicker than rerunning a query.

    2) The data now has state - all you need to do is pass around a filehandle.

    3) The data is static: changes to the base data since the load are not reflected, reducing user confusion.
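
    A sketch of the approach, reading "CSV.pm" as Text::CSV (an assumption): write the full result set once, then on later hits read only the lines for the requested page.

        use Text::CSV;

        my $csv  = Text::CSV->new({ binary => 1, eol => "\n" });
        my $file = "/tmp/results.$session_id.csv";   # $session_id assumed

        unless (-e $file) {
            # First request: dump every row to the file.
            open my $out, '>', $file or die "open: $!";
            while (my @row = $sth->fetchrow_array) {
                $csv->print($out, \@row);
            }
            close $out;
        }

        # Any request: skip to the slice the user asked for.
        open my $in, '<', $file or die "open: $!";
        my ($first, $last) = (51, 100);
        my $n = 0;
        while (my $row = $csv->getline($in)) {
            $n++;
            next if $n < $first;
            last if $n > $last;
            # render @$row here
        }
        close $in;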
