Searching large files before browser timeout

aijin has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Searching large files before browser timeout by tachyon (Chancellor) on Jun 13, 2001 at 00:54 UTC
One way to avoid the timeout is to send a results page to the browser immediately. We cheat a bit. First, generate a unique temporary page, temp12345.html, where the results will be written when they become available. Next, send a 307 Temporary Redirect header back to the browser that points to this temp12345.html page. The temp page will then appear in the user's browser window. Some text like "We are processing your job, please click refresh now and again to see if your job is complete!" will inform the user of what is happening. You are then free to process the job. All you then need to do is write the result to your temp12345.html page, and when the user next presses refresh, voila, the result. No timeouts. Until the results are written, the user just gets the same please-wait page with each refresh. For elegance you could add a meta refresh to reload the page every 10 seconds or so; then the user does not even need to worry about refreshing. When you write the result, drop the auto refresh so the final page does not keep reloading. merlyn explains the whole thing in detail (and with code) here. Hope this helps. tachyon
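A minimal sketch of the placeholder-page idea described above, not merlyn's or tachyon's actual code. The helper names (`escape_html`, `build_page`, `write_page`, `redirect_header`) are hypothetical, and the `Status:` line assumes a server that follows the usual CGI convention for setting the response code.

```perl
use strict;
use warnings;

# Hypothetical helper: escape text before embedding it in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build the placeholder page (when $result is undef) or the final page.
# Only the placeholder carries the meta refresh, so the final page
# stops reloading once the result has been written.
sub build_page {
    my ($result, $refresh_secs) = @_;
    my $done = defined $result;
    my $meta = $done ? '' : qq{<meta http-equiv="refresh" content="$refresh_secs">\n};
    my $body = $done
        ? '<pre>' . escape_html($result) . '</pre>'
        : '<p>We are processing your job, please wait.</p>';
    return "<html><head>\n$meta<title>Search results</title></head>\n"
         . "<body>$body</body></html>\n";
}

# Write the page under a temporary name, then rename it into place,
# so a refresh never sees a half-written file.
sub write_page {
    my ($path, $html) = @_;
    open(my $fh, '>', "$path.tmp") or die "Cannot write $path.tmp: $!";
    print {$fh} $html;
    close($fh) or die "Cannot close $path.tmp: $!";
    rename("$path.tmp", $path) or die "Cannot rename to $path: $!";
}

# CGI headers that redirect the browser to the placeholder page.
sub redirect_header {
    my ($url) = @_;
    return "Status: 307 Temporary Redirect\nLocation: $url\n\n";
}
```

A CGI script would call `write_page` with the placeholder, print `redirect_header(...)`, run the search, and then call `write_page` again with the result. Many servers wait for the script to exit before finishing the response, so the search itself probably has to run in a forked child that has closed its copy of STDOUT.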
Re: Re: Searching large files before browser timeout by John M. Dlugosz (Monsignor) on Jun 13, 2001 at 01:00 UTC
You can specify a reload in the header ("client pull") rather than needing any client-side script.
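For illustration, client pull can be sketched as a `Refresh` response header printed by the CGI script. `Refresh` is a de facto browser convention rather than part of the HTTP standard, and the helper name `client_pull_header` is made up for this example.

```perl
use strict;
use warnings;

# Build CGI headers asking the browser to reload after $secs seconds.
# If $url is given, the browser is asked to load that URL instead of
# reloading the current one.
sub client_pull_header {
    my ($secs, $url) = @_;
    my $refresh = defined $url ? "$secs; URL=$url" : $secs;
    return "Refresh: $refresh\nContent-Type: text/html\n\n";
}
```

Printing `client_pull_header(10)` before the page body asks the browser to reload the same URL every 10 seconds. The caching caveats raised elsewhere in this thread still apply.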
Re: Re: Re: Searching large files before browser timeout by graq (Curate) on Jun 13, 2001 at 15:45 UTC
From experience, I would be wary of relying on browser refresh (whether manual or not). Web page caching occurs at so many hops between host and client, and its behaviour varies not only between browser types but also between browser versions. Expect some visitors to experience unwanted behaviour from a setup like this. Unfortunately, I know of no way to avoid it other than not relying on browser refresh. -- Graq
Re: Re: Re: Searching large files before browser timeout by tachyon (Chancellor) on Jun 13, 2001 at 01:08 UTC
Agree. Getting tired. And Sloppy. Corrected. Thanks. tachyon sleep(28800);
Re: Searching large files before browser timeout by shotgunefx (Parson) on Jun 13, 2001 at 00:43 UTC
I had a similar problem, and there are a couple of ways of approaching it. I opted for turning off buffering, printing a "Please wait" message, and emitting a "." or similar every thousand or so records so users wouldn't hit refresh and spawn another copy of the process. When it finished, I displayed the results. You could also fork the actual search off as another process and display a "searching..." page with a meta-refresh or server-push; this way you could handle problems with impatient people spawning lots of processes. You might also be interested in this related node. -Lee "To be civilized is to deny one's nature."
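The unbuffered "dots" approach might look roughly like this. It is a sketch under assumptions: the function name `search_with_progress` is invented, and the search is a plain fixed-string match standing in for whatever the real script does.

```perl
use strict;
use warnings;
use IO::Handle;

# Scan lines from $in and print a "." to $out every $every records,
# so the browser keeps receiving data while the search runs.
# Returns the lines that contain $needle.
sub search_with_progress {
    my ($in, $out, $needle, $every) = @_;
    $out->autoflush(1);    # turn off buffering so each dot goes out at once
    my @hits;
    my $count = 0;
    while (my $line = <$in>) {
        $count++;
        push @hits, $line if index($line, $needle) >= 0;
        print {$out} '.' if $count % $every == 0;
    }
    return @hits;
}
```

In a CGI script, `$out` would be `\*STDOUT`, and the "Please wait" text would be printed after the Content-Type header and before calling the function.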
Re: Re: Searching large files before browser timeout by BatGnat (Scribe) on Jun 14, 2001 at 04:04 UTC
Very simple! Very effective! And my choice in solving this problem for my own pages. BatGnat BALLOT: A multiple choice exam, in which all of the answers above are incorrect!
Re: Searching large files before browser timeout by WrongWay (Pilgrim) on Jun 13, 2001 at 00:21 UTC
Without seeing any code I would say you have 2 options. 1. Build a queuing system, where all the work is done by a Perl cron job, and the user can keep refreshing a queue status page until his/her job is completed. 2. Pre-split/sort your file(s). This should allow a quicker search. 80 MB is pretty hefty; maybe 40 2 MB files would be better. Just my $.02 worth. WrongWay
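As a rough illustration of option 2, a large file could be pre-split into numbered chunks by line count. The function name `split_file` and the `prefix.NNN` naming scheme are assumptions made for this sketch. Whether splitting actually speeds up a search depends on being able to skip chunks, for example because the data is sorted.

```perl
use strict;
use warnings;

# Split $src into chunk files of at most $max_lines lines each.
# Returns the chunk file names: "$prefix.000", "$prefix.001", ...
sub split_file {
    my ($src, $prefix, $max_lines) = @_;
    open(my $in, '<', $src) or die "Cannot read $src: $!";
    my (@chunks, $out);
    my $count = 0;
    while (my $line = <$in>) {
        if ($count % $max_lines == 0) {    # time to start a new chunk
            close($out) if $out;
            my $name = sprintf('%s.%03d', $prefix, scalar @chunks);
            open($out, '>', $name) or die "Cannot write $name: $!";
            push @chunks, $name;
        }
        print {$out} $line;
        $count++;
    }
    close($out) if $out;
    close($in);
    return @chunks;
}
```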
Re: Searching large files before browser timeout by LD2 (Curate) on Jun 13, 2001 at 00:34 UTC
The most common advice here at the Monastery is to check Super Search first, before posting a question. Here is a node that may help you... Browser Timeout
Re: Searching large files before browser timeout by Davious (Sexton) on Jun 13, 2001 at 09:23 UTC
We have an application where we needed to parse 100+ meg log files in real time for various strings and I had the same problem. I found that I was able to get a significant speed boost by offloading the string matching to unix grep and piping the output into perl. It cut the time down from several minutes to under 30 seconds. `$cmd = qq|grep '$string' access.log|; open(LOG, "$cmd|") or die "Cannot run grep: $!"; while (<LOG>) { # etc.. }`
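One caution about the one-liner above: `$string` is interpolated into a shell command, so a search term containing a quote or shell metacharacter could break the command or run something unintended. A hedged alternative, assuming a Unix-like system with `grep` on the PATH, is the list form of a pipe open, which bypasses the shell. The helper name `grep_lines` is invented, and `-F` (fixed-string matching) is a choice made for this sketch.

```perl
use strict;
use warnings;

# Run the external grep for a fixed string and return the matching lines.
sub grep_lines {
    my ($string, $file) = @_;
    # List-form pipe open: the arguments go straight to grep, no shell
    # is involved. "--" stops grep from reading $string as an option.
    open(my $log, '-|', 'grep', '-F', '--', $string, $file)
        or die "Cannot run grep: $!";
    my @lines = <$log>;
    close($log);    # grep exits with status 1 when nothing matched
    return @lines;
}
```

Without `-F`, grep treats the dots in something like `127.0.0.1` as "any character", so the fixed-string flag is both safer and more precise for this kind of search.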
Re: Re: Searching large files before browser timeout by aijin (Monk) on Jun 14, 2001 at 00:49 UTC
This works great, thank you! I just benchmarked searching through a smallish file, using Perl pattern matching and grep. Benchmark: timing 10000000 iterations of Grep, Perl... Grep: 19 wallclock secs (16.38 usr + 0.03 sys = 16.41 CPU) Perl: 101 wallclock secs (80.91 usr + 8.11 sys = 89.02 CPU) What a difference!
Re: Re: Searching large files before browser timeout by sierrathedog04 (Hermit) on Jun 13, 2001 at 20:09 UTC
Having UNIX grep rather than Perl grep do the searching would usually slow your program down. Perl grep is usually faster than UNIX's grep but slower than UNIX's egrep. Of course, YMMV.
Re: Re: Re: Searching large files before browser timeout by Davious (Sexton) on Jun 13, 2001 at 21:23 UTC
Hmm, well in my case it was blindingly faster. Keep in mind I wasn't searching for anything more complicated than a fixed string (i.e. '127.0.0.1'), not a regexp or anything of that nature.
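For a fixed string, a pure-Perl search does not need the regex engine's metacharacter handling at all. The sketch below shows two ways to count fixed-string matches; the function names are hypothetical and no timing claim is implied, so benchmark on your own data.

```perl
use strict;
use warnings;

# Count lines containing $needle using index(), a plain substring search.
sub count_fixed {
    my ($lines, $needle) = @_;
    return scalar grep { index($_, $needle) >= 0 } @$lines;
}

# The same count with a regex; \Q...\E quotes metacharacters so that
# the "." in an IP address matches only a literal dot.
sub count_regex {
    my ($lines, $needle) = @_;
    my $re = qr/\Q$needle\E/;
    return scalar grep { /$re/ } @$lines;
}
```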
Re: Searching large files before browser timeout by mattr (Curate) on Jun 13, 2001 at 11:19 UTC
One industrial-strength way is to fork off a child that does the processing while the parent keeps the browser from timing out by printing spaces, periods, or intermittent status messages. The parent needs output autoflush turned on ($|=1). You could also do it without a child by using alarms, as described in the timeout discussion mentioned above. If you can get the child to send messages to the parent during processing, those intermittent status messages could be more interesting. I am thinking of doing this for a similar problem we talked about recently at the monastery; a message-passing pipe or possibly IPC may be useful here. One more idea: if you are going to have a ton of files to process, maybe you want one server that just searches all these files, doing the optimization, scheduling, and sorting you need, and have the CGI processes talk to that server. That way you might be able to allot more CPU to the processing daemon. But you might run into similar timeout issues.
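The fork-and-keepalive idea can be sketched as follows, assuming a Unix-like system where `fork` is available. The function name `run_with_keepalive` is invented, and the sketch does not pass messages from child to parent. It assumes the child leaves its results somewhere the parent can read them afterwards, such as a file.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use IO::Handle;

# Run $job (a code ref) in a child process. While the child works, the
# parent prints a "." to $out every $interval seconds so the browser
# keeps receiving data. Returns the child's exit status.
sub run_with_keepalive {
    my ($job, $out, $interval) = @_;
    $out->autoflush(1);              # same effect as $| = 1 on that handle
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                 # child: do the slow work, then exit
        $job->();
        exit 0;
    }
    # Parent: poll without blocking until the child has finished.
    while (waitpid($pid, WNOHANG) == 0) {
        print {$out} '.';
        sleep $interval;
    }
    return $? >> 8;
}
```

In a CGI script, `$out` would be `\*STDOUT` and the parent would print the results once the function returns.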
Re: Searching large files before browser timeout by John M. Dlugosz (Monsignor) on Jun 13, 2001 at 00:55 UTC
Use "server push" technology to send a status report before the final answer. The very existence of that feature should stop the browser from timing out.
Re: Re: Searching large files before browser timeout by $code or die (Deacon) on Jun 13, 2001 at 06:05 UTC
I don't believe that "server push" is very portable. Pretty sure that IE doesn't support it anyway. $code or die `$ perldoc perldoc`
Re: Searching large files before browser timeout by Anonymous Monk on Jun 14, 2001 at 07:53 UTC
stat the file; if it is "too big", fork off a background process that will e-mail the results.
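A sketch of the stat-then-decide idea, with everything named here being an assumption: `search_or_defer` is a made-up function, the size limit is up to the caller, and the e-mail step is left as a caller-supplied callback because the thread does not say how mail would be sent.

```perl
use strict;
use warnings;

# If $path is no larger than $limit bytes, run $search right away and
# return its results. Otherwise fork a background child that runs the
# search and hands the results to $notify (for example, a routine that
# sends e-mail), and return an empty list immediately.
sub search_or_defer {
    my ($path, $limit, $search, $notify) = @_;
    my @st = stat($path) or die "Cannot stat $path: $!";
    my $size = $st[7];               # element 7 of stat() is the size in bytes
    return $search->($path) if $size <= $limit;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                 # child: the slow path
        $notify->($search->($path));
        exit 0;
    }
    return;                          # parent: results will arrive later
}
```

When the function returns an empty list for a large file, the CGI script would tell the user that the results will be e-mailed. Under a web server, the child would probably also need to close its copy of STDOUT so that the response can finish.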


	PerlMonks