PerlMonks  

keeping connection alive while spending time building a zip file

by Scags9876 (Novice)
on Aug 10, 2011 at 21:20 UTC (#919746)
Scags9876 has asked for the wisdom of the Perl Monks concerning the following question:

I have a problem where a client can send an HTTPS request that builds a zip file and sends it to the user. Sometimes these zip files are very large, and building one can take upwards of 10 minutes. In the meantime, the connection is severed by the client's firewall (or some such mechanism).

In the past, I have solved this problem by sending a keepalive to the client in the form of:

```perl
$r->print(" ");
$r->rflush();
```

This works well when the headers are already sent and the content type is text/html or text/plain (while processing an upload for example, prior to showing any results of processing). However, this does not work when the content type is 'application/zip'.

How do I send some keep alive information to the client while the file is building when the eventual content type of the response is 'application/zip'?

Here's what I'm doing, in a nutshell, which is not working:

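```perl
use Apache2::RequestRec ();   # $r->content_type, $r->headers_out
use Apache2::RequestIO  ();   # $r->print, $r->rflush
use APR::Table          ();   # $r->headers_out->set

my $r = shift;   # the Apache2::RequestRec that mod_perl passes in

while ($file_is_not_finished) {
    # ... write a bunch of stuff to $file ...
    $r->print(" ");   # keepalive; note that the first rflush() also
    $r->rflush();     # sends the response headers (default content type)
}

$r->content_type('application/zip');
$r->headers_out->set(
    'Content-Disposition' => 'attachment; filename="file.zip"' );
open(my $fh, '<:raw', $file) or die "Cannot open $file: $!";
$r->print(<$fh>);
```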

Sorry if my pseudo code is not clear, let me know if further clarification would be helpful.

The observed behavior is that the client browser just ends the connection, with no message or indication of error.

Re: keeping connection alive while spending time building a zip file
by thenaz (Beadle) on Aug 10, 2011 at 21:39 UTC

    You could serve up an HTTP 302 that redirects the client to the real zip file once it's created. Since the body of a 302 response is ignored, you can use your $r->print(" "); $r->rflush(); method to keep the client waiting.
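A minimal sketch of the 302 approach as a plain CGI script (not mod_perl). The helper `redirect_with_keepalive` and the convention that the build work arrives as a code ref returning true while there is more to do are hypothetical. The ordering is the important part: the status line and `Location` header must go out before the first body byte, because once output has been flushed the headers can no longer be changed. In mod_perl the same rule applies to setting the status and `Location` before the first `$r->rflush()`. Whether a given browser waits for the end of the body before following the redirect is not settled in this thread, so it is worth testing against the clients you care about.

```perl
use strict;
use warnings;

# Hypothetical helper: send a 302 whose body is a trickle of spaces, so the
# connection sees traffic while the zip is built. $build_step is called
# repeatedly and should return true while there is more work to do.
sub redirect_with_keepalive {
    my ($out, $location, $build_step) = @_;

    my $old = select($out);
    $| = 1;                     # unbuffered: each space is written immediately
    select($old);

    # CGI response headers: status and Location must precede any body bytes.
    print {$out} "Status: 302 Found\r\n",
                 "Location: $location\r\n",
                 "Content-Type: text/plain\r\n",
                 "\r\n";

    while ($build_step->()) {
        print {$out} " ";       # keepalive byte; the client ignores the body
    }
    return;
}
```

In a real script `$out` would be `\*STDOUT` and `$build_step` would add the next file (or the next slice of data) to the zip.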

Re: keeping connection alive while spending time building a zip file
by pmqs (Monk) on Aug 10, 2011 at 23:18 UTC

    Can you give more details about how the zip file is created? Does it just consist of files that already exist in the filesystem? Or does your code need to create the contents on the fly by carrying out expensive calculations or retrieving data from another source?

    The key question is how much of the 10 minutes is taken up actually doing the file I/O needed to write to $file?

    If the answer is that the 10 minutes is accounted for by file I/O, then a possible approach is to use a combination of HTTP chunking and streaming the zip file direct to the client as you create it.

    Info-Zip can create a zip file in streaming mode, as can IO::Compress::Zip. Don't know Apache2::Request at all, so I can't comment on whether it supports chunking.
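When the members have to be generated on the fly rather than read from existing files, IO::Compress::Zip also has an object interface. A minimal sketch follows; the `stream_zip` helper and its `[ name, producer ]` pairs are hypothetical. `Stream => 1` requests streaming mode, so the writer never needs to seek back in its output, which is what makes it usable on a pipe or socket. The sketch writes to an in-memory buffer, but the first argument to `new` could equally be a filehandle such as `\*STDOUT`; each member's data is written as soon as its producer returns it.

```perl
use strict;
use warnings;
use IO::Compress::Zip qw($ZipError);

# Write each member as soon as its content is produced. @$members holds
# [ $name, $producer ] pairs; $producer returns the next piece of data,
# or undef when that member is complete (a hypothetical convention).
sub stream_zip {
    my ($output, $members) = @_;
    my $zip;
    for my $member (@$members) {
        my ($name, $producer) = @$member;
        if ($zip) {
            # Finish the previous member and start the next one
            $zip->newStream(Name => $name)
                or die "newStream failed: $ZipError";
        }
        else {
            $zip = IO::Compress::Zip->new($output, Name => $name, Stream => 1)
                or die "IO::Compress::Zip failed: $ZipError";
        }
        while (defined(my $piece = $producer->())) {
            $zip->print($piece);
        }
    }
    $zip->close if $zip;
    return;
}
```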

Re: keeping connection alive while spending time building a zip file
by Anonymous Monk on Aug 11, 2011 at 01:35 UTC
Re: keeping connection alive while spending time building a zip file
by strredwolf (Chaplain) on Aug 11, 2011 at 02:02 UTC
    Short of timed redirects, you need to offload the zipping of the files to a separate process that you can fire off (say, a Unix-socket-based server of your own design in Perl). Then use JavaScript and a small bit of AJAX on the client to poll for status info. When it's done, have the JavaScript redirect to the file.

      What if JavaScript is not available (links, lynx, wget, curl, LWP::UserAgent) or is blocked (NoScript, ...)?

      Some people, when confronted with a problem, think "I know, I'll use JavaScript." Now they have two problems.

      Alexander


        If you use fork and build the zip file in the child process, the parent process can redirect to a "status" page. The status page will display "working..." or "not done yet" while the zip is being built, or else a clickable link to the zip once it's done. The AJAX/JavaScript version of the page will automatically refresh the status every few seconds. If JavaScript is not available, you could use a meta tag refresh, or else instruct users to click "reload" periodically (or put a message like "check back later" inside a noscript tag).
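A rough sketch of that flow, assuming a CGI-style process that is allowed to fork (under mod_perl, forking from a server child needs more care and is not covered here). The function names, the `.part` suffix, and the ten-second refresh interval are hypothetical. The child writes to a temporary name and renames it when finished; because `rename` within one filesystem is atomic on POSIX systems, the status page can treat "the final file exists" as "the zip is complete". The no-JavaScript fallback is the meta refresh mentioned above.

```perl
use strict;
use warnings;
use File::Spec;
use IO::Handle;
use POSIX ();

# Fork a child that builds the zip at "$zip_path.part" through the
# (hypothetical) $build callback and then renames it into place.
# The parent gets the child's PID back at once, so it can send the
# redirect to the status page without waiting.
sub build_in_background {
    my ($zip_path, $build) = @_;
    STDOUT->flush;          # don't let the child inherit buffered output
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;    # parent

    # Child: stop holding the CGI response open, then do the work.
    open(STDOUT, '>', File::Spec->devnull) or POSIX::_exit(1);
    my $ok = eval { $build->("$zip_path.part"); 1 }
          && rename("$zip_path.part", $zip_path);
    POSIX::_exit($ok ? 0 : 1);   # skip END blocks and destructors
}

# Body of the status page: a link once the finished file exists,
# otherwise a page that reloads itself every 10 seconds.
sub status_page {
    my ($zip_path, $zip_url) = @_;
    return qq{<p><a href="$zip_url">Download your zip file</a></p>}
        if -e $zip_path;
    return qq{<meta http-equiv="refresh" content="10">}
         . qq{<p>Still working... this page reloads automatically.</p>};
}
```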

Re: keeping connection alive while spending time building a zip file
by sundialsvc4 (Abbot) on Aug 11, 2011 at 13:56 UTC

    This is, for many reasons, an ideal application for a “job scheduling” system ... running separately from the web server. Let the web site become a user interface for that system.

    (There are, of course, plenty of things to choose from in CPAN and elsewhere.)

    Let’s say that one hundred innocent web-site users happen to try to build zip files at the same instant. Under your present arrangement, the now hopelessly swamped computers could take the better part of a day to complete the task.

    Per contra, those requests could instead generate 100 batch jobs, submitted to a system that will process no more than (say...) 10 jobs simultaneously, with an assured completion time of no more than (say...) 11 minutes per job in the worst case. Our 100 users now see estimates ranging from “your request should be ready in about 11 minutes” to “about an hour.” Maybe they get an e-mail when it’s done. Maybe they can decide to cancel their request. And so on. Meanwhile, the system(s) doing the work will never commit to more simultaneous workload than you know they can deliver consistently: there is a throttle, a governor, and a queue.

    The mere fact that a web server can be used to request or initiate something does not mean that it should also be the one to do it. In fact, this is (IMHO...) almost universally a bad idea, very common though it be.
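The throttle-and-queue part of that idea can be sketched in a few lines of plain Perl, assuming the jobs are code refs run in forked children; the `run_throttled` name and its return value (the number of failed jobs) are hypothetical. It only illustrates capping concurrency inside one process; it is not a job scheduling system, since there is no persistence, cancellation, or notification. Parallel::ForkManager on CPAN packages the same fork-and-wait pattern.

```perl
use strict;
use warnings;
use POSIX ();

# Run each code ref in @jobs in its own child process, never more than $max
# at a time; the remaining jobs wait in @jobs (the queue). Returns the number
# of jobs whose child exited with a non-zero status.
sub run_throttled {
    my ($max, @jobs) = @_;
    die "max must be at least 1" unless $max >= 1;
    my %running;           # pid => 1 for children still running
    my $failed = 0;

    while (@jobs or %running) {
        # Start jobs until the concurrency limit is reached
        while (@jobs and keys(%running) < $max) {
            my $job = shift @jobs;
            my $pid = fork;
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                my $ok = eval { $job->(); 1 };
                POSIX::_exit($ok ? 0 : 1);   # skip END blocks/destructors
            }
            $running{$pid} = 1;
        }
        # Block until any child finishes, freeing a slot
        my $pid = waitpid(-1, 0);
        last if $pid == -1;        # no children left
        $failed++ if $? != 0;
        delete $running{$pid};
    }
    return $failed;
}
```

With `$max` set to 10 and 100 queued jobs, at most 10 children exist at any moment, which is the "governor" described above.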

Re: keeping connection alive while spending time building a zip file
by TomDLux (Vicar) on Aug 11, 2011 at 15:05 UTC

    sundialsvc4 has beaten me to it, but the concept is worth reinforcing.

    Browsers are inherently an interactive medium: you request something, you get something, repeat as necessary. You can stretch the concept a little, but sometimes things just don't fit that pattern. In that case it's better to forget about fulfilling the request synchronously and just turn it into a queued request.

    As Occam said: Entia non sunt multiplicanda praeter necessitatem.

Re: keeping connection alive while spending time building a zip file
by stonecolddevin (Vicar) on Aug 11, 2011 at 16:34 UTC

    How about having a backend process take care of building the zip file and poll it from the web app to see how far along it is, then once it's done redirect the client to the file?
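One way to do the "see how far along it is" part, sketched under the assumption that the backend process can count its work units (for example, files added to the zip): the worker periodically writes `done total` to a progress file, and the web app turns that into a percentage. The file format and both function names are hypothetical. Writing to a temporary file and then renaming it over the old one means the web side never reads a half-written progress file, since `rename` within one filesystem replaces the target atomically on POSIX systems.

```perl
use strict;
use warnings;

# Worker side: record progress atomically so a reader never sees a
# half-written file. The "done total" format is a hypothetical convention.
sub write_progress {
    my ($progress_file, $done, $total) = @_;
    my $tmp = "$progress_file.tmp.$$";
    open(my $fh, '>', $tmp) or die "Cannot write $tmp: $!";
    print {$fh} "$done $total\n";
    close($fh) or die "Cannot close $tmp: $!";
    rename($tmp, $progress_file) or die "Cannot rename $tmp: $!";
    return;
}

# Web side: return a whole-number percentage, or undef if the worker has
# not reported anything yet.
sub read_progress {
    my ($progress_file) = @_;
    open(my $fh, '<', $progress_file) or return;
    my ($done, $total) = split ' ', scalar(<$fh>) // '';
    return unless $total;
    return int(100 * $done / $total);
}
```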


Re: keeping connection alive while spending time building a zip file
by Anonymous Monk on Aug 11, 2011 at 21:53 UTC

    The only other approach that springs to mind is MIME multipart messages. I seem to remember that IE didn't support multipart/x-mixed-replace, but some other multipart/mixed response might be workable.

    The right solution for you will depend on many factors, like the number of simultaneous users, the availability of JavaScript, support for multipart/x-mixed-replace, whether your file system is striped RAID, etc. That said, I generally prefer dhoss's recommendation of a backend process, because it lets you easily control things like disk thrashing and RAM usage.
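For reference, the framing of a multipart/x-mixed-replace response looks roughly like this. It is a sketch of status messages only; the `push_status_updates` helper and the boundary string are hypothetical, and in real use each part would be written when the corresponding event happens rather than from a list. Whether a final part carrying the zip itself would be offered as a download is exactly the browser-support question raised above, so treat that as untested.

```perl
use strict;
use warnings;

# Write a multipart/x-mixed-replace response whose parts are plain-text
# status messages; in supporting clients each part replaces the previous
# one. The boundary must not appear inside any part.
sub push_status_updates {
    my ($out, $boundary, @messages) = @_;

    my $old = select($out);
    $| = 1;                 # send each part as soon as it is printed
    select($old);

    print {$out} "Content-Type: multipart/x-mixed-replace; boundary=$boundary\r\n\r\n";
    for my $msg (@messages) {
        print {$out} "--$boundary\r\n",
                     "Content-Type: text/plain\r\n\r\n",
                     "$msg\r\n";       # this CRLF belongs to the next delimiter
    }
    print {$out} "--$boundary--\r\n";   # closing delimiter
    return;
}
```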

Re: keeping connection alive while spending time building a zip file
by Scags9876 (Novice) on Aug 12, 2011 at 13:24 UTC

    Thank you all for your prompt and clear replies.

    I had hoped there was some easy way around it, but I think the right way to go, as suggested by a number of you, is to have the file build in a queued, asynchronous manner, and send it to the client when it is done. This approach will take longer to implement, but it will be better in the long run for sure.

    thanks again.

      Did a quick proof of concept to see if I could stream a zip file straight to a browser while it was being created.

      The CGI script below is hard wired to create a zip file that contains the contents of two files, namely /tmp/file1 and /tmp/file2.

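```perl
use strict;
use warnings;
use IO::Compress::Zip qw(:all);

# Unbuffered, binary STDOUT so output reaches the web server as it is produced
select STDOUT;
$| = 1;
binmode STDOUT;

# CGI headers; the blank line before EOM ends the header block
print <<EOM;
Status: 200 OK
Content-Type: application/zip
Transfer-Encoding: chunked

EOM

my @files = qw(/tmp/file1 /tmp/file2);

# '-' writes the zip to STDOUT; the FilterEnvelope sub wraps each piece of
# output ($_) in HTTP/1.1 chunked framing: hex length, CRLF, data, CRLF
zip [@files] => '-',
    FilterEnvelope => sub {
        # Chunk the output
        my $length = length($_);
        $_ = sprintf("%x", $length) . "\r\n" . $_ . "\r\n";
        $_ .= "\r\n" unless $length;
        1;
    }
    or die "zip failed: $ZipError\n";
```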
      Whether this helps in your use case depends on where the 10-minute delay comes from. If the delay is CPU/network related, this may help a bit, but it probably won't solve the issue for you.

      Anyway, I had the code written, so I thought I'd share it with you.

Re: keeping connection alive while spending time building a zip file
by sundialsvc4 (Abbot) on Aug 12, 2011 at 16:25 UTC

    I would suggest that the user issues the request for the zip file and is told when to expect it. When the work has been completed, he is simply notified. He can then log back on to the web site and retrieve the results however he wants. (Notifying him, rather than mailing him the results, matters: corporate e-mail systems, ahem, sometimes really suck, especially with large things like zip files that take twenty minutes to produce.)

    It really amazes me how many otherwise well-intentioned “shops” have forgotten all of the good lessons of “the batch-job days.” Sure, no one needs to go back to //ZIPFILE JOB (123,456),'CREATE ZIP' (although those crufty old days certainly do still exist...), but the notion of doing “intensive” work in a non-interactive setting was, and still is, a very good one. If a shop actually embraces the notion, it can make a very dramatic difference to many aspects of the work flow.

    When a computer system becomes over-committed, the performance degradation that results is not linear; it is exponential. It is “hitting the wall.”

    “What he means is Old Testament, Mr. Mayor, real wrath of God type stuff. Fire and brimstone coming down from the skies! Rivers and seas boiling! Forty years of darkness! Earthquakes, volcanoes... The dead rising from the grave! Human sacrifice, dogs and cats living together... mass hysteria!”

    And the only thing that you can do is, “don’t do that.”

Node Type: perlquestion [id://919746]
Approved by Samy_rio
Front-paged by toolic