Polyglot has asked for the wisdom of the Perl Monks concerning the following question:
I'm using the following subroutine to send a binary (PDF) file back to the client's browser.
# IMPORTANT MODULES FOR THIS CODE...
use CGI qw(-utf8);
use File::Spec::Functions qw( catfile );

sub send_file {
    my ($cgi, $dir, $file) = @_;
    # $dir  = '/var/www/download/';
    # $file = 'MyLaTeXDocument.pdf';
    my $path = catfile($dir, $file);
    open my $fh, '<:raw', $path or die "Cannot open '$path': $!\n";
    $cgi->charset('');    # REMOVES PRIOR UTF-8 SETTING, AS THIS IS A BINARY FILE
    print $cgi->header(
        -type       => 'application/octet-stream',
        -attachment => $file,
    );
    binmode STDOUT, ':raw';
    print while <$fh>;
    close $fh or die "Cannot close '$path': $!";
    return;
}
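For reference, a sketch of how the sub gets invoked; the directory and filename come from the comments above, and the CGI object line is illustrative:

my $cgi = CGI->new;
send_file($cgi, '/var/www/download/', 'MyLaTeXDocument.pdf');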
The browser console sees a string of characters returning in the response, but no download dialogue is opened.
Response headers:

Connection: Keep-Alive
Content-Disposition: attachment; filename="MyLaTeXDocument.pdf"
Content-Type: application/octet-stream
Date: Fri, 22 Sep 2023 11:13:27 GMT
Keep-Alive: timeout=5, max=100
Server: Apache/2.4.52 (Ubuntu)
Transfer-Encoding: chunked

Response payload:

JVBERi0xLjUKJeTw7fg... [truncated...too lazy to type more]
Why won't the browser just open the "Save as..." dialogue? As it stands, the browser appears to do nothing, silently dropping this activity in the background. What is lacking in this code?
Re: Perl output is not inducing file download as expected
by marto (Cardinal) on Sep 22, 2023 at 12:41 UTC
-type => 'application/octet-stream',
Shouldn't that be:
-type => 'application/pdf',
https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types
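A sketch with that type, using the same CGI.pm header() call as in the original sub:

print $cgi->header(
    -type       => 'application/pdf',
    -attachment => $file,
);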
The "octet-stream" categorization came from the suggestion of multiple websites as being a designation sure to prompt the download dialogue in the browser. In my case, obviously, it didn't--but then, neither did "pdf" or "x-download" or anything else I had tried. See my full solution that I finally found in my own response (to be posted shortly).
Re: Perl output is not inducing file download as expected
by afoken (Chancellor) on Sep 22, 2023 at 23:07 UTC
JVBERi0xLjUKJeTw7fg
That does not look like a PDF file. It should start with %PDF- (see PDF). What you get is base64-encoded "%PDF-1.5".
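A quick check with the core MIME::Base64 module confirms it (the literal below is the start of the payload shown above):

use MIME::Base64 qw(decode_base64);
print decode_base64('JVBERi0xLjUK');   # prints "%PDF-1.5\n"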
Transfer-Encoding: chunked
That's a little bit unexpected. You send the entire file, so there should be no chunked encoding.
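Apache generally falls back to chunked transfer-encoding when the script supplies no Content-Length. A sketch of supplying one through CGI.pm's header(), which passes unrecognized named parameters through as literal header fields; the -s file test on $path is an assumption about where the length would come from:

my $size = -s $path;                 # byte count of the file on disk
defined $size or die "Cannot stat '$path': $!";
print $cgi->header(
    -type           => 'application/pdf',
    -attachment     => $file,
    -Content_length => $size,
);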
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
This observation was helpful. It helped me understand what might be happening. It's pretty tough to troubleshoot when one has no clear understanding of the problem. I had noticed that it lacked the PDF header that I was expecting, but it was your observation that helped me understand why. Thank you. As it turns out, this problem had nothing to do with the download's initiation, only with its quality during the transfer. For the full solution of what I found to work, and not work, see my own answer (to be posted shortly).
Re: Perl output is not inducing file download as expected
by kikuchiyo (Hermit) on Sep 22, 2023 at 12:39 UTC
Have you tried adding a Content-length header, so the client knows how much data to expect?
So, I've added the Content-length header, but it's not necessary for the download to work. It enhances the download by telling the client how much to expect, and the client can then give a proper progress bar for the download; but the download can still initiate without this. See my added details with the full solution which I found (to be posted shortly).
Re: Perl output is not inducing file download as expected
by Polyglot (Chaplain) on Sep 23, 2023 at 12:16 UTC
This subroutine finally resulted in a correct download.
sub send_file {
    my ($cgi, $dir, $file) = @_;
    my $path = "$dir$file";
    $cgi->charset('');
    #$| = 1;  ### THIS FOULS UP THE DOWNLOAD "SAVE AS..." DIALOGUE
    $| = 0;   # SO LET'S BE EXPLICIT ABOUT SETTING THIS CORRECTLY!
    my $fn;
    open($fn, '<:raw', $path)
        or die "Sorry, unable to open the file: $path. $!\n";
    binmode($fn);
    my @document = <$fn>;
    close $fn;
    my $len;
    $len += length $_ for @document;
    print "Content-type: application/x-download\n";
    print "Content-disposition: attachment; filename=$file\n";
    print "Content-length: $len\n\n";
    binmode STDOUT;
    print @document;
    return;
} #END SUB send_file
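One aside: "$dir$file" relies on $dir ending in a slash. The catfile call from the original post avoids that dependency; a sketch:

use File::Spec::Functions qw(catfile);
my $path = catfile($dir, $file);   # inserts the path separator only where needed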
In place of the print @document; line above, either of the following two options worked equally well, but with more lines.
my $in_fh;
open($in_fh, '<', $path)
    || die "$0: cannot open $path for reading: $!";
binmode($in_fh) || die "binmode failed";
print while <$in_fh>;
close($in_fh) || die "couldn't close $path: $!";
OR...
my $BUFSIZ = 64 * (2 ** 10);
my ($in_fh, $buffer);
open($in_fh, '<', $path)
    || die "$0: cannot open $path for reading: $!";
binmode($in_fh) || die "binmode failed";
while (read($in_fh, $buffer, $BUFSIZ)) {
    unless (print $buffer) {
        die "couldn't write to STDOUT: $!";
    }
}
close($in_fh) || die "couldn't close $path: $!";
There were likely multiple puzzle pieces which all had to be correct at the same time for this to work, and I was unlucky enough to have one problem or another in my code at any given time to spoil it. One of the major ones was setting STDOUT to binmode, which prevented the garbled text. Another was getting the correct placement, and number, of newlines after the headers. And, for me at least, one of the big ones was the discovery that setting the print handle to "hot", i.e. $| = 1, would prevent the client from opening the download dialogue, with log errors claiming the headers had not been sent, even though I had explicitly printed them and had bypassed the CGI module for this.
I do not know what the purpose of buffering the download is in the final example, but I think this form would be useful where file sizes made slurping them fully into memory inconvenient. In any case, I found more than one way to do it.
As this is not the first time I have needed to do something like this, and as my earlier code of a similar nature seems to have been difficult to adapt to the needs of this new circumstance, I hope to remember this solution and find it more generally applicable. Thank you to those who gave helpful suggestions.
One more thing that is still subtly wrong in your code is that you use the <> operator on a binary file, without changing the input separator $/. By using its default value (newline), you split your file on newlines and store its content in newline-separated chunks, which makes little to no sense for a PDF.
You could set it either to undef, which will cause the <> operator to slurp the entire file in one go, or you can set it to a reference to a suitable integer (e.g. local $/ = \32768;), which will cause it to read the file in fixed-size chunks (this is useful when you don't want to, or can't, read all of the content into memory).
See perldoc -v '$/' for a detailed explanation.
For a typical PDF setting it to undef and thus slurping the file would be the easiest, because then you wouldn't need the $len += length $_ for @document; dance to get your content length.
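A sketch of that slurp variant, reusing the names from the sub above (the do-block idiom is mine, not from the thread):

my $document = do {
    local $/;                      # undef: slurp the entire file
    open my $fh, '<:raw', $path or die "Cannot open '$path': $!";
    <$fh>;
};
my $len = length $document;        # content length falls out directly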
I have a question. You have 4 print statements, and you turn on buffering. What happens when buffering is turned off is you send the header, some time goes by, and then you send the body. If you combined the 4 print statements into just one line, then buffering could be turned off or on, and it wouldn't make a difference. Am I wrong on this?
binmode STDOUT;
print "Content-type: application/x-download\n",
      "Content-disposition: attachment; filename=$file\n",
      "Content-length: $len\n\n", @document;
The truth is, I became frustrated by the amount of confusion on this issue. I found not a single decent commentary online for how to print the headers manually, and the packages that printed them appeared to be printing more than I had specified when I checked the browser console--and I had doubts as to what else they might be doing under the surface. This is what led me to attempt to print them manually, so I would have full control over what got put into the header.
But the internet seemed to provide multiple conflicting instructions on what the headers required: specifically, how each line of the header should be terminated. I tried one thing after another--I'm sure I must have tried at least 40 various configurations before I found anything that worked--and, of course, much of that time the problem was unrelated to the headers anyhow, but I didn't know that yet (the hot print handle was messing things up, or the fork I was attempting to use may not have helped).
For example: some sites said that each line should end in "\r\n"--whereas I had been using just "\n". Was this something that the CGI package was "fixing" for me automatically? Did I need to add it manually? Another point of question was whether each line of the header should end in a comma, and whether this included the last line of the header, too. Perhaps the comma was just required by the CGI package, and not by the client. I searched in vain online for HTTP header syntax. I found sites that claimed to say something about it, alright, but they focused on the headers themselves, not on whether they should be case-sensitive, or how their lines should end, or anything else that I needed to know.
In the end, I found an answer online with two header lines, printed just as I posted in my solution except that the second ended with the double newline--and it worked! I then added only the third line, the Content-length. Seeing that it worked satisfied me. Its simplicity pleased me. I am certain it could still be improved, and originally I did have all of it in a multi-line quote to be printed at once--but that was back when things were not working. Once I got it all working, I tended to let it be as it was! So that's how there got to be multiple print lines. I do not, however, presume that it must be this way, nor that it would not be superior to combine them. But it works, as-is. And with that I am, for now, content.
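For the record: on the wire, HTTP/1.1 header lines end in CRLF and the header block is terminated by one blank line (RFC 7230); there are no trailing commas, and field names are case-insensitive. Apache evidently also accepts bare "\n" from CGI scripts, since the working sub above uses it. A sketch of the manual headers with explicit CRLFs, values as in that sub:

print "Content-type: application/x-download\r\n",
      "Content-disposition: attachment; filename=$file\r\n",
      "Content-length: $len\r\n",
      "\r\n";                      # blank line ends the headers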
Regarding the buffering, my understanding is that it is only to prevent having to slurp the entire file into memory (which I may be doing anyhow at this point, but it is not necessary to do so). For a large file, that buffering would come in handy so as not to need to consume so much RAM in the process. So I don't see any real connection between the buffering and the http headers. Now, this is merely my interpretation--someone here may be able to enlighten me as to the true purpose of the buffering.