csv file download size limits?

by ted.byers (Scribe)
on Nov 07, 2013 at 20:09 UTC
ted.byers has asked for the wisdom of the Perl Monks concerning the following question:

I used your search engine, and nothing since 1999 included all four keywords in either the text or the title.

I used the following to create and send a CSV file from a specialized CGI script.

my $a_ref = $dbh->selectall_arrayref($sql);
my @a = @$a_ref;
print $cgi->header(-type => 'text/csv', -charset => 'utf-8', -attachment => 'data.csv');
print "Platform ID, Processor ID, Transaction ID, Date, Settlement Date, Transaction type, Status, Amount, First Name, Last Name,Address1,Address(cont.),City,State or Province,Postal Code,Country, Phone, Email, Processor response code, MSC response code, IP Address, CC Number,Descriptor\n";
foreach my $v (@a) {
    my ($m_id, @r) = @$v;
    $r[6] = ($r[6]) ? 'succeeded' : 'failed';
    foreach my $tv (@r) {
        $tv = '' unless defined $tv;
    }
    push @r, $descriptors{$m_id}{$r[0]};
    print join(',', @r), "\n";
}
$dbh->disconnect;

This code works perfectly as long as the CSV file is less than approximately 152 kB! Alas, if the CSV file is larger than about 152 kB, the first 152 kB is sent and the rest is discarded, so the user sees a truncated file. NB: I used "-attachment=>'data.csv'" so that the client browser would have a name to assign to the received file, since the data comes from a SQL query rather than a real file.

I tried adding "-Content_length=>$len," to the header section. Instead of printing the rows as I iterate through them, I put the content into a really long string, $s, and used length($s) to get a value for the content length. Again, it all works perfectly as long as the CSV file is less than about 152 kB, but in this case the file is never sent at all if it is larger than 152 kB.
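If the Content-Length route is retried, one thing worth noting is that Content-Length counts bytes, not Perl characters, so a UTF-8 body should be encoded to octets before measuring it. A minimal sketch, assuming the rows have already been joined into $s as in the attempt described above:

use Encode qw(encode);

# Content-Length is a byte count, so encode to octets before measuring.
my $body = encode('UTF-8', $s);
my $len  = length $body;    # byte length of the encoded body

print $cgi->header(
    -type           => 'text/csv',
    -charset        => 'utf-8',
    -attachment     => 'data.csv',
    -Content_length => $len,
);
print $body;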

Where is the 152 kB limit coming from, and how can I override that limit, if that is possible? Or do I have to actually create the file somewhere in the document root directory tree and issue a redirect or forward? If I have to resort to the latter, how do I ensure that the file is deleted once the client has it, and that no one else can see the file?

I am almost at the point of either placing a limit on the amount of data that can be requested, or putting the data into a zip archive, with the best available compression, and downloading the archive instead of the CSV file. But that smells like a bad kludge, especially since the user would no longer have the option of just opening the CSV file in his spreadsheet software once his browser has it.

Thanks

Ted

Re: csv file download size limits?
by talexb (Canon) on Nov 07, 2013 at 20:25 UTC

    From a quick look at your code, my first guess is that the Apache process is running out of memory because of the way you're getting data from the database. The selectall_arrayref call is going to suck up lots of memory doing the data retrieval that way.

    A better method would be to get the data and then print it, just one row at a time, as in the sketch below.
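    A minimal sketch of that approach, assuming the $sql, $cgi and %descriptors variables from the original post (the header row is omitted for brevity, and the row is copied into @r because fetchrow_arrayref reuses its buffer):

    my $sth = $dbh->prepare($sql);
    $sth->execute;
    print $cgi->header(-type => 'text/csv', -charset => 'utf-8', -attachment => 'data.csv');
    while (my $row = $sth->fetchrow_arrayref) {
        my ($m_id, @r) = @$row;                  # copy the reused buffer
        $r[6] = $r[6] ? 'succeeded' : 'failed';
        print join(',', map { defined $_ ? $_ : '' } @r, $descriptors{$m_id}{$r[0]}), "\n";
    }
    $sth->finish;
    $dbh->disconnect;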

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

      Thanks Alex

      Alas, though plausible at first, it turns out that that wasn't it. I modified the script in question to get the data one row at a time, and the same limit is hit. The file actually received is precisely the same as the one I get when I use selectall_arrayref.

      In retrospect, if Apache is able to use all free memory on the server, there should be no such limit. After all, there are several GB of RAM available, and even a ridiculously large recordset from the DB would only be a few MB. We're talking about a 64-bit Windows server here, with a 64-bit build of Apache's web server and a 64-bit build of MySQL. If any of those server products had trouble with even a few MB of data, there would be a serious problem with them. And the Perl we're using (v5.16) is a 64-bit build, so there ought not be a problem there either. But that constraint is so consistent that it must be imposed somewhere: but where do I look?

      Thanks again.

      Ted

        but where do I look?

        In the apache error log. If there's nothing there, start peppering your script with debugging statements until you track it down.

        Also look in the apache access log which will show the total size of the response body. It may be the case that the web server is sending the data fine, but something else in the network (or on the Windows box itself) is cutting it off.
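        A minimal sketch of that kind of instrumentation, assuming the loop and variables from the original post; warn output from a CGI script ends up in Apache's error log:

        use CGI::Carp qw(fatalsToBrowser);   # surface fatal errors in the browser while debugging

        my $bytes = 0;
        foreach my $v (@a) {
            my ($m_id, @r) = @$v;
            $r[6] = $r[6] ? 'succeeded' : 'failed';
            my $line = join(',', map { defined $_ ? $_ : '' } @r, $descriptors{$m_id}{$r[0]}) . "\n";
            $bytes += length $line;
            print $line;
        }
        warn "loop finished, wrote $bytes body bytes\n";   # compare this against the size in the access log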

Re: csv file download size limits (not perl)
by Anonymous Monk on Nov 08, 2013 at 00:27 UTC

    How big is the file, how many bytes total?

    Why are you duplicating this array? my @a = @$a_ref;

    Whatever limits you're hitting, they're imposed by your operating system and/or web server.

Re: csv file download size limits?
by Tux (Monsignor) on Nov 08, 2013 at 07:35 UTC

    Just a side note: you are generating both invalid and probably even unparseable CSV. In the header line you are including spaces after the commas, causing the header fields to be seen as e.g. " Amount" ("<space>Amount"). I cannot believe that to be intentional. Inside the loop you just spit out all fields joined on ,. Please think about what will happen if any of the data contains a comma; having amounts makes that highly likely. The US uses $ 2,000,000.00; other countries might use 2.000.000,00. Think about your data consumer. Please use a CSV generator like Text::CSV_XS, which not only makes your code correct, but also removes the need for the extra defined checks.

    use Text::CSV_XS;

    print $cgi->header(
        -type       => "text/csv",
        -charset    => "utf-8",
        -attachment => "data.csv",
        );
    my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\r\n" });
    $csv->print (*STDOUT, [
        "Platform ID", "Processor ID", "Transaction ID", "Date",
        "Settlement Date", "Transaction type", "Status", "Amount",
        "First Name", "Last Name", "Address1", "Address(cont.)", "City",
        "State or Province", "Postal Code", "Country", "Phone", "Email",
        "Processor response code", "MSC response code", "IP Address",
        "CC Number", "Descriptor",
        ]);
    foreach my $v (@a) {
        my ($m_id, @r) = @$v;
        $r[6] = $r[6] ? "succeeded" : "failed";
        $csv->print (*STDOUT, [ @r, $descriptors{$m_id}{$r[0]} ]);
    }
    $dbh->disconnect;

    Enjoy, Have FUN! H.Merijn

      Thanks for this.

      Point noted. In this particular case, the data is not formatted. Rather, it is a straight dump of data from the DB. But I will have to take a look at that, to ensure that the CSV file MySQL produces is valid. The amount field won't be a problem, as it is an unformatted decimal number, but one of the address fields may well contain commas. IIRC, MySQL allows one to add a clause that encloses all fields in single or double quotes, which would prevent problems if one field or another includes commas as part of the data.
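      A hedged sketch of that MySQL clause, issued via DBI; the output path and table name here are hypothetical, and note that INTO OUTFILE writes on the database server and requires the FILE privilege:

      # Hypothetical path and table; OPTIONALLY ENCLOSED BY quotes string fields.
      $dbh->do(q{
          SELECT *
          INTO OUTFILE '/tmp/data.csv'
          FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
          LINES TERMINATED BY '\n'
          FROM transactions
      });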

      Thanks

      Ted
