http://www.perlmonks.org?node_id=533975

cupojoe has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

It's a small project, and I'm pretty green to Perl.

It's an architecture and reality check question. Hope you don't mind.

Building a section of a web site so logged in users can download files we've uploaded for them to get. The files are 10 - 20 megabytes (pdf'd engineering reports).

We can't count on them having ftp clients, and our web host (shared hosting) won't allow cgi driven downloads or uploads exceeding 2 megs. (I know, host sucks.)

The current plan is, we'll ftp the file up, and my tool will provide a link so they can right-click and "save as".

(linux, apache, mod_perl, cgi-application and session plugin, HTML-Template, dbi, mysql.)

Originally, I wanted to control the download so we had as much positive info as possible that it was downloaded: date/time, IP address, login id, etc. Since it doesn't look like I can use CGI.pm to drive the download or the upload, and they're not likely to have an ftp client or know how to use it, the right-clicked link appears reasonable.

Unless I'm missing something.

What protocol is being used in the right-clicked "save as"? Ftp managed by the browser? Http?

Does this act appear in the web server log so that I can programmatically find and record the download data?

I've spent a good deal of time in various searches here at Perlmonks, MS knowledge base, and others, and feel I've run out of even knowing where else to look.

Thanks in advance!


Re: File download tool, file size issues, cgi-application
by ikegami (Patriarch) on Mar 02, 2006 at 16:55 UTC
    What protocol is being used in the right-clicked "save as"? Ftp managed by the browser? Http?

    Whichever protocol is specified in the link. If the link says http://..., HTTP will be used; if the link says ftp://..., FTP will be used. The only difference between clicking the link normally and using save-as is what the browser does with the downloaded file. The former displays it, while the latter saves it to disk.

    All major web browsers of the last eon have both HTTP and FTP clients built in, so they can handle both HTTP and FTP links seamlessly.

    Does this act appear in the web server log so that I can programmatically find and record the download data?

    If HTTP is used, it'll appear in your web server logs.
    If FTP is used, it'll appear in your FTP server logs.

      Thanks for this. I think I was wondering if HTTP changed its behavior for larger files being downloaded rather than displayed. If the browser defaulted to ftp for the download, I wondered how that could affect me.

      Plus I'm looking for some confidence that I'm approaching the application in a reasonable way (linking to the file rather than handling it directly via CGI.pm). It seems counterintuitive that the host will happily serve a huge download via http, but won't allow it via cgi (up or down - although I'm betting this is a security measure).

Re: File download tool, file size issues, cgi-application
by salvix (Pilgrim) on Mar 02, 2006 at 18:12 UTC
    If your host allows symbolic links in your htdocs (Options FollowSymLinks), you could keep the files to be downloaded in a directory outside the htdocs and have a simple CGI script (or mod_perl handler) that:

    1- Receives as a parameter the filename to be downloaded

    2- Given the current authenticated user, creates a symbolic link to the real file in "/htdocs/downloads/", where the symlink name would be something like filename_USERNAME_MD5GARBAGE.pdf. The MD5GARBAGE would be generated by:

    use Digest::MD5 qw(md5_hex);   # md5() returns raw bytes; md5_hex gives a filename-safe hex string
    my $md5garbage = md5_hex($real_filename, $username, "any random string");
    3- Then your CGI redirects the browser (using the Location HTTP header) to the URL http://yoursite/downloads/filename_USERNAME_MD5GARBAGE.pdf.

    4- Later on, look in your access log: any successful hit on "filename_username_xxxx.pdf" means that that specific user downloaded that specific file.

    5- Periodically, clean up the old symlinks in the /htdocs/downloads/ directory.

    (Yeah, it looks like a dirty hack...)
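The five steps above can be sketched in core-module Perl. The /tmp paths, the site URL, and the secret string here are all illustrative assumptions; in the real CGI the username would come from the cgi-application session plugin, not be hardcoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# 2- build an unguessable symlink name tied to one user and one file
sub link_name {
    my ($file, $user, $secret) = @_;
    return join('_', $file, $user, md5_hex($file, $user, $secret)) . '.pdf';
}

# 1/2/3- check the requested file exists, create the symlink if needed,
# and return the static URL for the Location: redirect
sub publish {
    my ($store, $docroot, $file, $user, $secret) = @_;
    die "no such file\n" unless -f "$store/$file";
    my $link = link_name($file, $user, $secret);
    unless (-l "$docroot/$link") {
        symlink("$store/$file", "$docroot/$link")
            or die "symlink failed: $!\n";
    }
    return "http://yoursite/downloads/$link";
}

# tiny demo under /tmp so the sketch runs anywhere
mkdir '/tmp/store';
mkdir '/tmp/docroot';
open my $fh, '>', '/tmp/store/report.pdf' or die $!;
print $fh 'dummy';
close $fh;

my $url = publish('/tmp/store', '/tmp/docroot', 'report.pdf', 'jdoe', 's3cret');
print "Location: $url\n\n";    # 3- the redirect header your CGI would emit
```

Step 5 (cleanup) is then just a cron job that unlinks symlinks in the downloads directory older than your expiry window.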

      1- Receives as a parameter the filename to be downloaded
      ..and could also:
      • 1.a- Compress the file before download.
      • 2.a- Estimate the speed of the connection and show the bzipped file name, the file size, and the estimated time to download it.
      • 3.a- And, perhaps, even show the MD5 sum to check against after finishing the download, if the user has a way of doing so at his desktop.
      Besides, as a starting point, just try creating an icon on your desktop that points to a file on a well-configured ftp site. See how easily it could work. Then add the minimum required to fulfill all your needs.

      And don't forget one of the newest ways of doing such heavy tasks: BitTorrent and .torrent files. Users fetching the same file get together to improve the download time and reduce the load on the host.

Re: File download tool, file size issues, cgi-application
by idsfa (Vicar) on Mar 02, 2006 at 17:55 UTC

    You could also wrap all of your statistic gathering into a small CGI that responds with a page containing instructions and a right-clickable link. This would allow you to have your cake and eat it too. As a bonus, you could allow the user to specify whether they want an http or ftp link in their request.

    PS - In general, you'll probably want to use http links. The ftp option would require that your service provider has set up anonymous FTP ... not likely from the level of service you describe.
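The wrapper CGI described above might look like the core-module sketch below. The host name, paths, and the flat-file log (standing in for the poster's MySQL table) are all assumptions; a real version would pull the user from the session and the file from the query string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Let the user choose an http or ftp link in their request
sub download_url {
    my ($proto, $file) = @_;
    return $proto eq 'ftp' ? "ftp://yoursite/pub/$file"
                           : "http://yoursite/downloads/$file";
}

# Record who asked for which file, when, and from where
sub record_request {
    my ($log, $user, $file, $ip) = @_;
    open my $fh, '>>', $log or die "can't append to $log: $!";
    print $fh join("\t", strftime('%Y-%m-%d %H:%M:%S', localtime),
                         $user, $file, $ip), "\n";
    close $fh;
}

# In the real CGI these would come from the session and query string:
my ($user, $file, $ip) = ('jdoe', 'report42.pdf', '192.0.2.7');
record_request('/tmp/download_requests.log', $user, $file, $ip);

# Hand back a page with a plain right-clickable link
print "Content-type: text/html\n\n";
printf qq{<p>Right-click the link and choose "Save As": <a href="%s">%s</a></p>\n},
       download_url('http', $file), $file;
```

Because the link itself is static, the 2 MB CGI limit never applies: the script only gathers statistics and prints HTML, while Apache (or the FTP daemon) serves the big file.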


    The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. — Cyrus H. Gordon
      Yeah, we have anonymous ftp, but I just tried an http download with an 18 meg file and it was fine. The "small" cgi you mentioned is roughly what's happening, except there's also an admin side so our guys can create accounts, set logins/passwords for downloaders, assign the ftp'd files to a login, and see brief reports about selected accounts. Users see their page with instructions and are informed that the file and link will expire in 10 days (space savings). There's an auto email to our admin that the download happened, a configurable email to the downloader thanking them and reminding them of the 10-day thing, etc.
Re: File download tool, file size issues, cgi-application
by rhesa (Vicar) on Mar 02, 2006 at 18:33 UTC
      This looks good, but the OP stated that he can't serve files bigger than 2MB through any CGI script:
      We can't count on them having ftp clients, and our web host (shared hosting) won't allow cgi driven downloads or uploads exceeding 2 megs. (I know, host sucks.)
        Right, I did gloss over that. In that case, there's no way to confirm at download time that the download was successful. The only option that remains is scanning the server logs and checking whether the number of bytes sent equals the file size. I'd probably give the download link a query param that makes it easy to pick out of the logs.
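That log scan could look roughly like the sketch below, which parses Apache common-format access log lines and treats a 200 response with the full byte count as evidence of a completed download. The log lines, URL paths, and file sizes are made up for illustration; resumed downloads (206 responses) would need extra handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One Apache "common" log line: host ident user [time] "request" status bytes
my $line_re = qr{^(\S+) \S+ \S+ \[([^\]]+)\] "GET (\S+)[^"]*" (\d{3}) (\d+|-)};

sub completed_downloads {
    my ($log_lines, %file_size) = @_;    # %file_size: URL path => bytes on disk
    my @done;
    for my $line (@$log_lines) {
        next unless $line =~ $line_re;
        my ($ip, $when, $path, $status, $bytes) = ($1, $2, $3, $4, $5);
        next unless exists $file_size{$path};
        # a 200 with the full byte count is the best evidence of success
        push @done, { ip => $ip, when => $when, path => $path }
            if $status == 200 && $bytes ne '-' && $bytes == $file_size{$path};
    }
    return @done;
}

# Hypothetical log lines: one complete download, one aborted partway
my @log = (
  '192.0.2.7 - - [02/Mar/2006:16:55:00 -0500] "GET /downloads/report_jdoe_abc123.pdf HTTP/1.1" 200 18874368',
  '192.0.2.8 - - [02/Mar/2006:16:56:00 -0500] "GET /downloads/report_jdoe_abc123.pdf HTTP/1.1" 200 4096',
);
my @hits = completed_downloads(\@log, '/downloads/report_jdoe_abc123.pdf' => 18874368);
printf "%s downloaded by %s at %s\n", $_->{path}, $_->{ip}, $_->{when} for @hits;
```

Since the symlink trick above already bakes the username into the URL, matching the path in the log identifies the user as well.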
Re: File download tool, file size issues, cgi-application
by cupojoe (Novice) on Mar 03, 2006 at 00:22 UTC

    Thanks to everyone for the help.