PerlMonks
mod_perl web app design considerations

by vladb (Vicar)
on Sep 03, 2002 at 18:54 UTC

Recently I've been toying with Apache::Session a bit more as part of a new web application that I'm building (an MVCC framework implemented in PageKit). One feature of this application is to allow its users to upload multiple images. However, I have to store those images in the database (MySQL) rather than in the conventional way on the hard drive (heck, either way things end up on the HD, but you know what I mean, eh? :). This is one of the key requirements of this web app. Storing images in the database provides greater control, since images belonging to one user must not be viewable by anyone but that same user.

On one particular form I'm allowing a logged-in user to upload up to 5 JPEG images, plus additional information for each. There's also a bunch of extra fields on the form. One decision I have to make is where to store the image data until the entire form is completed. How do I manage temporary form data (large images) and also make sure the system is not overloaded with too much of it? Apache::Session is good in that sessions may expire, and when they do, all data associated with the session is eliminated, freeing up resources.

There are a few alternative routes I've looked into:
  1. Keep a record of the image in the session. This record will only hold the image description and the path to a temporary file on the server. The image and its content won't be committed to the database until the entire form is completed (not just the image portion). Displaying a (small version of the) image on the (yet-to-be-completed) form is then a trivial matter of pointing to the temporary image URL.

    As soon as the form is submitted, the temporary image files will be removed from the server. However, if the user simply shuts his/her browser, there is no way to immediately remove the images. One option is a Perl script, run from cron, that cleans the temporary image directory of files that are too old.

    Yet, considering my application's security requirements, this option carries a real risk of exposing these images to the public internet. Since that is highly undesirable, I've opted not to go ahead with this approach.

  2. Keep the image description and its content in the session. Write a script to display the image from the session on the form. This script will basically have to emit 'image/jpeg' content from the session. The clear advantage is that only the currently signed-in user is able to access his images. The drawback is that since I've configured my session data to be stored in a database 'sessions' table (standard), adding new images will further load my server (and is also slower than the disk-file option in the first approach).

    Once the form is submitted, the image data is saved into the database and removed from the session space. On browser close, however, the data will remain in the session until it expires(?). As with the first approach, I'd have to provide a way of cleaning expired session keys out of the sessions table.

  3. Keep the image description and content in a global hash (remember, this is mod_perl) under the user's session key. One advantage here is that adding new images and retrieving image content is pretty quick and doesn't require database resources. Also, as with the previous approach, only the currently signed-in user is able to access his/her images.

    Concerning performance issues, if the global data grows large (users keep shutting their browsers after having uploaded 5+ images :), I may have to restart the server. Restarting the server too often doesn't smell like 'the right way' either ;).

    But even with this option, there's a way to clean the global hash data. For example, I could add an additional 'handler' (a method in the model class of the MVCC framework) that would allow a web administrator to view all session data and clean it up as needed (say, clean only expired sessions).
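The cron cleanup script mentioned under option 1 could look something like the sketch below. The temporary directory name and the one-day age threshold are assumptions for illustration, not anything from the post:

```perl
use strict;
use warnings;

# Delete plain files older than $max_age_days from $dir.
# Returns the number of files removed.
sub clean_old_files {
    my ($dir, $max_age_days) = @_;
    opendir my $dh, $dir or die "can't open $dir: $!";
    my $removed = 0;
    for my $name (readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path;            # skips '.', '..' and subdirectories
        if (-M $path > $max_age_days) {  # -M gives the file's age in days
            unlink $path and $removed++;
        }
    }
    closedir $dh;
    return $removed;
}

# Run from cron, e.g.:  0 * * * * cleanup.pl /var/tmp/upload_images
clean_old_files($ARGV[0], 1) if @ARGV;
```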
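Option 2's image-serving script can be factored so that the session lookup is separate from the mod_perl plumbing. The session layout below (an 'images' hash keyed by id, each entry holding a MIME type and the raw bytes) is an assumption for the sake of the sketch:

```perl
use strict;
use warnings;

# Given the Apache::Session-tied hash and an image id from the query
# string, return (content_type, bytes), or an empty list if this
# session holds no such image.  Because each user sees only his own
# session hash, other users' images are unreachable by construction.
sub image_from_session {
    my ($session, $image_id) = @_;
    my $img = $session->{images}{$image_id} or return;
    return ($img->{type} || 'image/jpeg', $img->{data});
}

# In the actual mod_perl handler you would then do something like:
#   my ($type, $bytes) = image_from_session(\%session, $image_id);
#   $r->content_type($type);
#   $r->print($bytes);
```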
So, which of the above options do you feel will work best for me? Or, from your own experience, what approach did you take and how did it play out? At this stage, I'm torn between the second and third options. Each looks appealing to me, making it all the harder to decide which one to pursue.

Please pardon the length of this post. I've tried to be as clear as possible (which I doubt I've achieved :). I've also decided to make it a 'meditational' post because the questions I'm asking require moderate discussion and pondering. Frankly, the more people I can get input from, the better ;-)

_____________________
# Under Construction

Re: mod_perl web app design considerations
by valdez (Monsignor) on Sep 03, 2002 at 20:11 UTC

    Interesting meditation!

    I have another solution. Keeping images, possibly large ones, in memory or in a database forces you to use mod_perl to deliver them: you end up with a long print that could instead be handled by Apache itself. You are replicating content delivery and embedding security inside that phase of Apache's life cycle.

    What you need instead is authentication, authorization and access control. Following this route, you 'only' need to create directories with access rights embedded in their names. A dedicated access control handler can give plain Apache authorization to serve the content, gaining in speed and modularity.

    If you need to share images between many servers, I think an NFS file system is a better option. Some discussions of this option can be found on the mod_perl mailing list.

    Hope this helps. Ciao, Valerio

      thanks for your reply, valdez! :)

      You go on to say...

      What you need instead is authentication, authorization and access control.

      But for this to work, wouldn't I have to implement my own Apache module to intercept requests and do authentication and authorization based on the value of the requested URI? At this stage, I've already written a moderate amount of code (due to tight deadlines rather than hard reasoning :) for the www.pagekit.org MVCC framework. The framework itself is very sound, and I've come to appreciate both its simplicity and its power. It is also easy to write handlers to serve pretty much any content. I also have past experience serving images from a database.

      However, what you are suggesting sounds very enticing. I would appreciate it if you could send me links to some resources on the web where I can delve further into this subject. ;-)

      _____________________
      # Under Construction

        Here I am :)

        Chapter 6 from Eagle Book describes what you need:

        In this chapter, we step back to an earlier phase of the HTTP transaction, one in which Apache attempts to determine the identity of the person at the other end of the connection, and whether he or she is authorized to access the resource. Apache's APIs for authentication and authorization are straightforward yet powerful. You can implement simple password-based checking in just a few lines of code. With somewhat more effort, you can implement more sophisticated authentication systems, such as ones based on hardware tokens.

        You can find a copy of this chapter here. mod_perl Developer's Cookbook provides some other examples on the same subject.

        I understand your point about deadlines, I was talking about theory, real life is another story ;-)

        Good luck for your project. Ciao, Valerio

        But for this to work, wouldn't I have to implement my own Apache module to intercept requests and do authentication and authorization based on the value of the requested URI?

        Rather happily, that's an awful lot easier than it sounds. A skeleton authentication handler looks like this:

        package Apache::AuthAny;
        # file: Apache/AuthAny.pm

        use strict;
        use Apache::Constants qw(:common);

        sub handler {
            my $r = shift;

            my ($res, $sent_pw) = $r->get_basic_auth_pw;
            return $res if $res != OK;

            my $user = $r->connection->user;
            unless ($user and $sent_pw) {
                $r->note_basic_auth_failure;
                $r->log_reason("Both a username and password must be provided",
                               $r->filename);
                return AUTH_REQUIRED;
            }

            return OK;
        }

        1;
        (that'll authenticate on the *presence* of both a username and password, via HTTP Basic Auth - obviously you'd want to substitute a real-world authentication scheme).

        The Eagle book gives full details, and some of it seems to be online here:
        http://modperl.com:9000/book/chapters/ch6.html
        (found through random Googling).

        hth, andye.

Re: mod_perl web app design considerations
by perrin (Chancellor) on Sep 03, 2002 at 21:29 UTC
    First, I don't see why you have to put the images in the database. Keep the metadata in the database (path, name, who it belongs to) and keep the data in a normal file. Putting large binary files in a database almost always leads to trouble later on, and makes it impossible to do simple backups, moves, etc.

    About your option 3: how are you imagining you would implement this global hash? I think you're forgetting that each Apache child process has its own separate globals, with no sharing between them. You would have to share using disk or shared memory (with a module like IPC::MM).

    Option 2 will use up a lot of memory quickly. When you load a large image into memory in an apache process (which is what you would be doing here), that process will never shrink back down. The memory can be reused by that process, but it won't be given back to the general pool of free memory. That means that one user sending multiple requests over the course of a session with a 500k image can use up MBs of memory on your server.

    Option 1 sounds best. What's the security concern? You wouldn't be using your htdocs directory as temp space, would you? I don't see how anyone would see these images without your intention.

Re: mod_perl web app design considerations
by abell (Chaplain) on Sep 04, 2002 at 09:37 UTC
    I would opt for solution 4 :-)

    4. Put images into the database together with the "confirmed" images, but with a flag set to 'unconfirmed'. If the images belong to a more complex structure which is being built through a sequence of forms, put the 'unconfirmed' flag (a boolean field) on that structure, and set it to 'confirmed' when the input process ends properly.
    This way, images remain private and you can use the same routines you apply to regularly stored ("confirmed") images. You only need this extra boolean field in one database table, and you need to check it when you perform queries on confirmed images (so add ' AND confirmed=TRUE' to all WHERE clauses in your queries on images).
    Every now and then you can delete all unconfirmed images, based on their upload date (if you have it stored somewhere), on their ID, or simply when no user is in session.
    The drawback is that you pay a little overhead for retrieving images from the DB at the following stages of the input process, but maybe this is what you already do when showing images to users.
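    The flag-based flow above can be sketched with plain SQL strings; the table and column names (images, owner_id, confirmed, upload_time) are hypothetical, and the helper shows the ' AND confirmed=TRUE' rewrite applied mechanically:

```perl
use strict;
use warnings;

# Hypothetical schema: images(owner_id, data, confirmed, upload_time).
# With DBI these strings would go through $dbh->prepare/execute.

# 1. Upload: every new image starts out unconfirmed.
our $insert_sql =
    'INSERT INTO images (owner_id, data, confirmed) VALUES (?, ?, FALSE)';

# 2. Form completed: flip this user's pending images to confirmed.
our $confirm_sql =
    'UPDATE images SET confirmed = TRUE WHERE owner_id = ?';

# 3. Periodic cleanup of abandoned (never-confirmed) uploads.
our $purge_sql =
    'DELETE FROM images WHERE confirmed = FALSE AND upload_time < ?';

# Restrict an existing query on images to confirmed rows only.
sub only_confirmed {
    my ($sql) = @_;
    return $sql =~ /\bWHERE\b/i
        ? "$sql AND confirmed = TRUE"
        : "$sql WHERE confirmed = TRUE";
}
```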

    Best regards

    Antonio Bellezza

    Update: minor language corrections

Node Type: perlmeditation [id://194856]
Approved by VSarkiss
Front-paged by krisahoch