tosh has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am going to be using the S3 service from Amazon, and to keep some of my data transfer costs down I want to cache the most-used files on my server.

I've been looking for some kind of vanilla cache-management module or method, but I keep coming up with RAM caches, template caches, or squirrel caches.

I'd love not to reinvent the wheel, so if anybody knows of anything along these lines I would love to hear about it!



In case anyone is looking for the same thing in the future, here is the solution I have come up with using Cache::SizeAwareFileCache:

::PUT FILE::
    IF filesize > MAX_S3_SIZE
        SAVE file locally
    ELSE
        SAVE file in SizeAwareFileCache
        SAVE file to S3

::GET FILE::
    IF file in SizeAwareFileCache
        SEND file
    ELSE
        SEND user to get file from S3
        GET file from S3 and put into SizeAwareFileCache
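As a rough Perl sketch of that logic, using Cache::SizeAwareFileCache from the Cache::Cache distribution: note that the size constants are examples, and the subs save_locally, put_to_s3, get_from_s3, redirect_to_s3 and send_to_client are hypothetical placeholders for your own storage/HTTP code, not real APIs.

```perl
use strict;
use warnings;
use Cache::SizeAwareFileCache;

# Example threshold: files larger than this bypass S3 (see discussion
# below about the Perl S3 modules slurping whole files into memory).
use constant MAX_S3_SIZE => 500 * 1024 * 1024;    # 500 MB

my $cache = Cache::SizeAwareFileCache->new({
    namespace => 's3_mirror',
    max_size  => 100 * 1024 * 1024 * 1024,        # 100 GB local budget
});

sub put_file {
    my ($key, $data) = @_;
    if (length($data) > MAX_S3_SIZE) {
        save_locally($key, $data);                # too big for the S3 module
    }
    else {
        $cache->set($key, $data);                 # warm the local cache
        put_to_s3($key, $data);
    }
}

sub get_file {
    my ($key) = @_;
    my $data = $cache->get($key);
    if (defined $data) {
        return send_to_client($data);             # cache hit: no S3 transfer charge
    }
    redirect_to_s3($key);                         # user fetches from S3 this time
    $cache->set($key, get_from_s3($key));         # then warm the cache for next time
}
```

The max_size option means the cache itself handles evicting objects once the 100 GB budget is exceeded, so the application code never has to track total size by hand.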

Replies are listed 'Best First'.
Re: File caching for external storage
by roboticus (Chancellor) on Nov 28, 2007 at 13:08 UTC

    I've never tried it, but perhaps you can configure Squid to do the job for you? (I googled "caching proxy" to get a few hits without seeing so many RAM/squirrel/template cache pages.)



Re: File caching for external storage
by KurtSchwind (Chaplain) on Nov 28, 2007 at 14:21 UTC

    I'll second the squid love.

    Install squid. Set all your clients to proxy to your squid server and run with it.

    I used to drive a Heisenbergmobile, but every time I looked at the speedometer, I got lost.
      Sounds like he's looking for something server-side, though, and may not have control over client (browser) proxies.
      "That which we persist in doing becomes easier, not that the task itself has become easier, but that our ability to perform it has improved."
        --Ralph Waldo Emerson
        Exactly, server-side.

        Squid is interesting, I think, but I don't want to just proxy connections; I will need to do some programming.

        One of the problems I have is that the Perl modules for S3 don't stream data, which means that if my application sends a 2GB file to S3, that entire 2GB file gets loaded into memory before it is sent, and that's bad. So I will store files over 500 MB locally and keep that info in a table.

        But while I'm setting up tables anyway, I might as well cache the most-used files locally on my server instead of sending users to fetch them from S3 and incurring the bandwidth charge. Say I cache up to 100GB (the size of the local drive) before rotating files out of the cache; the question then is how to determine which files should and shouldn't be in the cache.
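        One way to handle the 100GB budget with Cache::SizeAwareFileCache: its documentation warns that enforcing max_size on every write is expensive, and suggests leaving the cache unbounded and trimming it periodically with limit_size instead (eviction favors objects closest to expiration, then least recently accessed). A sketch, with the namespace and size being assumptions:

```perl
use strict;
use warnings;
use Cache::SizeAwareFileCache;
use Cache::SizeAwareCache qw( $NO_MAX_SIZE );

# No automatic size check on each set(); cheaper for a busy cache.
my $cache = Cache::SizeAwareFileCache->new({
    namespace => 's3_mirror',
    max_size  => $NO_MAX_SIZE,
});

# Run this periodically (e.g. from cron): evict until under 100 GB.
$cache->limit_size(100 * 1024 * 1024 * 1024);
```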

        This all seems to me to be something that somebody has to have done before, maybe not with S3, but just storing content and caching it. Maybe not...