A year or so ago, I finally got fed up with selecting CPAN mirrors on my couple dozen hosts. I install the same 10 or 50 modules on all of them, and it seems silly both to pull down the same modules over and over and to cut and paste a new URL onto each host every time I change mirrors. Wouldn't it be easier, I thought, if I could just set the URL to my own mirror and use that?

But I don't install modules that often and don't want to share the mirror, so it seemed rude to create an actual rsync mirror just for my tiny hosting situation. My solution was a roll-your-own Caching CGI CPAN proxy system.

#!/usr/bin/perl

use strict;
use CGI;
use CGI::Carp qw(fatalsToBrowser);
use Cache::File;
use Data::Dumper;
use LWP::UserAgent;

my $cache_location = "/home/somewhere/writeable/by/httpd/";
my @mirrors = (
    # NOTE: I'd use more than one here, but have found that
    # causes problems
    "http://yourfavorite/mirror/",
);

my $mirror = $mirrors[ rand @mirrors ];
my $cgi    = new CGI;
my $pinfo  = $ENV{PATH_INFO};
   $pinfo =~ s/^\///;

my $CK = "PPC:$pinfo";
my $again = 0;

THE_TOP:
# we regen the cache each time just in case things aren't flushed correctly...
# probably don't need to though
my $cache = Cache::File->new(cache_root=>$cache_location, default_expires => '2 day' );

if( $cache->exists($CK) and $cache->exists("$CK.hdr") ) {
    our $VAR1;
    my $res = eval $cache->get( "$CK.hdr" ); die "problem finding cache entry\n" if $@;
    my $status = $res->status_line;

    print $cgi->header(-status=>$status, -type=>$res->header( 'content-type' ));

    my $fh = $cache->handle( $CK, "<" ) or die "problem finding cache entry\n";
    if( $res->is_success ) {
        my $buf;
        while( read $fh, $buf, 4096 ) {
            print $buf;
        }

    } else {
        print $status;
    }
    close $fh;

    unless( $res->is_success ) {
        $cache->remove($CK);
    }

    exit 0;

} elsif( not $again ) {
    $again = 1;

    my $ua = new LWP::UserAgent;
       $ua->agent("PPC/0.1 (paul's proxy cache perlmonks-id=16186)");

    $cache->set($CK, 1); # doesn't seem like we should have to do this, but apparently we do

    my $fh = $cache->handle( $CK, ">" );
    my $request  = HTTP::Request->new(GET => "$mirror/$pinfo");
    my $response = $ua->request($request, sub { my $chunk = shift; print $fh $chunk });
    close $fh;

    $cache->set("$CK.hdr", Dumper($response));

    goto THE_TOP;
}

die "problem fetching $pinfo. :(\n";
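For anyone puzzling over the `our $VAR1; ... eval` lines: the script stores the response headers as Data::Dumper text and turns that text back into an object with a string eval. Here is a minimal sketch of that round trip on its own. The `My::FakeResponse` class and its fields are made up for illustration and stand in for the real response object.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the response object the script caches.
my $response = bless { code => 200, type => 'text/plain' }, 'My::FakeResponse';

# Dumper() returns Perl source text of the form
#   $VAR1 = bless( { ... }, 'My::FakeResponse' );
my $serialized = Dumper($response);

# Under "use strict" the dumped text only compiles if $VAR1 is declared,
# which is why the proxy script says "our $VAR1" before its eval.
our $VAR1;
my $restored = eval $serialized;
die "could not restore cached entry: $@" if $@;

# The eval returns the value of the assignment, i.e. the rebuilt object.
print ref($restored), " ", $restored->{code}, "\n";
```

Since the cached text is executed as code, this approach assumes that nothing but the proxy itself can write to the cache directory.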

I'd post it directly to the code catacombs or CPAN or something, but I'm not particularly proud of it. I think it'd be useful to others, and it certainly functions, but I'm seeking feedback on it. I'm hopeful that there'll be some insightful comments on it. (Are my module choices sane? Did someone else do this better and I just haven't found it?)

UPDATE: ... or, if there's nothing at all to say about it, then this is a fine resting place for it.

UPDATE: Actually, the final resting place is now here: CPAN::CachingProxy


Re: My CPAN Proxy Mirror
by hossman (Prior) on May 06, 2008 at 00:08 UTC

    I'm a little confused ... did you actually *want* a local mirror of the stuff you're installing, or did you just want to make your life easier when picking your CPAN mirror during CPAN.pm configuration?

    If the latter, then you don't seem to have made life any easier, since your mirror can only deal with one URL (for reasons that aren't clear). Picking one URL when configuring CPAN isn't really that hard.

    You probably could have achieved the same thing (without needing any local disk, and actually using multiple mirrors) by configuring your DNS server to make "cpan-mirror.yourdomain.com" round robin through the mirrors you want to use.

      On one machine, no, it isn't hard to change it. Making the same change on 15 or 20 hosts starts to get irritating. Not difficult, just irritating. Which mirror am I using now again? Oh, right... Then 'o conf urllist blah blah...'. I tend to change only when my selected mirror gets slow, stale, or disappears.

      While I was at it, I wanted to start mirroring CPAN (for speed as much as to keep the load off the mirror) to avoid downloading the same modules over and over, but I didn't want to mirror the whole thing or even the mini-list. That takes up too much space, and it's more load for the mirror rather than less. What I really wanted was a cache.

      The reason I only pick one mirror at a time is that they don't always seem to be synced up with the same modules. I originally set it up to be able to pick randomly from a list of favorites, but it causes unexpected problems -- so I started just picking one.


        Hmmm... I'd probably just use something like squid configured as an accelerator cache.

        You'd get all the same benefits (local caching of only the stuff you've installed and a single place to configure the remote site you are mirroring) but you wouldn't have to write anything, and it would manage the disk usage.

Re: My CPAN Proxy Mirror
by xdg (Monsignor) on May 06, 2008 at 09:56 UTC

    That's an interesting approach.

    Personally, I just use CPAN::Mini (well, actually CPAN::Mini::Devel that includes latest development versions, too) and serve that up on a local webserver on the network. While there is a big one-time hit to synchronize the first time, the ongoing hit on a mirror is fairly minimal (most distributions are pretty small). At this point, disk is cheap, so I figure why not have it all?

    If I wanted to only mirror a specific list of modules, I would use the module_filters parameter (which ordinarily filters out a list of modules) and just invert the logic.

    module_filters => [ sub { $_[0] !~ /Foo::Bar|Baz::Bam|.../ } ]


    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
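xdg's inverted `module_filters` idea can be sketched as a small standalone script. This is only a sketch under a few assumptions: the module list, the `--sync` switch, and the local path are hypothetical, the remote URL is the placeholder from the original post, and the filter matches exact module names where xdg's one-liner used an unanchored pattern.

```perl
use strict;
use warnings;

# Hypothetical list of the modules you actually install on your hosts.
my @wanted = qw(Foo::Bar Baz::Bam);

# Build one alternation from the list; quotemeta keeps the names literal.
my $wanted_re = join '|', map { quotemeta } @wanted;

# CPAN::Mini skips whatever a filter matches, so return true for
# every module that is NOT on the list (the inverted logic).
my $skip_unwanted = sub { $_[0] !~ /\A(?:$wanted_re)\z/ };

# Only sync when asked to, so the filter can be tried out offline.
if ( @ARGV and $ARGV[0] eq '--sync' ) {
    require CPAN::Mini;
    CPAN::Mini->update_mirror(
        remote         => 'http://yourfavorite/mirror/',  # placeholder URL
        local          => '/path/to/local/minicpan',      # hypothetical path
        module_filters => [ $skip_unwanted ],
    );
}
else {
    # Dry run: show what the filter would do with two sample names.
    for my $module (qw(Foo::Bar Some::Other::Module)) {
        printf "%-20s %s\n", $module,
            $skip_unwanted->($module) ? 'skipped' : 'mirrored';
    }
}
```

Run without arguments, it only reports what the filter would do. The filter looks at module names alone, so dependencies of the wanted modules have to be added to the list by hand.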