Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

Mirror only the installable parts of CPAN

by merlyn (Sage)
on Aug 08, 2002 at 06:10 UTC ( [id://188527] : sourcecode . print w/replies, xml ) Need Help??
Category: Utility Scripts
Author/Contact Info merlyn
Description: As noted in a parallel thread, I have this short program which can mirror a complete set of the installable modules for use with

This is for review purposes only. A final version of this code will appear in my LM column. Comments are welcome.
WARNING: As stated, this was a preliminary version of this program for comment only. While writing the column, I fixed a few bugs. Do not use the version here. Use the version there instead.

#!/usr/bin/perl -w
use strict;


# the CPAN url to fetch
my $REMOTE = "";
# my $REMOTE = "file://Users/merlyn/MIRROR/CPAN/";
# my $REMOTE = "";
# my $REMOTE = "";

# the path to the local mirror
# warning: unknown files below this dir are deleted!
my $LOCAL = "/Users/merlyn/Perl/MINICPAN/";

# how verbose?  false means nothing but errors
my $TRACE = 1;


## core:
use File::Path qw(mkpath);
use File::Basename qw(dirname);
use File::Spec::Functions qw(catfile);
use File::Find qw(find);

## LWP:
use URI ();
use LWP::Simple qw(mirror RC_OK RC_NOT_MODIFIED);

## Compress::Zlib
use Compress::Zlib qw(gzopen $gzerrno);

## first, get index files
my_mirror($_) for qw(

## now walk the packages list
my $gz = gzopen(catfile($LOCAL, "modules/02packages.details.txt.gz"), 
  or die "Cannot open details: $gzerrno";
my $state = 1;
while ($gz->gzreadline($_) > 0) {
  if ($state == 1) {        # in header
    $state = 2 unless /\S/;
  if ($state == 2) {        # blank following header
    $state = 3;

  my ($module, $version, $path) = split;

## finally, clean the files we didn't stick there

exit 0;

  my %mirrored;

  sub my_mirror {
    my $path = shift;

    my $remote_uri = URI->new_abs($path, $REMOTE)->as_string;
    my $local_file = catfile($LOCAL, $path);

    return if $mirrored{$local_file}++;

    ## presume "authors/id/*" is up to date if it is present
    return if $path =~ m{^authors/id} and -f $local_file;

    print "$remote_uri -> $local_file\n" if $TRACE;

    mkpath(dirname($local_file), 1, 0711);
    my $status = mirror($remote_uri, $local_file);

    return if $status == RC_OK or $status == RC_NOT_MODIFIED;
    warn "$remote_uri: $status!\n";

  sub clean_unmirrored {
    find sub {
      return unless -f and not $mirrored{$File::Find::name};
      print "removing $File::Find::name\n" if $TRACE;
      unlink $_ or warn "Cannot remove $File::Find::name: $!";
    }, $LOCAL;
Replies are listed 'Best First'.
Re: Mirror only the installable parts of CPAN
by zentara (Archbishop) on Aug 08, 2002 at 15:57 UTC
    I would find it useful if you could select which files you wanted
    to mirror. Sort of a selected download list. Maybe if I edit the
    02packages.details.txt.gz list to only contain the modules I want.
    It would be a 1-stop download for the latest versions of all
    modules I use.
      If you only want to upgrade the modules you use, then you don't want a CPAN mirror. Just go to and enter "r" to see what modules are out of date.

      Of course, you could add a regex match in the loop that mirrors individual modules if you wanted. Beware that this will then delete the other modules you may have downloaded to that tree before.

      -- Randal L. Schwartz, Perl hacker