PerlMonks  

Create minimal offline CPAN mirror

by Anonymous Monk
on Aug 06, 2002 at 22:07 UTC (#188183=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a need for a minimal copy of the CPAN on CD or local disk.

Why? I am going on the road with no or an inadequate net connection and would like to have the latest modules available to me.

I have tried the script at: http://koschei.midnightrealm.org/code/perl/

It works, but it blows up too easily, and its use of LWP::Parallel::UserAgent overloads the target server, which is not good.

There was a discussion of this on p5p a while back.

Can anyone suggest a sane way to do this? Obviously some munging of 02packages.details.txt is required.

Re: Create minimal offline CPAN mirror
by SuperCruncher (Pilgrim) on Aug 06, 2002 at 23:08 UTC
    You might like to check out wget. It is a "free software package for retrieving files using HTTP, HTTPS and FTP" and "has many features to make retrieving large files or mirroring entire web or FTP sites easy". Be sure to check out its manual. Even Win32 binary versions are available.

    AFAIK CPAN is regularly "trimmed" so that it can fit on a single CD-ROM, so just using wget should be fine. If it downloads too much you can always just delete what you don't need.

    Good luck.

      AFAIK CPAN is regularly "trimmed" so that it can fit on a single CD-ROM, ...
      Not unless you have a 1.3G CDROM:
      blue.stonehenge.com>> du -sk /data/mirror/CPAN/
      1255258 /data/mirror/CPAN/

      -- Randal L. Schwartz, Perl hacker

Re: Create minimal offline CPAN mirror
by bluto (Curate) on Aug 07, 2002 at 00:03 UTC
    The CPAN Faq talks about mirroring CPAN via rsync. Rsync can be very fast (after the first download at least), but I've never used it with CPAN so YMMV.

    See http://www.cpan.org/misc/cpan-faq.html#What_do_I_need

    bluto

      Hi,

      we're mirroring CPAN via rsync here too. As merlyn already pointed out, you need at least 1.3GB of free space NOW. I'd recommend at least 2.0GB, as CPAN is growing and you need some overhead for logs etc.

      arkab:/space/scrap/mirrors # du -s cpan
      1290032 cpan

      As for the time one needs for rsyncing: it's very fast. With a 2MBit line you need about 70 minutes for the initial fetch. The updates are rsynced in about 5 minutes (or less) each.

      Bye
       PetaMem
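
For readers who want to try the rsync route discussed above, here is a minimal sketch, not a tested recipe. The function name `mirror_cpan` and the `DRY_RUN` switch are made up for illustration, and `cpan.example.org::CPAN` in the usage line is a placeholder, not a real mirror; substitute an rsync-capable mirror from the CPAN mirror list. The flags `-a`, `-v` and `--delete` are standard rsync options.

```shell
#!/bin/sh
# mirror_cpan SOURCE DEST: pull a CPAN tree with rsync (hypothetical helper).
#   -a        archive mode (recursive, preserves times and permissions)
#   -v        list files as they are transferred
#   --delete  remove local files that have vanished upstream
# Set DRY_RUN=1 to print the command instead of running it.
mirror_cpan() {
    src=$1
    dest=$2
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "rsync -av --delete $src $dest"
    else
        rsync -av --delete "$src" "$dest"
    fi
}
```

A call would look like `mirror_cpan cpan.example.org::CPAN /space/scrap/mirrors/cpan`. Because of `--delete`, point DEST at a directory that holds nothing but the mirror.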

Re: Create minimal offline CPAN mirror
by mattr (Curate) on Aug 07, 2002 at 07:05 UTC
    Caveat: Though I have used wget a lot, I have never mirrored CPAN.

    1. You may need to find the package called mirror elsewhere; at least from my part of the net I can't see it on sunsite.

    2. Wget or rsync may be very good. But watch out: though I have no idea who did it, search.cpan.org once blackholed a huge network area because some idiot tried to snarf down every page from their engine and overloaded the site. They recently un-blackholed the network (which belongs to a giant Internet provider that also happens to serve an Internet cafe near my house in Tokyo), but felt the block had been a completely valid response. So watch out with that wget, partner! Also, though it may not matter here: if wget hits a robots.txt exclusion file in a recursive download, it ignores it for the top URL only.

    3. I'm guessing that if you don't download obsolete versions it might really fit into a CD. Anybody?

    4. Possibly the programmer's interface to CPAN would work. I suppose CPAN::Shell->expand("Module","/*/") or something similar might do it, but I'm not sure you wouldn't miss some things. Conceivably CPAN's get command might just do it all for you; dunno.

    5. Note some modules apparently don't have version numbers.

•Re: Create minimal offline CPAN mirror
by merlyn (Sage) on Aug 08, 2002 at 01:23 UTC
    Can anyone suggest a sane way to do this? Obviously some munging of 02packages.details.txt is required.
    Based on that comment, I hacked out an 85-line program for my next LM column that does just that. It fetches everything you'd need to install any module in the modules list using CPAN.pm, and deletes anything old.

    The current "mini-CPAN" size is just a bit under 300M. Easily fit on a CDROM. Cool.

    The first time you run it, it sequentially fetches all of the modules, which is nicer on the source machine than doing them in parallel. After that, it fetches just what might have changed.

    -- Randal L. Schwartz, Perl hacker


    update: I posted this at Mirror only the installable parts of CPAN.
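
To make the "munging of 02packages.details.txt" concrete, here is a minimal sketch of just the index-reading step. It is not merlyn's program. It assumes a local copy of the index whose header ends at the first blank line and whose body lines have three whitespace-separated columns: package name, version, and distribution path relative to authors/id/. The function name `needed_dists` is hypothetical.

```shell
#!/bin/sh
# needed_dists FILE: print each distribution path named in a
# 02packages.details.txt-style index exactly once, sorted.
# Assumption: everything up to the first blank line is header; each body
# line looks like "Package::Name  1.23  A/AU/AUTHOR/Dist-1.23.tar.gz".
needed_dists() {
    awk 'body && NF >= 3 { print $3 }   # body line: emit the path column
         NF == 0         { body = 1 }   # first blank line ends the header
    ' "$1" | sort -u
}
```

Many packages map to the same distribution file, so `sort -u` is what shrinks the list. The paths can then be fetched one at a time from under authors/id/ on a mirror, which keeps the load on the source machine sequential as described above.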
      Cool!!! Mad props to Merlyn.

      The ultimate present for the japh who has got it all.
      And still plenty of room to spare on the CD!

      I started wondering how long it will be before CPAN doubles and almost fills that CD, but I couldn't find a graph of CPAN growth. Can anybody post copies of du-k.gz with the date of download on it? With a small sample we can build a curve. I just downloaded one and called it du-k.2002-0808.gz (though I think the browser or somebody decompressed it before I saw it).
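
Anyone collecting such samples could pull the grand total out of each file with something like the sketch below. It assumes the file holds plain `du -k` output, so the last line's first field is the total in kilobytes. The function name `du_total` is hypothetical. It falls back to `cat` when the file is not really gzipped, which covers the "browser decompressed it" case.

```shell
#!/bin/sh
# du_total FILE: print the grand total (in KB) from saved `du -k` output.
# Assumption: the last line of the listing is the top-level total.
# Tries gzip first; if FILE is not actually compressed, reads it as text.
du_total() {
    { gzip -dc "$1" 2>/dev/null || cat "$1"; } |
        awk '{ total = $1 } END { print total }'
}
```

Running it over a set of dated files gives one (date, size) point per file, which is enough to plot a rough growth curve.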

Re: Create minimal offline CPAN mirror
by Koschei (Monk) on Aug 08, 2002 at 03:59 UTC

    I'm the author of the one on midnightrealm. The reason it was written to be nasty and use Parallel is that the machine from which it was copying was on the local network. And it was for a one off burn, thus the lack of deletion features.

    Use merlyn's. He's seen mine and has made it much better.

    -- Iain, aka Koschei.

Re: Create minimal offline CPAN mirror
by elwarren (Curate) on Aug 09, 2002 at 16:34 UTC
    Has anyone done anything similar for ActiveState and ppm? It may come in handy when I have to reinstall Perl after installing Oracle9iAS...
