PerlMonks  

Minicpan size issue

by gurpreetsingh13 (Scribe)
on Nov 02, 2013 at 09:39 UTC (#1060917=perlquestion)
gurpreetsingh13 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I need your wisdom and experience with my problem.

After reading Randal's articles on minicpan, I decided to try it myself. But due to some proxy issues on my office machines, I couldn't use the given modules directly.

So I wrote my own script, which takes the following steps:

1. Download all three index files (01mailrc.txt.gz, 02packages.details.txt.gz, 03modlist.data.gz) and save them in the required directories.

2. Read the 02packages.details.txt file line by line.

3. Check whether each package's tar file exists at the specified location; if not, download it.

4. Finally, remove any old files that are not listed in the packages file.

I have posted the code below. My problem is that the size of the CPAN mirror directory is approaching 2.5 GB. I need to know whether that size is correct, or whether there is some problem with my code, such as downloading the same files multiple times. I ask because, as mentioned in Randal's article of year 2002 - the main purpose of minicpan is to burn all that into a single CD or some portable device. Please help me in that.

P.S. I am using Cygwin on a Windows machine.

use strict;
use warnings;
use utf8;
use bigint;
use Array::Utils qw(:all);

my $cpanPath     = "/cygdrive/d/Softwares/cpanmirror";
my $remoteMirror = "http://mirrors.neusoft.edu.cn/cpan";

## Get package file
`cd /home/Gurpreet && rm -rf 01mailrc.txt*`;
`cd /home/Gurpreet && rm -rf 02packages.details.txt*`;
`cd /home/Gurpreet && rm -rf 03modlist.data*`;
print "Deleted old package files\n";
print "=" x 30, "\n";

`cd /home/Gurpreet/ && wget $remoteMirror/authors/01mailrc.txt.gz`;
`cp -f /home/Gurpreet/01mailrc.txt.gz $cpanPath/authors/`;
`cd /home/Gurpreet/ && wget $remoteMirror/modules/02packages.details.txt.gz`;
`cp -f /home/Gurpreet/02packages.details.txt.gz $cpanPath/modules/`;
`cd /home/Gurpreet/ && wget $remoteMirror/modules/03modlist.data.gz`;
`cp -f /home/Gurpreet/03modlist.data.gz $cpanPath/modules/`;
print "Updated package files \n";
print "=" x 30, "\n";

#`cd /home/Gurpreet && gunzip 02packages.details.txt.gz`;
print "Extracted package file \n";
print "=" x 30, "\n";

# Get total files excluding top lines in package file
my $totalFiles = `cat 02packages.details.txt|wc -l`;
chomp($totalFiles);
$totalFiles = $totalFiles - 9;
print "Total files = $totalFiles\n";
print "=" x 30, "\n";

# Get all packages names
my @packageNames = `cat 02packages.details.txt|tail -$totalFiles|rev|cut -d " " -f1|rev|sort|uniq`;
chomp($_) foreach (@packageNames);
print "Total unique packages = ", scalar(@packageNames), "\n";
print "=" x 30, "\n";

# Start with numbers
print "Enter starting point of download\n";
chomp( my $startPoint = <STDIN> );
print "Enter ending point of download\n";
chomp( my $endPoint = <STDIN> );

# Start getting package files
print "Starting update of cpanmirror. Press enter\n";
print "=" x 30, "\n";
<STDIN>;
my $ctr = $startPoint;
foreach my $val ( @packageNames[ $startPoint .. $endPoint ] ) {
    my @vals        = split /\//, $val;
    my $packageName = $vals[ scalar(@vals) - 1 ];
    my $dirName     = join "/", @vals[ 0 .. scalar(@vals) - 2 ];
    print "$ctr)Dir=$dirName Package=$packageName\n";
    `mkdir -p $cpanPath/authors/id/$dirName` unless -d $dirName;
    `cd $cpanPath/authors/id/$dirName && wget $remoteMirror/authors/id/$dirName/$packageName`
        unless -e "$cpanPath/authors/id/$dirName/$packageName";
    $ctr++;
}

# Now get the list of old files and delete them
print "Print Y to delete all extra files\n";
chomp( my $todo = <STDIN> );
my @allFileNames = `cd $cpanPath/authors/id && find -type f|grep -v CHECKSUMS`;
foreach my $existingFile (@allFileNames) {
    chomp($existingFile);
    my $exactName = substr $existingFile, 2;
    unless ( $exactName ~~ @packageNames ) {
        print "Deleting $cpanPath/authors/id/$exactName\n";
        ` rm -rf $cpanPath/authors/id/$exactName` if $todo eq "Y" || $todo eq "y";
    }
}
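One way to rule out duplicate downloads is to count the unique distribution paths in the index and compare that number with the number of files on disk. The following is only a minimal sketch, assuming an already-decompressed 02packages.details.txt whose header block ends at the first blank line and whose entries have three whitespace-separated columns (module, version, path). The `unique_dist_paths` helper and the sample entries are made up for illustration.

```perl
use strict;
use warnings;

# Return the sorted list of unique distribution paths named in an
# already-decompressed 02packages.details.txt, read from a filehandle.
sub unique_dist_paths {
    my ($fh) = @_;
    my %seen;
    my $in_header = 1;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ($in_header) {
            # Assumption: the header block ends at the first blank line.
            $in_header = 0 if $line =~ /^\s*$/;
            next;
        }
        # Columns: module name, version, distribution path.
        my ( $module, $version, $path ) = split ' ', $line;
        $seen{$path} = 1 if defined $path;
    }
    return sort keys %seen;
}

# Tiny in-memory sample (made-up entries) so the sketch runs without a mirror.
my $sample = <<'END';
File:         02packages.details.txt
Line-Count:   3

Foo::Bar        1.00  A/AB/ABC/Foo-Bar-1.00.tar.gz
Foo::Bar::Baz   1.00  A/AB/ABC/Foo-Bar-1.00.tar.gz
Qux             0.02  X/XY/XYZ/Qux-0.02.tar.gz
END

open my $fh, '<', \$sample or die "Cannot open sample: $!";
my @paths = unique_dist_paths($fh);
close $fh;

print scalar(@paths), " unique distributions\n";
print "$_\n" for @paths;
```

Several modules usually live in one tarball, so the index has many more lines than there are files to fetch. If the number of files under authors/id (ignoring CHECKSUMS) matches the unique path count, the mirror is not holding duplicates.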

Re: Minicpan size issue (2gb)
by Anonymous Monk on Nov 02, 2013 at 10:25 UTC
Re: Minicpan size issue
by marto (Chancellor) on Nov 02, 2013 at 12:01 UTC

    "Randal's article of year 2002 - the main purpose of minicpan is to burn all that into a single CD or some portable device. Please help me in that."

    It isn't 2002 any more; this isn't going to happen. CPAN has grown. Rather than use your own code to do this, I echo what AM has said: use CPAN::Mini.

    If you are concerned about the size of your minicpan, you can use the options/filters to exclude things you don't need, for example:

    • Language distributions (perl, parrot, etc.)
    • Entire module hierarchies (for example Acme::, Tk, or modules for a specific platform which you'll never use)
    • Modules from specific authors who you know pollute CPAN with garbage or with modules that nobody but them will use

    Not only will this save disk space, but your updates will take less time and use less bandwidth.
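As a rough illustration of the filters mentioned above, here is a minimal sketch using the `skip_perl`, `module_filters` and `path_filters` options of `CPAN::Mini->update_mirror`. The mirror URL, local path and author ID `SOMEAUTHOR` are placeholders, and `is_excluded` is a local helper written for this example (not part of CPAN::Mini) that only approximates, line by line, what the regex rules would drop.

```perl
use strict;
use warnings;

# Exclusion rules in the form CPAN::Mini accepts: regexes (or code refs).
my @module_filters = (
    qr/^Acme::/,        # joke modules
    qr/^Tk(?:::|$)/,    # the Tk hierarchy
);
my @path_filters = (
    qr{/SOMEAUTHOR/},   # hypothetical author ID; replace with a real one
);

# Local helper (not part of CPAN::Mini) to preview which index entries
# the regex rules above would match.
sub is_excluded {
    my ( $module, $path ) = @_;
    return 1 if grep { $module =~ $_ } @module_filters;
    return 1 if grep { $path   =~ $_ } @path_filters;
    return 0;
}

# Made-up index entries, purely for the preview.
for my $entry (
    [ 'Acme::Example', 'A/AB/ABC/Acme-Example-0.01.tar.gz' ],
    [ 'Tk::Canvas',    'A/AB/ABC/Tk-Canvas-0.10.tar.gz' ],
    [ 'Some::Module',  'S/SO/SOMEAUTHOR/Some-Module-1.0.tar.gz' ],
    [ 'Useful::Thing', 'X/XY/XYZ/Useful-Thing-2.3.tar.gz' ],
) {
    my ( $module, $path ) = @$entry;
    printf "%-15s %s\n", $module,
        is_excluded( $module, $path ) ? 'skip' : 'mirror';
}

# Contact a mirror only when explicitly asked: perl thisscript.pl --mirror
if ( @ARGV && $ARGV[0] eq '--mirror' ) {
    require CPAN::Mini;
    CPAN::Mini->update_mirror(
        remote         => 'http://some.cpan.mirror/pub/CPAN/',    # placeholder
        local          => '/path/to/minicpan',                    # placeholder
        skip_perl      => 1,    # leave out perl, parrot and friends
        module_filters => \@module_filters,
        path_filters   => \@path_filters,
    );
}
```

Running the script without arguments only prints the preview; nothing is downloaded until `--mirror` is passed.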

    Update: Even without any of the above, a minicpan will easily fit on a standard single-layer DVD, and media is cheap enough these days. Also consider USB flash memory storage. I use the storage space on my phone.

Re: Minicpan size issue
by keszler (Priest) on Nov 02, 2013 at 16:49 UTC
    I had a similar need. My solution was to install portable Strawberry Perl on a USB drive, install CPAN::Mini into it, and run the following from the Strawberry portableshell command line:
    minicpan -p -l \CPANmini -r http://some.cpan.mirror/pub/CPAN/
    The USB drive has top-level directories /strawberry and /CPANmini; together they currently fit on a 4GB USB drive.
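To check whether a given mirror directory would fit on a particular drive, something like the following sketch may help. It sums file sizes with File::Find and compares the total against an assumed capacity of 4 decimal gigabytes. The `tree_size` helper and the throwaway demo directory exist only for illustration; pass a real directory as the first argument to measure it instead.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Temp qw(tempdir);

# Sum the sizes (in bytes) of all regular files below $dir.
sub tree_size {
    my ($dir) = @_;
    my $total = 0;
    find( sub { $total += -s $_ if -f $_ }, $dir );
    return $total;
}

# Demo on a throwaway directory so the sketch runs anywhere.
my $demo = tempdir( CLEANUP => 1 );
open my $out, '>', "$demo/Example-0.01.tar.gz" or die "Cannot write: $!";
binmode $out;
print {$out} 'x' x 1024;
close $out or die "Cannot close: $!";

my $dir      = @ARGV ? $ARGV[0] : $demo;
my $capacity = 4 * 1000**3;    # assumption: a "4GB" drive in decimal gigabytes
my $bytes    = tree_size($dir);

printf "%s holds %.1f MB; %s\n", $dir, $bytes / 1e6,
    $bytes <= $capacity
    ? "fits on the assumed drive"
    : "too big for the assumed drive";
```

The figure is the sum of file lengths, so the space actually consumed on a flash drive can be somewhat higher because of filesystem block overhead.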
