http://www.perlmonks.org?node_id=1060917

gurpreetsingh13 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I need your wisdom and experience with my problem.

After reading Randal's articles on minicpan, I decided to try it myself. However, because of proxy issues on my office machines, I couldn't use the given modules directly.

So I wrote my own script, which works in the following steps:

1. Download the three index files (01mailrc.txt.gz, 02packages.details.txt.gz and 03modlist.data.gz) and save them in the required directories.

2. Read the 02packages.details.txt file line by line.

3. Check whether each package's tar file exists at the specified location; if not, download it.

4. Finally, remove any old files that are not listed in the packages file.
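Step 2 can also be sketched in pure Perl instead of a cat/tail/rev/cut pipeline. This is only a minimal sketch: the function name unique_dist_paths and the sample records are hypothetical, and it assumes the usual layout of the index (header lines, one blank line, then whitespace-separated "package version path" records), so it does not rely on a fixed count of header lines.

```perl
use strict;
use warnings;

# Return the sorted, unique distribution paths found in the index lines.
# Assumption: everything up to the first blank line is header, and the
# last whitespace-separated field of each record is the tarball path.
sub unique_dist_paths {
    my @lines = @_;
    my %seen;
    my $in_body = 0;
    for my $line (@lines) {
        chomp $line;
        if ( !$in_body ) {
            # The first blank line ends the header.
            $in_body = 1 if $line =~ /^\s*$/;
            next;
        }
        my @fields = split ' ', $line;
        next unless @fields >= 3;    # skip malformed records
        $seen{ $fields[-1] } = 1;
    }
    my @sorted = sort keys %seen;
    return @sorted;
}

# Hypothetical sample: two packages share one tarball.
my @sample = (
    "File:         02packages.details.txt\n",
    "Line-Count:   3\n",
    "\n",
    "Foo::Bar   1.00  A/AB/ABC/Foo-Bar-1.00.tar.gz\n",
    "Foo::Baz   1.00  A/AB/ABC/Foo-Bar-1.00.tar.gz\n",
    "Qux        0.02  X/XY/XYZ/Qux-0.02.tar.gz\n",
);
my @paths = unique_dist_paths(@sample);
print scalar(@paths), " unique distributions\n";
print "$_\n" for @paths;
```

Many packages usually map to the same tarball, so the number of unique paths, not the number of lines in the index, is what determines how many files get downloaded.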

The code is posted below. My problem is that the size of the CPAN mirror directory is approaching 2.5 GB. I need to know whether that size is correct, or whether there is a problem with my code, such as downloading files more than once. As mentioned in Randal's 2002 article, the main purpose of minicpan is to burn everything onto a single CD or some other portable device. Please help me with this.
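One way to check whether the mirror holds more than the index asks for is to compare the files on disk with the paths listed in the index. The following is a minimal sketch with hypothetical names (extra_files) and made-up sample data; it assumes both lists hold paths relative to authors/id and that CHECKSUMS files have already been filtered out.

```perl
use strict;
use warnings;

# Return the entries of @$on_disk that do not appear in @$expected.
# A hash lookup keeps this fast even for tens of thousands of paths.
sub extra_files {
    my ( $expected, $on_disk ) = @_;
    my %wanted = map { $_ => 1 } @$expected;
    return grep { !$wanted{$_} } @$on_disk;
}

# Hypothetical sample data: one outdated tarball is still on disk.
my @expected = (
    'A/AB/ABC/Foo-Bar-1.00.tar.gz',
    'X/XY/XYZ/Qux-0.02.tar.gz',
);
my @on_disk = (
    'A/AB/ABC/Foo-Bar-0.99.tar.gz',
    'A/AB/ABC/Foo-Bar-1.00.tar.gz',
    'X/XY/XYZ/Qux-0.02.tar.gz',
);
my @extra = extra_files( \@expected, \@on_disk );
print "Not in index: $_\n" for @extra;
```

If this list comes back empty for the real mirror, the directory contains only what the index references, and the size reflects the index rather than duplicate downloads.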

P.S. I am using Cygwin on a Windows machine.

use strict;
use warnings;
use utf8;

my $cpanPath     = "/cygdrive/d/Softwares/cpanmirror";
my $remoteMirror = "http://mirrors.neusoft.edu.cn/cpan";

## Get package files
`cd /home/Gurpreet && rm -rf 01mailrc.txt*`;
`cd /home/Gurpreet && rm -rf 02packages.details.txt*`;
`cd /home/Gurpreet && rm -rf 03modlist.data*`;
print "Deleted old package files\n";
print "=" x 30, "\n";

`cd /home/Gurpreet/ && wget $remoteMirror/authors/01mailrc.txt.gz`;
`cp -f /home/Gurpreet/01mailrc.txt.gz $cpanPath/authors/`;
`cd /home/Gurpreet/ && wget $remoteMirror/modules/02packages.details.txt.gz`;
`cp -f /home/Gurpreet/02packages.details.txt.gz $cpanPath/modules/`;
`cd /home/Gurpreet/ && wget $remoteMirror/modules/03modlist.data.gz`;
`cp -f /home/Gurpreet/03modlist.data.gz $cpanPath/modules/`;
print "Updated package files \n";
print "=" x 30, "\n";

# The steps below read the unpacked 02packages.details.txt from the
# current directory.
#`cd /home/Gurpreet && gunzip 02packages.details.txt.gz`;
print "Extracted package file \n";
print "=" x 30, "\n";

# Get total files excluding the 9 header lines of the package file
my $totalFiles = `cat 02packages.details.txt|wc -l`;
chomp($totalFiles);
$totalFiles = $totalFiles - 9;
print "Total files = $totalFiles\n";
print "=" x 30, "\n";

# Get all unique tarball paths (last field of each record)
my @packageNames =
  `cat 02packages.details.txt|tail -$totalFiles|rev|cut -d " " -f1|rev|sort|uniq`;
chomp(@packageNames);
print "Total unique packages = ", scalar(@packageNames), "\n";
print "=" x 30, "\n";

# Ask for the range of entries to download
print "Enter starting point of download\n";
chomp( my $startPoint = <STDIN> );
print "Enter ending point of download\n";
chomp( my $endPoint = <STDIN> );

# Start getting package files
print "Starting update of cpanmirror. Press enter\n";
print "=" x 30, "\n";
<STDIN>;
my $ctr = $startPoint;
foreach my $val ( @packageNames[ $startPoint .. $endPoint ] ) {
    my @vals        = split /\//, $val;
    my $packageName = $vals[-1];
    my $dirName     = join "/", @vals[ 0 .. $#vals - 1 ];
    print "$ctr)Dir=$dirName Package=$packageName\n";

    # Test the full target path, not a path relative to the current directory
    `mkdir -p $cpanPath/authors/id/$dirName`
      unless -d "$cpanPath/authors/id/$dirName";
    `cd $cpanPath/authors/id/$dirName && wget $remoteMirror/authors/id/$dirName/$packageName`
      unless -e "$cpanPath/authors/id/$dirName/$packageName";
    $ctr++;
}

# Now get the list of old files and delete them
print "Print Y to delete all extra files\n";
chomp( my $todo = <STDIN> );
my @allFileNames = `cd $cpanPath/authors/id && find -type f|grep -v CHECKSUMS`;

# Hash lookup replaces the smartmatch (~~) against the whole array
my %wanted = map { $_ => 1 } @packageNames;
foreach my $existingFile (@allFileNames) {
    chomp($existingFile);
    my $exactName = substr $existingFile, 2;    # strip the leading "./"
    unless ( $wanted{$exactName} ) {
        print "Deleting $cpanPath/authors/id/$exactName\n";
        `rm -rf $cpanPath/authors/id/$exactName` if $todo eq "Y" || $todo eq "y";
    }
}