As no one has actually answered the question you posed, I'll give it a go.
Let's assume that the from directory contains
1000000001x[1-10].las
1000000002x[1-10].las
1000000003x[1-10].las
...
1000000010x[1-10].las
and the to directory already contains, say, 5 of each of those same 10 base filenames.
You read all the names into your 2 arrays (@files1 and @files2), giving you 100 in the first array and 50 in the second.
You then process each of the 100 files in the first array against all 50 files in the second, taking a substring of the last 11 chars of each and counting the number of files in the second array that match the same basename, in order to determine the next available number. You then rename the file and add the new name to the old array.
So, in the above example, for the first file in the first array you are doing 3 substrs, a fairly complex concatenation and an increment for all 50 files in the second array. On top of this, the inner loop is embedded in the while loop, which means that it will re-process the inner for loop several times in a manner I can't quite determine. When it comes to the second file in the outer for, there are now 51 files in the inner for (plus more chance that the while will repeat the process of the inner for).
Even ignoring the while loop, the inner for and all the code it contains is going to repeat 50+51+52+53 ... 146+147+148+149 times.
Which is 50 * 199 = 9,950 iterations of the 3 substrs, increment and concatenation to copy 100 files, at least!
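To make that cost concrete, here is a hypothetical reconstruction (not your actual code) of the per-file counting step that gets repeated: the sub name, the 10-character basename width and the sample filenames are all assumptions drawn from the example above.

```perl
# Hypothetical sketch of the quadratic approach: for EVERY file to be
# moved, the entire array of existing names is scanned again.
sub next_seq_quadratic {
    my ( $file, @existing ) = @_;
    my $base  = substr( $file, 0, 10 );   # assumed: first 10 chars are the basename
    my $count = 0;
    for my $other (@existing) {           # one full pass per file moved
        $count++ if substr( $other, 0, 10 ) eq $base;
    }
    return $count + 1;                    # next available sequence number
}

# e.g. with two existing files for basename '1000000001':
my @to = qw( 1000000001x1.las 1000000001x2.las 1000000002x1.las );
print next_seq_quadratic( '1000000001x3.las', @to ), "\n";   # prints 3
```

As @to grows with every rename, each of these passes gets longer, which is where the 9,950 figure comes from.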
I can't wrap my brain around the multiplier effect that the while loop is having, but I think it is substantial.
In the second version, you process the same 100 files in the from directory, but you only do 1 substr, 1 increment and the concatenation, and you call your exist sub and its code only as many times as each file clashes with an existing one: (5+6+7+8+9+10+11+12+13+14) clashes per basename, across 10 basenames.
Which is 5 * 19 * 10 = 950 iterations.
Admittedly you have moved some of the comparison work into the OS by using -e, but it is using optimised, compiled C to do the work rather than interpreted Perl.
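A minimal sketch of that -e probe, assuming the naming scheme from the example (basename, 'x', sequence number, '.las'); the sub name next_free_name and $todir are illustrative, not from the original code.

```perl
# Probe the target directory with -e until an unused name is found;
# the OS does the existence test in compiled C.
sub next_free_name {
    my ( $todir, $base ) = @_;
    my $n = 1;
    $n++ while -e "$todir/${base}x$n.las";
    return "$todir/${base}x$n.las";
}
```

Usage would then be a single line per file, something like: rename $from, next_free_name( $todir, $base );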
Quicker still would be to sort the from array and then process each set of base filenames consecutively. That way, you only need to determine the next available suffix sequence number once per set, and then just increment it for the rest of the set. You may also be able to save some more time by globbing each base filename against the target directory in turn, sorting the results and extracting the sequence number from the highest found.
The code might look something like this.
    my @from = sort { substr( $a, 46, -4 ) <=> substr( $b, 46, -4 ) } <$fromdir/*.las>;

    my $oldbase = '';
    my $next    = 0;

    for my $from (@from) {
        my $base = substr( $from, 35, 11 );
        if ( $base ne $oldbase ) {
            $oldbase = $base;
            my $last = ( sort { substr( $a, 22, -4 ) <=> substr( $b, 22, -4 ) }
                             <$todir/$base*.las> )[-1];
            $next = defined $last ? substr( $last, 22, -4 ) + 1 : 1;
        }
        rename $from, "$todir/${base}x$next.las";
        $next++;
    }
Note: This is untested code and will need work before it would compile.
Depending on the numbers of files and numbers of base filenames involved, the cost of sorting (which could probably be done more efficiently than I have shown here) should be more than offset by only needing to go to the OS for the target files once for each basename, with a simple increment for each subsequent name in the set.
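If the numbers matter enough to argue over, the core Benchmark module can settle it. This is just a sketch, not from the original discussion: the two sub names and the invented file list below are mine, but they mirror the trade-off described above — rescanning the existing names for every new file versus scanning once per basename and incrementing thereafter.

```perl
use Benchmark qw(cmpthese);

# 50 invented existing names: 10 basenames x 5 sequence numbers each.
my @existing = map {
    my $b = $_;
    map { sprintf '10000000%02dx%d.las', $b, $_ } 1 .. 5;
} 1 .. 10;

# Ten new files arriving for one basename: either rescan the existing
# names for every file, or scan once and increment for the rest.
cmpthese( 10_000, {
    rescan_every_file => sub {
        for my $i ( 1 .. 10 ) {
            my $next = 1 + grep { substr( $_, 0, 10 ) eq '1000000001' } @existing;
        }
    },
    scan_once_then_incr => sub {
        my $next = 1 + grep { substr( $_, 0, 10 ) eq '1000000001' } @existing;
        $next++ for 2 .. 10;
    },
} );
```

cmpthese prints a rates table comparing the two, so the "should be more than offset" claim can be checked rather than estimated.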
Examine what is said, not who speaks.
The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.