Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"

Dynamic file renaming and globbing

by licking9Volts (Pilgrim)
on Jan 13, 2003 at 19:57 UTC ( #226579=perlquestion: print w/replies, xml ) Need Help??
licking9Volts has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I recently completed a small project here at work that involved renaming files based on existing file names. All of the files are named like this: a 10 digit base number, 1 letter 'x', an optional number, then the extension '.las'.
1234567890x.las 1234567890x1.las 1234567890x2.las
Many of the files share the same base number so that's why the optional number after the letter is needed.
There are around 4,000 files already on a network drive. New files that are to be added, need to have their prospective names compared to the existing files and the optional number added or incremented to make sure nothing will be overwritten. The files are then moved to a different folder to verify their header information before being moved to the final directory. Here's the code I used originally, and although the initial file globbing is slow, the actual renaming is quite fast.
#!/usr/bin/perl -w use strict; my $dir1 = "h:/andrew/bruner/temp/stage/copies"; my $dir2 = "j:/templas"; my $field1 = " "; my $field2 = " "; my $testflag = '0'; my $newname = " "; my $oldname = " "; my $counter = '0'; print "\nGlobbing 1st set..."; my @files1 = glob("$dir1/*.las"); print "done\nGlobbing 2nd set..."; my @files2 = glob("$dir2/*.las"); print "done.\n"; foreach $field1 (@files1) { $testflag = '1'; $oldname = $field1; while ($testflag == '1') { foreach $field2 (@files2) { if (substr($field1, 35, 11) eq substr($field2, + 11, 11)) { $counter++; $field1 = substr($field1, 0, 46) . $co +unter . ".las"; print "Trying $field1\n"; } else { $testflag = '0'; } } } if ($counter >= '1') { $newname = substr($field1, 0, 46) . $counter . ".las"; } else { $newname = substr($field1, 0, 46) . ".las"; } push (@files2, $newname); rename $oldname, $newname; print "$oldname renamed to $newname"; $counter = 0; }
Just this morning, I found this post on Renaming Files by tachyon and it looked like it would work more efficiently than mine. I modified it to fit my situation and found that it DID work better than my original. Here's the new code I used:
#!/usr/bin/perl -w use strict; my $loc_dir = 'h:/andrew/bruner/temp/stage/'; my $to_dir = 'h:/andrew/bruner/temp/stage/copies/'; my $net_dir = 'j:/templas/'; my $from = '.las'; my $counter = '0'; my ($old, $new, $name); my @usedarray; while (<$loc_dir*$from>) { $old = $new = $_; $new =~ s/^$loc_dir/$net_dir/; while (exist($new)) { $counter++; $new = substr($new, 0, 22) . $counter . ".las"; } $counter = '0'; $new =~ s/^$net_dir/$to_dir/; rename $old, $new; print "\nRenamed $old $new"; } sub exist { my $file = shift; if (-e $file) { print "\nFile $file exists."; return 1; } $file =~ s/^$net_dir/$to_dir/; if (-e $file) { print "\nFile $file exists."; return 1; } }
This works exactly like I need it to. So many thanks to you, tachyon, for sharing your code. One thing that puzzles me though is why it takes so much longer for the file globbing in the original code
my @files2 = glob("$dir2/*.las");
than it does in the newer code
while (<$loc_dir*$from>) {
Can anyone explain this? Thanks in advance!


Replies are listed 'Best First'.
Re: Dynamic file renaming and globbing
by jdporter (Canon) on Jan 13, 2003 at 21:15 UTC
    I would use readdir to read the names of the files that need to be renamed.
    Then I'd only glob for those files that would potentially collide.
    my $existingfiles_dir; my $newfiles_dir; my $dest_dir; opendir XFD, $existingfiles_dir or die "read $existingfiles_dir : $! +"; opendir NFD, $newfiles_dir or die "read $newfiles_dir : $!"; for my $file ( readdir NFD ) { # parse my( $num, $ver, $ext ) = $file =~ /(\d+x)(\d*)(\..*)/; # find the last of the files it might collide with: my( $last ) = sort { $b <=> $a } # sort version numbers descending map { /^$num(.*)\Q$ext/ ? $1 : () } readdir XFD; rewinddir XFD; # take the next available number: $ver = $last ? $last + 1 : ''; rename $file, $num.$ver.$last; }

    The 6th Rule of Perl Club is -- There is no Rule #6.

Re: Dynamic file renaming and globbing
by BrowserUk (Pope) on Jan 14, 2003 at 06:01 UTC

    As noone has actually answered the question you posed, I'll give it a go.

    Let's assume that the from directory contains

    1000000001x[1-10].las 1000000002x[1-10].las 1000000003x[1-10].las ... 1000000010x[1-10].las

    and the to directory already contain say 5 of each of those same 10 base filenames.

    You read all the names into your 2 arrays (@files1 & @files2), giving you 100 in the first array and 50 in the second.

    You then process each of the 100 files in the first array, against all 50 files in the second, substring the last 11 chars of each and counting the number of files in the second array that match the same basename in order to determine the next available number. You then rename the file and add then new name to the old.

    So, in the above example, you are doing 3 substr's, a fairly complex concatenation and an increment for all 50 files in the second array, for the first file in the first array. On top of this, the inner array processing is embedded in the while loop which means that it will re-process the inner for loop several times in a manner I can't quite determine. When it comes to the second file in the outer for, their are now 51 files in the inner for (+plus more chance that the while will repeat the process of the inner for).

    Even ignoring the while loop, the inner for and all the code it contains is going to repeat 50+51+52+53 ... 146+147+148+149 times.

    Which is 50 * 199 or nearly 10, 000 iterations of the 3 substrs, incr & concatenation to copy 100 files at least!

    I can't wrap my brain around the multiplier effect that the while loop is having, but I think it is substantial.

    In the second version, you process the same 100 files in the from directory, but you only do 1 substr, 1 incr and the concatenation and call your exist sub and its code as many times as the file clashes with an existing one. (5+6+7+8+9+10+11+12+13+14) * 10 basenames.

    Which is 5*19*10 = 950 iterations.

    Admittedly you have moved some of the comparison work into the OS by using -e, but it is using optimised, compiled C to do the work rather than interpreted Perl.

    Quicker still would be to sort the from array, and then process each set of base filenames consecutively. That way, you only need determine the next available postscript sequence number once, and then just increment it for the rest of the set. You may also be able to save some more time by globing each base filename against the target directory in turn, sorting it and extracting the sequence number from highest found.

    The code might look something like this.

    my @from = sort { substr( $a, 46, -4) <=> substr( $b, 46, -4) } <$from +dir/*.las>; my $oldbase = ''; my $next = 0; for my $from (@from) { my $base = substr( $from, 35, 11 ); if ( $base ne $oldbase ) { $oldbase = $base; my $last = ( sort{ substr( $a, 22, -4) <=> substr( $b, 22, -4) + } <$todir/$base*.las> ) [-1]; $next = substr( $next, 22, -4) + 1; } rename $from, "$todir/$basex$next.las"; $next++; }

    Note: This is untested code and will need work before it would compile.

    Depending on the numbers of files and numbers of base filenames involved, The cost of sorting (which could probably be done more efficiently than I have shown here) should be more than offset by only needing to go to the OS for the target files once for each basename with a simple incr for each subsequent name in the set.

    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: Dynamic file renaming and globbing
by waswas-fng (Curate) on Jan 14, 2003 at 03:26 UTC
    I have a suggestion here from your code you either have a race situation or you can speed this up. The race comes from the distance of the actual test of the file exist and the creation of the new file with the new name. If your app is not the only one that may place files in the target dir you may run into a race where you check to see if a filename exists and it does not then someone creates the file and then you try to rename. BAD. Or if your app _IS_ the only one that will be writing to the target directory, why not push all of the files into an hash on startup? you would only need to build the hash once and then you can do _very_ fast lookups and push an entry into the hash when you rename a file.


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://226579]
Approved by Mr. Muskrat
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (9)
As of 2017-12-18 10:58 GMT
Find Nodes?
    Voting Booth?
    What programming language do you hate the most?

    Results (481 votes). Check out past polls.