
Legacy Data Files on a Case-sensitive File-system

by pobocks (Chaplain)
on Jan 20, 2010 at 23:53 UTC
pobocks has asked for the wisdom of the Perl Monks concerning the following question:

I'm currently creating a Perl/Tk frontend for read-only access to some data. The contents don't matter for this question (but they come from a DOS star-mapping program for the Traveller RPG, called Galactic).

My problem is this: the files in question were originally created by hand on a case-insensitive (but case-preserving) filesystem. As a result, the casing of the files' names differs wildly from the casing of the filename references inside the files themselves. For example, ARAMIS.DAT might be referred to in one place as aramis.DAT, in another as Aramis.dat, etc.

I have two options - the easy, shortsighted solution would be, of course, to upcase all the filenames on disk, then upcase any references as used. The problem with this is that I'm not sure that I have the right to redistribute the data-files, and if the (eventual) users have to download the files themselves, they won't have the uppercase versions.

I'd very much like some advice on how to best go about the harder route - causing the program to treat the data case-insensitively on Unix (and other case-sensitive) file systems. Here's a naive attempt:

    #!/usr/bin/perl -w

    use strict;

    my $name = 'aramis.DAT';
    opendir $dh, '/directory';
    my @dirs = readdir($dh);
    closedir $dh;

    for my $dir (@dirs){
        if (/$names/i){
            open (my $fh, '<', "/directory/$dir");
            . . .
            close $fh;
        }
    }

Is this remotely pointed in the right direction?

for(split(" ","tsuJ rehtonA lreP rekcaH")){print reverse . " "}print "\b.\n";

Replies are listed 'Best First'.
Re: Legacy Data Files on a Case-sensitive File-system
by almut (Canon) on Jan 21, 2010 at 00:30 UTC

    File::Glob has a :nocase option:

        use File::Glob qw(:nocase);
        my @files = <[f]oobar>;

    would return whatever is found in the filesystem, i.e. any of "foobar", "Foobar", "FooBar", etc.
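
The same idea can be wrapped in a small function. This is only a sketch: glob_nocase is a hypothetical helper name, it uses bsd_glob with the GLOB_NOCASE flag (the :bsd_glob import tag needs Perl 5.16 or later), and it assumes that the directory and file name contain no glob metacharacters and that the name starts with an ordinary letter or digit.

```perl
use strict;
use warnings;
use File::Glob qw(:bsd_glob);   # exports bsd_glob() and the GLOB_* flags

# Hypothetical helper: return the on-disk paths in $dir whose names match
# $name regardless of case.
# Assumption: neither $dir nor $name contains glob metacharacters.
sub glob_nocase {
    my ($dir, $name) = @_;

    # Bracket the first character so that the last path segment contains a
    # metacharacter and is really compared against the directory entries.
    my ($first, $rest) = $name =~ /\A(.)(.*)\z/s
        or return;

    return bsd_glob("$dir/[$first]$rest", GLOB_NOCASE);
}
```

Calling glob_nocase('/directory', 'aramis.DAT') would then return the path under whatever casing actually exists on disk, or an empty list if nothing matches.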

Re: Legacy Data Files on a Case-sensitive File-system
by jethro (Monsignor) on Jan 21, 2010 at 00:27 UTC
    If you have to access just a few files, your method is probably already good enough.

    If not, I can think of two methods to speed it up, both of which let you use all-lowercase filenames in your program:

    1) Whenever your program starts, it checks whether there are still mixed-case filenames around (just check one file) and if yes, changes them all to lowercase (or uppercase if you prefer that). This means that when someone starts your program the first time with the downloaded data files, your program needs a little longer to start up. The next time all the files are already lowercase.

    2) Generate a hash at startup that has the lowercased filename as key and the (real) mixed-case filename as value. To access a file, look it up in the hash. The hash has to be generated whenever someone starts the program, but that shouldn't cause a noticeable delay because the program has to make just one sweep of the files on disk.

    PS: Both algorithms need to do a sweep of the data files on disk. You might use File::Find to do this.
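
A minimal sketch of method 2, assuming all data files sit in a single directory (so a plain readdir is used here rather than File::Find). The names build_case_map and resolve_path are hypothetical, and the //= operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: map lowercased file name => real on-disk name
# for every entry in $dir.
sub build_case_map {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    my %real_name;
    for my $entry (readdir $dh) {
        next if $entry eq '.' or $entry eq '..';
        # If two names differ only by case, the first one seen wins.
        $real_name{ lc $entry } //= $entry;
    }
    closedir $dh;
    return \%real_name;
}

# Hypothetical helper: turn a reference found in the data into a real path,
# or return undef if no file matches.
sub resolve_path {
    my ($dir, $map, $reference) = @_;
    my $real = $map->{ lc $reference };
    return defined $real ? "$dir/$real" : undef;
}
```

The map would be built once at startup, for example my $map = build_case_map('/directory');, and every reference read from a data file would then go through resolve_path('/directory', $map, $reference) before being opened.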

Re: Legacy Data Files on a Case-sensitive File-system
by jwkrahn (Monsignor) on Jan 21, 2010 at 01:16 UTC
        my $name = 'aramis.DAT';
        ...
        if (/$names/i){

    I assume that is supposed to be $name in both places? The period in the file name is a regular expression metacharacter that matches any character (except a newline), so you have to escape it to match a literal period. This is usually done with the quotemeta operator. The regular expression, as it is, will match any file name of any length that contains text matching 'aramis.DAT', so you should anchor the pattern so that it matches only the whole file name. In other words:

    if (/$names/i){

    Should be:

    if (/\A\Q$name\E\z/i){

    Or you could compare file names without using regular expressions:

        my $name = 'aramis.dat';
        ...
        if (lc eq $name){
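
Put together, the anchored and escaped pattern could be used like this. It is only a sketch, and find_nocase is a hypothetical helper name. Note that in the original loop the pattern is matched against $_ while the loop variable is $dir; here grep sets $_ to each directory entry, so the implicit match works.

```perl
use strict;
use warnings;

# Hypothetical helper: return the on-disk names in $dir that equal $name
# when case is ignored.
sub find_nocase {
    my ($dir, $name) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!";
    # \Q...\E escapes the period; \A and \z anchor the match to the whole name.
    my @matches = grep { /\A\Q$name\E\z/i } readdir $dh;
    closedir $dh;
    return @matches;
}
```

For example, find_nocase('/directory', 'aramis.DAT') returns the matching names as they are spelled on disk, which can then be appended to the directory and opened.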
Re: Legacy Data Files on a Case-sensitive File-system
by molecules (Monk) on Jan 21, 2010 at 00:20 UTC
    Sounds like you have the right idea. Is this a little closer to what you want?
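        use strict;
        use warnings;

        my $name = 'aramis.DAT';

        # Declare the directory handle lexically so the code compiles under strict
        opendir my $dh, '/directory' or die "Cannot open /directory: $!";
        my @files = readdir($dh);
        closedir $dh;

        for my $file (@files){
            # Match against $file explicitly rather than $_
            if ($file =~ m/$name/i){
                open (my $fh, '<', "/directory/$file") or die "Cannot open $file: $!";
                . . .
                close $fh;
            }
        }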
Re: Legacy Data Files on a Case-sensitive File-system
by JavaFan (Canon) on Jan 21, 2010 at 00:40 UTC
    Non-programming solution: store the files on a case-insensitive filesystem, and let the filesystem sort it out.

      I originally didn't get this, until I realized that you meant a non-case-preserving case-insensitive filesystem, followed by a copy back to real-people filesystems. In which case, yes, that would be the easiest way to solve my problem (and is very clever).

      Sadly, since the data are available on the web in their current, mis-cased form, dealing with it is the only option I have. Also, on another level, this wouldn't solve the problem of inconsistent filename casing IN the files - even after the files are all one case or the other, I'd still need to have a programming solution to match all the internal references.

      for(split(" ","tsuJ rehtonA lreP rekcaH")){print reverse . " "}print "\b.\n";
Re: Legacy Data Files on a Case-sensitive File-system
by tokpela (Chaplain) on Jan 21, 2010 at 11:14 UTC

    Maybe a module like Win32::StrictFileNames might help? I have not used this module and just heard about it. Supposedly, it enables case-sensitive checking on Windows systems.

    Looking at the documentation, this module seems to be a proof of concept. The main example given is a check of the case of a used module's name, but the documentation hints at other file-checking uses.

    At the least, you could check under the hood and see what is being done which might help you solve your problem.

Node Type: perlquestion [id://818585]
Approved by planetscape
Front-paged by almut