Matching Accented Names

ropey
Hi guys, I have an issue that is somewhat foxing me. There are two stages to it.

Stage 1) I take a name as input from a user, and I want to apply a regex to validate that it is what I would consider a name: the characters A-Za-z, a space, a period, etc. However, I also need to accept accented characters. I even tried putting them directly in the regex, but sometimes it fails. I think this has something to do with the encoding the file is saved in, as my test scripts don't work properly either. Is there a better way of doing this?
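One possible approach to Stage 1, sketched under the assumption that the input arrives as UTF-8 bytes: decode it into Perl characters first, then match with Unicode properties (\p{L} for any letter, \p{M} for combining accents) instead of typing accented characters literally into the regex. The helper name is_valid_name and the exact set of allowed punctuation are illustrative assumptions, not anything from the original post.

```perl
use strict;
use warnings;
use utf8;                  # declares this source file as UTF-8 (matters only for literal non-ASCII text)
use Encode qw(decode);

# Hypothetical helper: a name must start with a letter and may then contain
# letters (\p{L}), combining accents (\p{M}), spaces, periods, apostrophes and hyphens.
sub is_valid_name {
    my ($name) = @_;
    return $name =~ /\A[\p{L}\p{M}][\p{L}\p{M} .'\-]*\z/ ? 1 : 0;
}

# Assumption: the user's input arrives as UTF-8 encoded bytes
my $raw  = "Jos\xC3\xA9 M\xC3\xBCller";    # UTF-8 bytes: e-acute, u-umlaut
my $name = decode('UTF-8', $raw);          # now a string of characters, not bytes

print is_valid_name($name) ? "valid\n" : "invalid\n";
```

Because the pattern contains no literal accented characters, it does not depend on how the script file is saved; what matters is decoding the input with the right encoding.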

Stage 2) I need to replace the accented characters with their non-accented equivalents (the mainframe they eventually end up on does not accept them). I have used several regexes like this:

my $t = shift;
$t =~ s/(|)/AE/g;
$t =~ s/(|)/OE/g;
$t =~ s/(|)/UE/g;
$t =~ s/()/SZ/g;
$t =~ s/(|||)/a/g;
$t =~ s/()/o/g;
$t =~ s/()/e/g;
$t =~ s/()/i/g;
$t =~ s/()/I/g;
$t =~ s/(|||)/A/g;
$t =~ s/(|)/O/g;
$t =~ s/[^a-z0-9\,\.\s\/\-\@\:]//ig;
return $t;
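An alternative to the chain of substitutions, sketched under the assumption that the input has already been decoded to Perl characters: spell out the letters that need special treatment (umlauts, ß, ligatures) with a lookup table, then use Unicode::Normalize's NFD to split every other accented letter into a base letter plus a combining mark and delete the marks. The helper strip_accents and the contents of %special are assumptions, loosely following the AE/OE/UE/SZ replacements in the post; adjust them to whatever the mainframe side expects. The final whitelist is the one from the post.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Assumed spelled-out replacements (German-style), applied before normalisation
my %special = (
    "\x{00C4}" => 'AE', "\x{00E4}" => 'ae',   # A/a with umlaut
    "\x{00D6}" => 'OE', "\x{00F6}" => 'oe',   # O/o with umlaut
    "\x{00DC}" => 'UE', "\x{00FC}" => 'ue',   # U/u with umlaut
    "\x{00DF}" => 'SZ',                       # sharp s
    "\x{00C6}" => 'AE', "\x{00E6}" => 'ae',   # AE ligature
    "\x{0152}" => 'OE', "\x{0153}" => 'oe',   # OE ligature
);
my $special_class = join '', keys %special;   # none of these are regex metacharacters

# Hypothetical helper: expects a decoded (character) string, returns plain ASCII
sub strip_accents {
    my ($t) = @_;
    $t =~ s/([$special_class])/$special{$1}/g;   # spelled-out replacements first
    $t = NFD($t);                                # e.g. e-acute -> e + combining acute
    $t =~ s/\p{M}//g;                            # drop all combining marks
    $t =~ s/[^a-z0-9,.\s\/\-\@:]//ig;            # whitelist from the original post
    return $t;
}

print strip_accents("Fran\x{e7}ois M\x{fc}ller"), "\n";
```

If a generic transliteration is acceptable, Text::Unidecode on CPAN offers a ready-made unidecode() function, though its choices for umlauts may not match the AE/OE/UE convention.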

Any tips on solving this would be greatly appreciated.

    It appears that some characters are not within ISO Latin 1, so the problem might not be so easily solved. You may need to know which character set/locale they are coming from, since there could be some overlap. You might like to take a look at perllocale.
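To illustrate why the source character set matters, this sketch decodes the same byte under two Latin encodings using Perl's Encode module (a swapped-in alternative to the locale approach described in perllocale): 0xA4 is the generic currency sign in ISO-8859-1 but the euro sign in ISO-8859-15.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $byte   = "\xA4";                          # one raw byte from some input
my $latin1 = decode('iso-8859-1',  $byte);    # U+00A4 CURRENCY SIGN
my $latin9 = decode('iso-8859-15', $byte);    # U+20AC EURO SIGN

printf "latin1: U+%04X, latin9: U+%04X\n", ord $latin1, ord $latin9;
```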
    IBM mainframes have a different EBCDIC code page for each European language, so it is not impossible to retain the correct characters, provided you know which character set they come from.
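If the right EBCDIC code page is known, Perl's Encode module can convert to it directly instead of stripping the accents. This sketch assumes code page 500 (international EBCDIC); Encode also ships cp37, cp1047 and a few others, but which one the mainframe actually uses has to be confirmed on that side. The helper to_ebcdic is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: converts a decoded Perl string to EBCDIC bytes.
# FB_CROAK: die if a character has no mapping in the target code page.
# LEAVE_SRC: do not let encode() modify the caller's string.
sub to_ebcdic {
    my ($name, $codepage) = @_;
    return encode($codepage, $name, FB_CROAK | LEAVE_SRC);
}

# Assumption: the mainframe uses code page 500 (international EBCDIC)
my $bytes = to_ebcdic("M\x{fc}ller", 'cp500');
printf "%vX\n", $bytes;    # hex value of each EBCDIC byte, dot-separated
```

Dying on unmappable characters makes it obvious when a name contains something (such as the OE ligature) that the chosen code page cannot hold, so it can be handled explicitly rather than silently corrupted.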

