I just performed a similar exercise for my employer. We were receiving names in the format last <suffix> first <middle>, where <> denotes an optional part.
In my experience it takes more than one regex and a good knowledge of your data. For example, your John M. Doe case could be handled as follows:
use strict;
use warnings;

my @names = (
    'John-Boy M. Doe',
    'John Doe',
    'John St. Doe',
    'John St Doe',
    'John M. St. Doe',
    'John M. O Doe',
    'John M. O\'Doe',
);

foreach my $name (@names) {
    NAME_TEST: {
        # Most specific pattern first: last names with a known prefix (O, St).
        $name =~ m{^([\w\-\']+)\s*(\w*\.*)\s((?:O|St)(?:\'|\.)*\s*[\w\-\']+)} && do {
            print 'First/Middle/(Prefix)Last: ';
            print $1 . '-' . $2 . '-' . $3 . "\n\n";
            last NAME_TEST;
        };
        # Generic first/middle/last fallback (assumed form: the pattern above
        # without the O/St prefix group, so it knows nothing about prefixes).
        $name =~ m{^([\w\-\']+)\s*(\w*\.*)\s([\w\-\']+)} && do {
            print 'First/Middle/Last: ';
            print $1 . '-' . $2 . '-' . $3 . "\n\n";
            last NAME_TEST;
        };
    }
}
As you can see, I included some other examples of things you'll have to deal with. Running these regexen in the right order is important. If you take my example and swap them, it gives incorrect results, because the second one finds what it thinks is a plain first/middle/last name (it doesn't know about the predefined prefixes and such), e.g. it would take the 'St.' in 'John St. Doe' as a middle name. Hope some of that helped.
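If the list of patterns grows, one way to keep the ordering explicit is to put the regexen in an array and stop at the first match. This is only a rough sketch of that idea, not the code I ran at work: the @patterns table and the parse_name helper are names made up for illustration, and the generic pattern is assumed to be the prefix pattern minus the O/St group.

```perl
use strict;
use warnings;

# Hypothetical pattern table, ordered from most to least specific.
# The order of the entries is the order in which they are tried.
my @patterns = (
    [ 'First/Middle/(Prefix)Last',
      qr{^([\w\-']+)\s*(\w*\.*)\s((?:O|St)(?:'|\.)*\s*[\w\-']+)} ],
    [ 'First/Middle/Last',
      qr{^([\w\-']+)\s*(\w*\.*)\s([\w\-']+)} ],
);

# Illustrative helper: returns (label, first, middle, last) for the first
# pattern that matches, or an empty list if none of them does.
sub parse_name {
    my ($name) = @_;
    for my $entry (@patterns) {
        my ($label, $re) = @$entry;
        # In list context a successful match returns the three captures.
        if (my @parts = $name =~ $re) {
            return ($label, @parts);
        }
    }
    return;
}

for my $name ('John St. Doe', "John M. O'Doe", 'John Doe') {
    my ($label, $first, $middle, $last) = parse_name($name)
        or next;    # skip names that no pattern recognises
    printf "%s: %s-%s-%s\n", $label, $first, $middle, $last;
}
```

With the prefix pattern listed first, 'John St. Doe' comes back with an empty middle name and 'St. Doe' as the last name. Reverse the two entries in @patterns and the generic pattern claims it first, which is the ordering problem described above.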