http://www.perlmonks.org?node_id=1021611


in reply to Re^3: RFC: I rewrote a custom sort function, but not sure if it is the best way.
in thread RFC: I rewrote a custom sort function, but not sure if it is the best way.

roboticus, I decided to give the name transform a try, and so far it looks pretty good.

#!/usr/bin/perl
use strict;
use warnings;

my @roman_numerals = qw(I II III IV V VI VII VIII IX X);
my $roman_numerals_string = join('|',@roman_numerals);

sub name_transform {
  my $name = shift;

  # If a suffix is so long it needs a comma, let's get it here.
  my ($base_name,$pre_suffix) = split(', ',$name);
  # Split the rest of the name by the spaces.
  my @name = split(' ',$base_name);

  # Now check the first array item to see if it is a common prefix.
  # Some are there for fun.
  # Note: the patterns are unanchored and case-sensitive, so 'President'
  # matches 'Pres' but 'Officer' does not match 'officer'.
  my $prefix = $name[0] =~ /(?:Lady|Lord|[MD][rs]|Mrs|Miss|Pres|Gov|Sen|officer)(?:|\.)/ ? shift @name : '';
  # Now check the last item of the array to see if it matches some common suffixes.
  # More Roman numerals can be added.
  my $suffix = $pre_suffix ? $pre_suffix : $name[-1] =~ /(?:Jr|Sr|Esq|$roman_numerals_string)(?:|\.)/ ? pop @name : '';

  # All which should be left is the bare name. Even if only the first name is left,
  # it will be treated as the last name and maybe sorted accordingly.
  my $last_name = pop @name;
  my $first_name = shift @name // '';
  # Every name left should be middle names.
  my $middle_name = @name ? join(' ',@name) : '';

  return [$last_name,$first_name,$middle_name,$prefix,$suffix];
}

local $\ = "\n";

my @names = ('President Barack Hussein Obama II','Mrs. Amanda King','Dr. Feelsgood',
             'Miss America','Officer Andy','Henry VIII', "Dr. John Smith",
             "Eucalyptus A. Tree, Esquire","Roboticus", "Lady Aleena, Baker of cookies",
             'Aleena Zarahlinda ibn Robert al-Hajnal Chaoshi-Mnemosyniod I');

for my $person (@names) {
  my $names = name_transform($person);
  print join('|',@{$names});
}

returns

Obama|Barack|Hussein|President|II
King|Amanda||Mrs.|
Feelsgood|||Dr.|
America|||Miss|
Andy|Officer|||
Henry||||VIII
Smith|John||Dr.|
Tree|Eucalyptus|A.||Esquire
Roboticus||||
Aleena|||Lady|Baker of cookies
Chaoshi-Mnemosyniod|Aleena|Zarahlinda ibn Robert al-Hajnal||I
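
Since the point of the transform is sorting, here is a rough sketch of how the returned arrays might be compared field by field. The by_name comparator is a hypothetical helper, not something from the code above, and the records are copied by hand from a few lines of the output rather than produced by calling name_transform. It compares last, first, and middle name, then suffix, and ignores the prefix. That ordering is only one possible choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Records in the [last, first, middle, prefix, suffix] layout that
# name_transform returns; values copied from the output shown above.
my @records = (
    [ 'Obama', 'Barack',  'Hussein', 'President', 'II'   ],
    [ 'King',  'Amanda',  '',        'Mrs.',      ''     ],
    [ 'Smith', 'John',    '',        'Dr.',       ''     ],
    [ 'Henry', '',        '',        '',          'VIII' ],
    [ 'Andy',  'Officer', '',        '',          ''     ],
);

# Hypothetical comparator (an assumption, not part of the original code):
# compare last, first, middle name, then suffix, case-insensitively.
# The prefix at index 3 is deliberately skipped.
sub by_name {
    my ($x, $y) = @_;
    for my $i (0, 1, 2, 4) {
        my $cmp = lc($x->[$i]) cmp lc($y->[$i]);
        return $cmp if $cmp;
    }
    return 0;
}

my @sorted = sort { by_name($a, $b) } @records;
print join('|', @{$_}), "\n" for @sorted;
```

The suffix is compared as a plain string here, so Roman numerals would not sort numerically (IX would land before V). A real version would probably need to convert them first.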

What do you think?

Have a cookie and a very nice day!
Lady Aleena

Re^5: RFC: I rewrote a custom sort function, but not sure if it is the best way.
by choroba (Cardinal) on Mar 04, 2013 at 09:17 UTC
    What about Chiang Kai-shek? See Chinese name for more fun.
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      choroba, there are so many rules about names that I do not think I can cover them all. I did not cover van/von either. Depending on the country where the name is being filed, van Helsing could be filed under H (Netherlands and Suriname) or V (Belgium). Van is also a Vietnamese middle name. The Dutch have a lot of surname prefixes which could be added to the surname. I am sure there are other rules I have missed. There is a whole slew of nobiliary particles, each with its own sort rules, I imagine.
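
To make the van Helsing case concrete, here is a minimal sketch of a country-dependent sort key. Everything in it is an assumption for illustration: the surname_sort_key helper, the 'NL' and 'BE' convention codes, and the short particle list, which is nowhere near complete. It only looks at particles at the start of a surname, so it does nothing for Van as a Vietnamese middle name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative, incomplete list of surname particles (an assumption).
my $particles = qr/(?:van|von|de|der|den|ten|ter)/i;

# Hypothetical helper: build the key a surname is filed under.
# With convention 'NL' the leading particles move behind the main name,
# so "van Helsing" files under H. Any other convention keeps them in
# front, so it files under V.
sub surname_sort_key {
    my ($surname, $convention) = @_;
    if ($convention eq 'NL' && $surname =~ /^((?:$particles\s+)+)(.+)$/) {
        my ($lead, $main) = ($1, $2);
        $lead =~ s/\s+$//;    # drop the space left after the last particle
        return lc "$main, $lead";
    }
    return lc $surname;
}

print surname_sort_key('van Helsing',  'NL'), "\n";
print surname_sort_key('van Helsing',  'BE'), "\n";
print surname_sort_key('van der Berg', 'NL'), "\n";
```

A key like this could stand in for the last name field when comparing, with the convention chosen per list being sorted.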

      Have a cookie and a very nice day!
      Lady Aleena
Re^5: RFC: I rewrote a custom sort function, but not sure if it is the best way.
by roboticus (Chancellor) on Mar 04, 2013 at 18:22 UTC

    Lady_Aleena:

    It looks pretty good. I'm glad I was able to be helpful. Sometimes I suspect that I'm informative without being helpful. ;^)

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.