EdwardG has asked for the wisdom of the Perl Monks concerning the following question:
Recently I had the need to produce a report of a certain resource level over time. The basic report looked like this -
Date Range Level Consumers -------------------------- ------- -------------------------- 13 Mar 2004 to 23 Mar 2004 7 AB, AT, DA, MS, RI, RR, RR 24 Mar 2004 to 31 Mar 2004 6 AB, DA, RC, RI, RR, SJ 01 Apr 2004 to 15 May 2004 3 Alan Po, Rohan Ito, Sam Jones 16 May 2004 to 31 Dec 2004 5 AP, PS, RC, RI, SJ
What this shows is that for each listed _date_range_, we require _level_ items, and that these items are to be allocated to the _consumer_ as identified by the listed names. Where the list of names is too long to fit neatly on one line, I programmatically shorten the name to just the initials to try keep it from wrapping or truncating. Incidentally, this report has proven very usful as it shows if we will exceed or have spare capacity in the known future.
But this node is not about that report, it is about a frustrating little problem (did you spot it?) that eventually resulted in the code below.
I noticed that I had neglected to cater for two or more people having the same initials. In fact my code to generate intials from names was simply
$init{$_} = join '', ($_ =~ /\b\w/g) for @names;
People with the same initials make the report ambiguous, although this doesn't matter if they are both included in a single date range. But when a single "RR" appears in a date range, as in the second line of the report, it is ambiguous.
Now this was not a big problem since the report was more about the resource level than about the resource consumers, but naively I thought it was a simple fix.
My first lazy attempt at distinguishing derived initials was to detect when the derived initials were already used, and in that case add more letters from the name (iteratively) until the second (etc) set of derived initials were unique. This resulted in a report that looked like this -
Date Range Level Consumers -------------------------- ------- -------------------------- 13 Mar 2004 to 23 Mar 2004 7 AB, AT, DA, MS, RI, RR, RaRa 24 Mar 2004 to 31 Mar 2004 6 AB, DA, RC, RI, RaRa, SJ 01 Apr 2004 to 15 May 2004 3 Alan Po, Rohan Ito, Sam Jones 16 May 2004 to 31 Dec 2004 5 AP, PS, RC, RI, SJ
Notice "RaRa" is now used for the second person with initials of "RR". I can now see in the second line of the report, it is actually the second "RR" person who needs the resource, not the first like I could have mistakenly assumed.
This was an improvement, and in fact I left the report in this state since a) it was only for my information, b) it was good enough, and c) I had more important things to do.
At this point I could have abandoned names and maybe used numbers to identify people, but I avoided that option for two main reasons:
- I didn't want the maintenance hassle of a new item of data (where would these "id numbers" be stored? How would I maintain them?)
- I really wanted the report to tell me names without referring to yet another list. (The report was supposed to be simple, dammit)
Fast forward several months, I'm running this report again - by now it has many more people on the list - and once again I notice the ambiguity. With more time to spend on the problem, I dived in.
To cut the potentially long story short, I've now got the following code that takes a list of names and returns a hashref of name => initials.
Example:
Alphonse Romeo-Smith => ARS Big Al Bundy => BAB Dilbert => D I Palindrome I => IPI James O'Leary => JOL Jimmy O'Brien => JOB John Smith => JS K9 => K Medhi Majesh => MedMaj Mickey Mouse => MicMou Minnie Mouse => MinMou Tiny Tim => TT Tom Demarco => TomDem Tom Denada => TomDen
Notice the additional characters used to distinguish Mickey, Minnie, and Medhi
For names that are nearly identical, this doesn't work so well . o O(but what can I do?)
A A Milne => AAMilne A A Milner => AAMilner
With some more work, and this is the real reason I'm posting it, I wonder if it could become a module fit for sharing.
So with that in mind, I hereby invite your thoughts on the sensibility of what I'm trying to do, and the code itself (bugs or inefficiencies), and also perhaps you can think of names I haven't catered for. I'm already aware that this code would not be suitable for a long list of names (like a telephone book), but as it stands I find it useful for up to say several hundred names.
In case anyone is wondering, I got the long list of names from a Random Name Generator.
Here is the code. (I'm going out on a limb here, I expect someone will post an embarassing 80 character golfed version within minutes :)
use strict; use warnings; MAIN: { my @names = <DATA>; chomp @names; my $initref = &DistinctInitials( @names ); # Sorted by Name print "$_ => $$initref{$_}\n" for (sort keys %$initref); # Sorted by Initials for my $init (sort values %$initref) { print "$init => $_\n" for (grep {$$initref{$_} eq $init} keys +%$initref); } } exit; sub DistinctInitials { # Derive unique "initials" for each name in a list # 1. Get simple initials using the first letter of each word in na +me # 2. For non-unique initials, insert a sufficient quantity of lett +ers from # the original name to make the initial unique among all initia +ls my @names = @_; # A copy to be modified # capitalise words s/\b(\w)(\w+)\b/\u$1\L$2\E/g for @names; # remove duplicates my %seen; @names = grep { ! $seen{$_}++ } @names; # derive initial of each name my %init; $init{$_} = join '', ($_ =~ /\b\w/g) for @names; # identify non-unique initials my %nonu; for my $nonu_v (grep {$seen{$_}++} values %init) { map {push @{$nonu{$nonu_v}},$_ } grep {$init{$_} eq $nonu_v} k +eys %init; } # remove duplicates %seen=(); @{$nonu{$_}} = grep { ! $seen{$_} ++ } @{$nonu{$_}} for keys %nonu +; for my $init (keys %nonu) { # determine minimum additional characters from each # name that make all these initials unique my $c = 2; # start with one extra character my $notunique = 1; while ($notunique) { my %tryuniq; for my $name (@{$nonu{$init}}) { $tryuniq{ join '', ($name =~ /\b\w{1,$c}/g) } = $name; + # greedy {1,$c} } if (scalar keys %tryuniq == scalar @{$nonu{$init}}) { undef $notunique; # success, all initials are now uniq +ue # update the initial hash with our newfound initials while (my ($tryinit,$tryname) = each %tryuniq) { $init{$tryname} = $tryinit; } } else { # failed to make them all unique $c += 1; } } } \%init; } # Hand-crafted test names follows... __DATA__ John Smith Medhi Majesh Mickey Mouse Minnie Mouse A A Milne A A Milne A A Milner Jimmy O'Brien James O'Leary Dilbert Alphonse Romeo-Smith I Palindrome I tiny tim BIG AL BUNDY Tom DeMarco Tom DeNada K9
And here is further list of names I used for testing. As I mentioned, these are random names, any similarity to persons alive or dead is purely coincidental :)
Adam Harper Alan Pfaff Alan Vanwinkle Alana Stutsman Albert Osterman Alison Sands Allan Goris Allan Razor Alma Ratcliffe Alvaro Cangemi Amanda Basile Annabelle Ducker Anthony Granado Anthony Mcdevitt Antonio Starnes April Kain Arlene Stephen Armand Stower Arthur Ragsdale Ashlee Hurtt Ballentine Beata Zar Bernice Turnage Bertha Schuck Billy Sundberg Binns Blanche Page Bobby Eaton Bobby Winkelman Brandon Dominick Brittanie Noviello Brittian Bulah Twombley Candace Cote Carey Balck Carey Grishaber Carey Twedell Chad Guarino Chandra Puffer Cheryl Langley Chris Creekmore Clare Milum Clayton Elbert Clinton Dahmen Clinton Ganley Clinton Mcnaught Cody Ridout Cody Vigo Dale Hernandes Dane Hasychak Daniel Colunga Darren Tacey Darren Vanaman Darryl Buchta Debbie Schlueter Doggett Doris Adkins Eileen Bard Elinor Rita Elizabeth Hendricks Elnora Hornbuckle Elwood Steketee Ericka Hodgin Erik Duley Ernest Cowley Ernest Frey Ernest Helmick Esmeralda Kardos Eve Goodfellow Ewa Villerreal Felicia Manson Fernando Outler Fidel Jacoway Fidel Paone Florence Lillie Fred Pyles Genna Tranbarger George Lamoureux Gerald Graziano Gerald Minnich Gina Nickelson Gladys Rothman Glenda Wheat Gregory Walls Guinn Guy Gatton Hannah Quijano Harold Houser Harriet Dreyer Heinricher Howard Sirois Hugh Charland Hugh Dipalma Hugh Fuselier Hugh Minaya Ida Noe Inez Marion James Angell James Cravens Jami Gula Jamie Brickhouse Jamie Huot Jason Willson Jeannette Macaluso Jerri Giesen Jerri Houseknecht Jerri Nickson Jessie Olah Ji Greenier Jonathan Matos Joyce Brian Julio Sepeda Karina Palka Katherine Holzer Kathryn Diaz Katie Lloyd Keila Brue Keith Berner Kelly Henegar Kenneth Caraballo Kevin Batchelder Kevin Poindexter Kimberly Cecil Kimberly Hare Kingsbury Koteles Kristina Kasper Lakisha Trees Larita Battisti Laura Mcfadden Laurie Carranza Lilia Fouse Linda Kavanaugh Linda Power Lonnie Sherrow Lori Stjohn Lorrie Lobdell Louis Leggett Louis Watterson Louisa Galyean Louisa Swingle Louise Belanger Lynda Knudsen Madelene Robasciotti Mae Brantley Maggie Fernandez Maggie Madore Mallory Brodbeck Mallory Sickels Manuel Trotter Marcel Avers Marcel Cuffia Marcel Merow Margaret Amey Maricela Bautch Marilyn Shumpert Mark Haynie Marlin Gryniuk Martin King Mathew Mayton Max Sprau Merrilee Garelick Mildred Colburn Ming Negro Nathan Shuster Neil Helbert Nelson Pinzon Nelson Rayo Nicholas Lowell Nicole Tillman Norman Thatcher Olivia Bridgeman Peggy Loggins Penelope Maize Peter Agee Peter Valadez Priscilla Katz Rachel Marchand Randy Frederickson Raphael Dumaine Raymond Nava Robyn Duvall Rod Schoneman Roger Goldman Ronald Looney Roy Kushner Royce Schrock Scott Obryan Shawn Ault Stanley Cleaver Sydow Sylvia Salazar Sylvia Walther Tabatha Goodsell Tari Windish Tarra Bellantuono Thomas Coleman Tia Drakeford Tina Palomo Tompkins Tony Haight Tyrone Crossett Tyrone Golub Victor Mcduffie Vincent Nowak Viola Mcnamee Wayne Maye Wayne Ulrich Yolanda Hood
|
---|