Re^2: Getting abbreviations or initials

by Lady_Aleena (Deacon)
on Aug 20, 2012 at 23:50 UTC (#988532)


in reply to Re: Getting abbreviations or initials
in thread Getting abbreviations or initials

Hello hbm. Thanks for taking time to tear this apart and show me where I can tighten things up.

For /^(?:The|An?) / vs. /^(The|A|An) /, the only reason I can give you is that I have not yet grokked the extended patterns in perlre. To save memory, I should stop capturing when all I want is a cluster. (The|A|An) is one of the first things I learned for writing regexes. I still have to force myself to use [] for single characters, like [ _-] and [yt1], instead of () (( |_|-) and (y|t|1), respectively). Another thing: you not knowing that /(The|A|An)/ worked is far better than me not knowing how to use a whole section of perlre.
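To make the difference concrete for myself, here is a minimal sketch of a cluster versus a capture, and of a character class in split. The helper names strip_article and name_parts are made up for illustration; they are not from hbm's code or mine.

```perl
use strict;
use warnings;

# Hypothetical helper: strip a leading article without capturing it.
sub strip_article {
    my ($name) = @_;
    # (?:...) only clusters the alternatives; it does not set $1.
    (my $copy = $name) =~ s/^(?:The|An?) //;
    return $copy;
}

# Hypothetical helper: split on a space, underscore, or hyphen.
# [ _-] is a character class. ( |_|-) would match the same characters,
# but it captures, and split returns captured separators as extra elements.
sub name_parts {
    my ($name) = @_;
    return split /[ _-]/, $name;
}

print strip_article('The International House of Pancakes'), "\n";
print join('|', name_parts('Olivia Newton-John')), "\n";
```

The first print should give the title without "The", and the second should give the three name parts separated by pipes.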

For my @abbr = $name =~ /(?:_|\b)(\w)/g; vs. a for loop and substr, all I can say is that this began while I was teaching myself substr and helping someone else get it at the same time one really early morning. Until two days ago, this subroutine was a lot tinier.

sub initials {
    my $name = shift;
    my @abbr;
    # A character class keeps split from returning the separators.
    for my $word (split /[ _]/, $name) {
        push @abbr, substr($word, 0, 1);
    }
    print join('', @abbr);
}

Two days ago I looked at it and decided to add a few things. Little things went through my head like...

  • What if the user wants periods after each initial?
  • What if the abbreviation is all caps in spite of the grammar rules making certain words lowercase in names and titles?
  • HTML has an abbr tag, so I'll just add it in just in case I want to use it in my HTML code later.

Also, I did not know that I could use a regex like that to split a scalar into a list. Until now all I knew was split.
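As I understand it, a match with /g in list context returns every captured group, which is what turns the scalar into a list. A small sketch, with a sub name (initials_of) that I made up for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: collect the first word character after each
# underscore or word boundary.
sub initials_of {
    my ($name) = @_;
    # In list context, /g returns all the (\w) captures at once.
    my @abbr = $name =~ /(?:_|\b)(\w)/g;
    return join '', @abbr;
}

print initials_of('Olivia Newton-John'), "\n";   # ONJ
print initials_of('snake_case_name'), "\n";      # scn
```

The hyphen and the space both create a word boundary, so neither needs to be listed; only the underscore does, because it counts as a word character.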

For join('.',@abbr) . '.' vs. join('',map { $_ =~ s/$/./; $_; } @abbr), all I can say is that I overcomplicated it. I did think of join('.',@abbr) at first, then thought, "but that won't put a period at the end; I guess I'll have to map it." The idea of concatenating a period on the end of join('.',@abbr) did not even cross my mind. eeps.
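A quick side-by-side sketch of the two approaches. The sub names are hypothetical. One thing worth noting: inside map, $_ is an alias to each element, so the s/$/./ version also changes @abbr itself, which the concatenation version does not.

```perl
use strict;
use warnings;

# Hypothetical helper: join with periods, then add the final one.
sub dotted_concat {
    my @abbr = @_;
    return join('.', @abbr) . '.';
}

# Hypothetical helper: append a period to each initial without
# touching the original elements.
sub dotted_map {
    my @abbr = @_;
    return join '', map { "$_." } @abbr;
}

# The substitution inside map edits the array in place via the $_ alias.
my @abbr = qw(O N J);
my $mapped = join('', map { $_ =~ s/$/./; $_; } @abbr);

print dotted_concat(qw(O N J)), "\n";   # O.N.J.
print "$mapped\n";                      # O.N.J.
print "@abbr\n";                        # O. N. J.
```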

Now on to your update. I see that you are directly modifying $opt{name} to remove articles instead of assigning it to another variable. When I am modifying a variable with a regex, I almost always assign it to another variable first to preserve the original. If you are getting the HTML for the abbreviation of "The International House of Pancakes", you might want the article to be there in the title= part of the HTML. Also, I am not seeing the single-word test in your code. If I am abbreviating musicians' names, I do not think I want Bono, Cher, Madonna, or Sting returned as B, C, M, or S; but I would want Olivia Newton-John returned as ONJ. Am I misreading it?
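Here is a rough sketch of what I mean, not hbm's code and not my full subroutine. The sub name abbreviate is hypothetical, and I am assuming that "single word" means there is no space, underscore, or hyphen left once the article is gone.

```perl
use strict;
use warnings;

# Hypothetical helper showing the copy-first habit and a single-word guard.
sub abbreviate {
    my ($name) = @_;

    # Work on a copy so the caller's string keeps its article.
    (my $work = $name) =~ s/^(?:The|An?) //;

    # Assumed single-word test: no separator means nothing to abbreviate,
    # so hand the name back unchanged.
    return $name unless $work =~ /[ _-]/;

    return join '', $work =~ /(?:_|\b)(\w)/g;
}

my $title = 'The International House of Pancakes';
print abbreviate($title), "\n";                  # IHoP
print "$title\n";                                # still has "The"
print abbreviate('Bono'), "\n";                  # Bono
print abbreviate('Olivia Newton-John'), "\n";    # ONJ
```

Because $title is untouched, it is still available for the title= attribute later.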

I will update this post with other questions I may have. I need to study the code more.

Have a cookie and a very nice day!
Lady Aleena


Re^3: Getting abbreviations or initials
by hbm (Hermit) on Aug 21, 2012 at 00:54 UTC

    Ah, right you are about me not storing the original $opt{name}; nor returning it unchanged if it is a single word...

    And another trick, for getting that last period, is simply join('.',@abbr,'').
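    A small sketch of that trick, with a made-up sub name. The trailing empty string gives join one more element to put a period in front of. One difference I would expect: with an empty list it returns an empty string, where the concatenation form returns a lone period.

```perl
use strict;
use warnings;

# Hypothetical helper: the empty final element supplies the last period.
sub dotted_trailing {
    return join '.', @_, '';
}

print dotted_trailing(qw(O N J)), "\n";    # O.N.J.

# Edge case with no initials at all.
my @none;
print '[', dotted_trailing(@none), "]\n";  # []
print '[', join('.', @none) . '.', "]\n";  # [.]
```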
