PerlMonks  

Re^2: Getting abbreviations or initials

by Lady_Aleena (Chaplain)
on Aug 20, 2012 at 23:50 UTC (#988532)


in reply to Re: Getting abbreviations or initials
in thread Getting abbreviations or initials

Hello hbm. Thanks for taking time to tear this apart and show me where I can tighten things up.

For /^(?:The|An?) / vs. /^(The|A|An) /, the only reason I can give you is that I have not yet grokked extended patterns in perlre. To save memory, I should stop capturing when all I want is a cluster. (The|A|An) is one of the first things I learned for writing regexes. I still have to force myself to use [] for single characters, like [ _-] and [yt1], instead of () (as in ( |_|-) and (y|t|1) respectively). Another thing: you not knowing that /(The|A|An)/ worked is far better than me not knowing how to use a whole section of perlre.
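To make the cluster vs. capture distinction concrete, here is a minimal sketch. The helper name strip_article and the sample titles are made up for illustration; they are not from hbm's code.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a leading article from a copy of the name.
sub strip_article {
    my ($name) = @_;
    # (?:...) only groups the alternatives; nothing is stored in $1.
    (my $copy = $name) =~ s/^(?:The|An?) //;
    return $copy;
}

# The capturing form does the same matching, but also saves the article.
my ($article) = 'The Hobbit' =~ /^(The|A|An) /;

# A character class covers single-character alternatives without a group.
# The hyphen is literal here because it is the last character in the class.
my @parts = split /[ _-]/, 'a_b-c d';

print strip_article('The Hobbit'), "\n";
print "$article\n";
print join(',', @parts), "\n";
```

Capturing is only worth it when the matched article is actually used afterwards, as with $article above.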

For my @abbr = $name =~ /(?:_|\b)(\w)/g; vs. a for loop and substr, all I can say is that this began while I was teaching myself substr and helping someone else get it at the same time one really early morning. Until two days ago, this subroutine was a lot tinier.

sub initials {
    my $name = shift;
    my @abbr;
    # [ _] rather than ( |_), so split does not return the separators
    for my $word (split(/[ _]/, $name)) {
        push @abbr, substr($word, 0, 1);
    }
    print join('', @abbr);
}

Two days ago I looked at it and decided to add a few things. Little things went through my head like...

  • What if the user wants periods after each initial?
  • What if the abbreviation is all caps in spite of the grammar rules making certain words lowercase in names and titles?
  • HTML has an abbr tag, so I'll add it in case I want to use it in my HTML code later.

Also, I did not know that I could use a regex like that to split a scalar into a list. Until now all I knew was split.
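For anyone else who had only used split for this, here is a small sketch of the list-context match. The wrapper name initials_by_match is an assumption for illustration; only the regex comes from hbm's reply.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex from hbm's reply.
sub initials_by_match {
    my ($name) = @_;
    # In list context, a /g match returns every capture from every match.
    # Each match is one word character that follows an underscore or
    # sits at a word boundary, so no split is needed.
    my @abbr = $name =~ /(?:_|\b)(\w)/g;
    return join '', @abbr;
}

print initials_by_match('International House of Pancakes'), "\n";
print initials_by_match('foo_bar'), "\n";
```

Because \b also matches after a hyphen, hyphenated names contribute one letter per part.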

For join('.',@abbr) . '.' vs. join('',map { $_ =~ s/$/./; $_; } @abbr), all I can say is that I overcomplicated it. I did think of join('.',@abbr) at first, then thought, "but that won't put a period at the end, I guess I'll have to map it." The idea of concatenating a period on the end of join('.',@abbr) did not even cross my mind. eeps.
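Side by side, the two forms look like this. The sub names with_join and with_map are hypothetical, and the sketch also shows a side effect of the map version that is easy to miss: s/// inside map changes the elements of the array it is iterating over.

```perl
use strict;
use warnings;

# The simple form: join with periods, then append the final one.
sub with_join {
    return join('.', @_) . '.';
}

# The map form, working on a copy so the caller's list is untouched.
sub with_map {
    my @abbr = @_;
    return join('', map { $_ =~ s/$/./; $_ } @abbr);
}

# Without a copy, $_ aliases each element, so the array itself is modified.
my @letters = qw(I H o P);
my $out = join('', map { $_ =~ s/$/./; $_ } @letters);

print with_join(qw(I H o P)), "\n";
print with_map(qw(I H o P)), "\n";
print "@letters\n";    # each element now carries its own period
```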

Now on to your update. I see that you are directly modifying $opt{name} to remove articles instead of assigning it to another variable. When I am modifying a variable with a regex, I almost always assign it to another variable first to preserve the original. If you are getting the HTML for the abbreviation of "The International House of Pancakes", you might want the article to be there in the title= part of the HTML. Also, I am not seeing the single word test in your code. If I am abbreviating musicians' names, I do not think I want Bono, Cher, Madonna, or Sting returned as B, C, M, or S; but I would want Olivia Newton-John returned as ONJ. Am I misreading it?
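A rough sketch of the two behaviours described above: working on a copy so the original name survives, and returning single-word names unchanged. This is not hbm's code. The signature is simplified to one argument, and treating a space, underscore, or hyphen as the sign of a multi-word name is an assumption.

```perl
use strict;
use warnings;

# Simplified, hypothetical version of the subroutine under discussion.
sub initials {
    my ($name) = @_;

    # Strip the article from a copy; $name stays intact for later use,
    # for example in the title= attribute of an abbr tag.
    (my $stripped = $name) =~ s/^(?:The|An?) //;

    # Single-word test (assumed rule): no separator means nothing to abbreviate.
    return $name if $stripped !~ /[ _-]/;

    return join '', $stripped =~ /(?:_|\b)(\w)/g;
}

print initials($_), "\n"
    for 'Bono', 'Olivia Newton-John', 'The International House of Pancakes';
```

With this rule a name like "The Police" also comes back whole, article included, since only one word is left after stripping.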

I will update this post with other questions I may have. I need to study the code more.

Have a cookie and a very nice day!
Lady Aleena


Re^3: Getting abbreviations or initials
by hbm (Hermit) on Aug 21, 2012 at 00:54 UTC

    Ah, right you are about me not storing the original $opt{name}; nor returning it unchanged if it is a single word...

    And another trick, for getting that last period, is simply join('.',@abbr,'').
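The trailing empty string works because join places the separator between every pair of elements, including the last initial and the ''. A small sketch with hypothetical sub names follows; the two forms agree for any non-empty list and differ only when there are no initials at all.

```perl
use strict;
use warnings;

# Concatenate the final period after joining.
sub dotted_concat { return join('.', @_) . '.'; }

# Let join add the final period by appending an empty element.
sub dotted_extra  { return join('.', @_, ''); }

print dotted_concat(qw(O N J)), "\n";
print dotted_extra(qw(O N J)), "\n";

# With an empty list, the first form yields a lone period,
# while the second yields an empty string.
printf "[%s] [%s]\n", dotted_concat(), dotted_extra();
```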
