PerlMonks  

Re: Getting abbreviations or initials

by hbm (Hermit)
on Aug 20, 2012 at 03:41 UTC (#988384)


in reply to Getting abbreviations or initials

Three small suggestions, lightly tested:

#$name =~ s/^(The|A|An) //i;
$name =~ s/^(?:The|An?) //i;   # no capturing

#for my $word (split(/[ _-]/,$name)) {
#  push @abbr, substr($word,0,1);
#}
my @abbr = $name =~ /(?:_|\b)(\w)/g;

#my $raw_abbr = $opt{periods} && $opt{periods} =~ /^[yt1]/i ? join('',map { $_ =~ s/$/./; $_; } @abbr) : join('',@abbr);
my $raw_abbr = $opt{periods} && $opt{periods} =~ /^[yt1]/i ? join('.',@abbr) . '.' : join('', @abbr);
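As a rough illustration, the three suggestions can be put together in one small routine. This is only a sketch: the sub name initials and its two positional arguments are made up for the example and are not the interface from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post) combining the three suggestions.
sub initials {
    my ($name, $periods) = @_;

    # Strip a leading article; (?:...) groups without capturing.
    $name =~ s/^(?:The|An?) //i;

    # In list context, a /g match returns every captured first letter.
    my @abbr = $name =~ /(?:_|\b)(\w)/g;

    # Either "C.D." style or plain "CD" style.
    return $periods ? join('.', @abbr) . '.' : join('', @abbr);
}

print initials("The International House of Pancakes"), "\n";   # IHoP
print initials("Compact Disc read-only memory", 1), "\n";      # C.D.r.o.m.
```

Note that \b also fires after the hyphen in "read-only", which is why that word contributes two letters.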

Update: Thinking about it a bit more, I'd do all the [yt1] testing up front and eliminate the intermediate variables:

use strict;
use warnings;

my %opt = (
  periods => 0,
  ALLCAPS => 1,
  HTML    => 1,
  name    => "Compact Disc read-only memory",
);

print abbr(%opt);

sub abbr {
  my %opt = @_;
  $opt{name} =~ s/^(?:The|An?) //i;
  die("Sorry...") unless $opt{name} =~ /\S/;
  my %ON = map { $_ => 1 } grep { $opt{$_} =~ /^[yt1]$/i } keys %opt;
  return
    ($ON{HTML} ? qq{<abbr title="$opt{name}">} : "")
    . (
      join '',
      map {
        $_ = $ON{ALLCAPS} ? uc : $_;
        $_ = $ON{periods} ? "$_." : $_;
      } $opt{name} =~ /(?:_|\b)(\w)/g
    )
    . ($ON{HTML} ? qq{</abbr>} : "")
}

And confession! Somehow, I did not know this worked:

/(The|A|An)/;

Sadly, I've always wrapped pipes in non-capturing parens, and wrapped it all in capturing parens:

/((?:The|A|An))/;
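For anyone comparing the forms side by side, here is a small sketch of what each one captures. The sample string and variable names are invented for the example.

```perl
use strict;
use warnings;

my $title = "The Hobbit";

# Plain alternation inside capturing parens: the article is captured.
my ($plain) = $title =~ /^(The|A|An) /;

# Non-capturing cluster wrapped in capturing parens: same capture, more typing.
my ($wrapped) = $title =~ /^((?:The|A|An)) /;

# Non-capturing cluster only: the match succeeds but nothing is captured,
# so in list context it returns (1) rather than the article.
my ($cluster) = $title =~ /^(?:The|A|An) /;

print "$plain $wrapped $cluster\n";   # The The 1
```

The first two are equivalent as far as the capture goes; the third is the one to reach for when the article itself is not needed afterwards.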


Re^2: Getting abbreviations or initials
by Lady_Aleena (Deacon) on Aug 20, 2012 at 23:50 UTC

    Hello hbm. Thanks for taking time to tear this apart and show me where I can tighten things up.

    For /^(?:The|An?) / vs. /^(The|A|An) /, the only reason I can give you is that I have not yet grokked extended patterns in perlre. To save memory, I should stop capturing when all I want is a cluster. (The|A|An) is one of the first things I learned for writing regexes. I still have to force myself to use [] for single characters, like [ _-] and [yt1], instead of () (as in ( |_|-) and (y|t|1), respectively). Another thing: you not knowing that /(The|A|An)/ worked is far better than me not knowing how to use a whole section of perlre.
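Besides memory, there is one place where the choice between [] and () visibly changes the result: split. The sketch below uses a made-up sample name to show the difference.

```perl
use strict;
use warnings;

my $name = "Olivia Newton-John";

# Character class: split returns only the words.
my @words = split /[ _-]/, $name;

# Capturing alternation: split also returns each separator it matched.
my @with_seps = split /( |_|-)/, $name;

print scalar(@words), " vs ", scalar(@with_seps), "\n";   # 3 vs 5
```

With the capturing form, a loop that takes substr($word,0,1) of every element would pick up the space and the hyphen as "initials" too.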

    For my @abbr = $name =~ /(?:_|\b)(\w)/g; vs. a for loop and substr, all I can say is that this began while I was teaching myself substr and helping someone else get it at the same time one really early morning. Until two days ago, this subroutine was a lot tinier.

    sub initials {
      my $name = shift;
      my @abbr;
      # [ _], not ( |_): captures would make split return separators
      for my $word (split(/[ _]/,$name)) {
        push @abbr, substr($word,0,1);
      }
      print join('',@abbr);
    }

    Two days ago I looked at it and decided to add a few things. Little things went through my head like...

    • What if the user wants periods after each initial?
    • What if the abbreviation is all caps in spite of the grammar rules making certain words lowercase in names and titles?
    • HTML has an abbr tag, so I'll just add it in just in case I want to use it in my HTML code later.

    Also, I did not know that I could use a regex like that to split a scalar into a list. Until now all I knew was split.

    For join('.',@abbr) . '.' vs. join('',map { $_ =~ s/$/./; $_; } @abbr), all I can say is that I overcomplicated it. I did think of join('.',@abbr) at first, then thought, "but that won't put a period at the end; I guess I'll have to map it." The idea of concatenating a period onto the end of join('.',@abbr) did not even cross my mind. eeps.

    Now onto your update. I see that you are directly modifying $opt{name} to remove articles instead of assigning it to another variable. When I am modifying a variable with a regex, I almost always assign it to another variable first to preserve the original. If you are getting the HTML for the abbreviation of "The International House of Pancakes", in the title= part of the HTML, you might want the article to be there. Also, I am not seeing the single-word test in your code. If I am abbreviating musicians' names, I do not think I want Bono, Cher, Madonna, or Sting returned as B, C, M, or S; but I would want Olivia Newton-John returned as ONJ. Am I misreading it?
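One possible way to address both points is sketched below. The sub name abbr_html, its single argument, and the always-uppercase output are assumptions made for illustration, not code from either post; it also does no HTML escaping of the title.

```perl
use strict;
use warnings;

# Hypothetical variant (name and behavior are assumptions, not from the thread).
sub abbr_html {
    my ($name) = @_;

    # Work on a copy so the original survives for the title attribute.
    (my $short = $name) =~ s/^(?:The|An?) //i;

    my @abbr = $short =~ /(?:_|\b)(\w)/g;

    # Single-word names (Bono, Cher, Sting) come back unchanged.
    return $name if @abbr < 2;

    return sprintf '<abbr title="%s">%s</abbr>', $name, uc(join('', @abbr));
}

print abbr_html("The International House of Pancakes"), "\n";
print abbr_html("Olivia Newton-John"), "\n";
print abbr_html("Sting"), "\n";
```

Counting initials rather than splitting on spaces means a hyphenated name still counts as more than one word.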

    I will update this post with other questions I may have. I need to study the code more.

    Have a cookie and a very nice day!
    Lady Aleena

      Ah, right you are about me not storing the original $opt{name}; nor returning it unchanged if it is a single word...

      And another trick, for getting that last period, is simply join('.',@abbr,'').
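A quick comparison of the three ways to get a period after every initial that came up in this thread; the sample list is made up for the example.

```perl
use strict;
use warnings;

my @abbr = qw(C D R O M);

# Append a period to each element, then join with nothing.
my $mapped = join '', map { "$_." } @abbr;

# Join with periods and concatenate the last one by hand.
my $appended = join('.', @abbr) . '.';

# Add an empty trailing element so join emits one more separator.
my $trailing = join '.', @abbr, '';

print "$mapped\n";   # C.D.R.O.M.
```

All three agree for a non-empty list. For an empty @abbr they differ: the concatenating form yields a lone ".", while the other two yield an empty string.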
