Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Getting abbreviations or initials

by Lady_Aleena (Priest)
on Aug 19, 2012 at 21:21 UTC ( [id://988360]=CUFP: print w/replies, xml ) Need Help??

This little script returns the abbreviation of a name. The only required option to use this is name. The script will die if name is not initialized.

my $abbreviation = abbr( name => "Jane Q. Public", );

This script will strip off articles at the beginning of the string input. "The Lord of the Rings" will return "LotR". If only one word is entered or left after the initial article in the string, the word will be returned. "Cher" will return "Cher"; "The Police" will return "The Police". Strings will be broken at spaces, underscores, hyphens, or a combination thereof. "Compact Disc read-only memory" will be returned as "CDrom", see the options below for how to return "CDROM".

There are three options that are not required.

periods

If you want periods after each letter in the abbreviation, the option periods can be yes, true, or 1.

my $abbreviation = abbr( name => "Jane Q. Public", periods => "yes", );

ALLCAPS

If you want all of the letters in the abbreviation to be capitalized, the option ALLCAPS can be yes, true, or 1.

my $abbreviation = abbr( name => "The International House of Pancakes", ALLCAPS => "yes", );

HTML

If you want the output to be wrapped in the HTML tag, the option HTML can be yes, true, or 1. I added this because, as some here know, I use Perl to output a lot of HTML. :)

my $abbreviation = abbr( name => "frequently asked question", ALLCAPs => "yes", HTML => "yes", );

That will output:

<abbr title="frequently asked question">FAQ</abbr>

Alternate name

I included an alternate subroutine called initials which is the same as abbr.

Now here is the code...

sub abbr { my %opt = @_; die("Sorry, I can't return an abbreviation if you don't give me a na +me.") if !$opt{name}; my $name = $opt{name}; $name =~ s/^(The|A|An) //i; if ($name !~ /[ _-]/) { return $opt{name}; } else { my @abbr; for my $word (split(/[ _-]/,$name)) { push @abbr, substr($word,0,1); } my $raw_abbr = $opt{periods} && $opt{periods} =~ /^[yt1]/i ? join( +'',map { $_ =~ s/$/./; $_; } @abbr) : join('',@abbr); my $final_abbr = $opt{ALLCAPS} && $opt{ALLCAPS} =~ /^[yt1]/i ? uc +$raw_abbr : $raw_abbr; if ($opt{HTML} && $opt{HTML} =~ /^[yt1]/i) { return qq(<abbr title="$opt{name}">$final_abbr</abbr>); } else { return $final_abbr; } } } sub initials { my %opt = @_; my $initials = abbr( name => $opt{name} ? $opt{name} : die("Sorry, I can't return initi +als if you don't give me a name."), periods => $opt{periods} ? $opt{periods} : undef, ALLCAPS => $opt{ALLCAPS} ? $opt{ALLCAPS} : undef, HTML => $opt{HTML} ? $opt{HTML} : undef, ); return $initials; }

As always, I would love to know where I can clean this up.

Have a cookie and a very nice day!
Lady Aleena

Replies are listed 'Best First'.
Re: Getting abbreviations or initials
by CountZero (Bishop) on Aug 20, 2012 at 09:46 UTC
    Why don't you write the initials sub simply as:
    sub initials { return &abbr; }
    From the docs (perlsub):
    If a subroutine is called using the & form, the argument list is optional, and if omitted, no @_ array is set up for the subroutine: the @_ array at the time of the call is visible to subroutine instead.

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

      blushes I didn't know about using the & form in that way. There are still many things like that I do not know about Perl. Thank you for showing me.

      Have a cookie and a very nice day!
      Lady Aleena
Re: Getting abbreviations or initials
by hbm (Hermit) on Aug 20, 2012 at 03:41 UTC

    Three small suggestions, lightly tested:

    #$name =~ s/^(The|A|An) //i; $name =~ s/^(?:The|An?) //i; # no capturing #for my $word (split(/[ _-]/,$name)) { # push @abbr, substr($word,0,1); #} my @abbr = $name =~ /(?:_|\b)(\w)/g; #my $raw_abbr = $opt{periods} && $opt{periods} =~ /^[yt1]/i ? join(' +',map { $_ =~ s/$/./; $_; } @abbr) : join('',@abbr); my $raw_abbr = $opt{periods} && $opt{periods} =~ /^[yt1]/i ? join('.',@abbr) . '.' : join('', @abbr);

    Update:Thinking about it a bit more, I'd do all the [yt1] testing up front, and eliminate the intermediate variables:

    use strict; use warnings; my %opt = ( periods => 0, ALLCAPS => 1, HTML => 1, name => "Compact Disc read-only memory", ); print abbr(%opt); sub abbr { my %opt = @_; $opt{name} =~ s/^(?:The|An?) //i; die("Sorry...") unless $opt{name} =~ /\S/; my %ON = map { $_ => 1 } grep { $opt{$_} =~ /^[yt1]$/i } keys %opt; return ($ON{HTML} ? qq{<abbr title="$opt{name}">} : "") . ( join '', map { $_ = $ON{ALLCAPS} ? uc : $_; $_ = $ON{periods} ? "$_." : $_; } $opt{name} =~ /(?:_|\b)(\w)/g ) . ($ON{HTML} ? qq{</abbr>} : "") }

    And confession! Somehow, I did not know this worked:

    /(The|A|An)/;

    Sadly, I've always wrapped pipes in non-capturing parens, and wrapped it all in capturing parens:

    /((?:The|A|An))/;

      Hello hbm. Thanks for taking time to tear this apart and show me where I can tighten things up.

      For /^(?:The|An?) / vs. /^(The|A|An) /, the only reason I can give you is I have not yet grokked extended patterns in perlre. I should stop capturing when all I want is a cluster to save memory. (The|A|An) is one of the first things I learned for writing regexes. I still have to force myself to use [] for single characters like [ _-] and [yt1] instead of () (( |_|-) and (y|t|1) respectively). Another thing, you not knowing that /(The|A|An)/ worked is far better than me not knowing how to use a whole section of perlre.

      For my @abbr = $name =~ /(?:_|\b)(\w)/g; vs. a for loop and substr, all I can say it that this began while I was teaching myself substr and helping someone else get it at the same time one really early morning. Until two days ago, this subroutine was a lot tinier.

      sub initials { my $name = shift; for my $word (split(/( |_)/,$name) { push @abbr, substr($word,0,1); } print join('',@abbr); }

      Two days ago I looked at it and decided to add a few things. Little things went through my head like...

      • What if the user wants periods after each initial?
      • What if the abbreviation is all caps in spite of the grammar rules making certain words lowercase in names and titles?
      • HTML has an abbr tag, so I'll just add it in just in case I want to use it in my HTML code later.

      Also, I did not know that I could use a regex like that to split a scalar into a list. Until now all I knew was split.

      For join('.',@abbr) . '.' vs. join('',map { $_ =~ s/$/./; $_; } @abbr), all I can say is that I overcomplicated it. I did think of join('.',@abbr) at first, then thought but that won't put a period at the end, I guess I'll have to map it. The idea of concatenating a period on the end of join('.',@abbr) did not even cross my mind. eeps.

      Now onto your update. I see that you are directly modifying $opt{name} to remove articles instead of assigning it to another variable. When I am modifying a variable with a regex, I almost always assign it to another variable first to preserve the original. If you are getting the HTML for the abbreviation of "The International House of Pancakes", in the title= part of the HTML, you might want the article to be there. Also, I am not seeing the single word test in your code. If I am abbreviating musicians names, I do not think I want Bono, Cher, Madonna, or Sting returned as B, C, M, or S; but I would want Olivia Newton-John returned as ONJ. Am I misreading it?

      I will update this post with other questions I may have. I need to study the code more.

      Have a cookie and a very nice day!
      Lady Aleena

        Ah, right you are about me not storing the original $opt{name}; nor returning it unchanged if it is a single word...

        And another trick, for getting that last period, is simply join('.',@abbr,'').

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://988360]
Approved by ww
Front-paged by ww
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (5)
As of 2024-04-26 07:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found