PerlMonks  

RegExp Capitalization of Entry

by bladx (Chaplain)
on Jul 16, 2002 at 15:22 UTC (#182107)
bladx has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,
I have a question about using regexps to filter music lists (group, album title).

Here is what I am trying to do. For example, I have the music entry "red hot chili peppers" (group), "by the way" (album title).

Now I am trying to figure out how to convert an entry like that to "Red Hot Chili Peppers" (group), "By the Way" (album title). That is, I need a way to capitalize every word in each entry except for words such as "and", "or", "the", etc.

Is there a way I can do this in Perl? Thanks for any help.

Re: RegExp Capitalization of Entry
by talexb (Canon) on Jul 16, 2002 at 15:50 UTC
    My guess is that you want to use ucfirst and lc along with a map loop on the group name and album title.
    #!/usr/bin/perl -w
    # Title case except for some special words
    use strict;

    my %Exceptions = ( "and" => 1, "the" => 1, "or" => 1 );

    while (<DATA>) {
        my @Results = map {
            ( defined ( $Exceptions{ $_ } ) ) ? $_ : ucfirst ( lc ( $_ ) )
        } split ( /\s+/, $_ );
        print join ( " ", @Results ) . "\n";
    }

    __DATA__
    red hot chili peppers
    by the way

    --t. alex

    "Mud, mud, glorious mud. Nothing quite like it for cooling the blood!"
    --Michael Flanders and Donald Swann

      I've done something like this before. I've changed talexb's code a little to account for words that need a fixed spelling of their own, like "II" as in "Greatest Hits Vol. II".

      # Title case except for some special words
      use strict;

      my %Exceptions = (
          "and"   => "and",
          "the"   => "the",
          "or"    => "or",
          "zztop" => "ZZtop",
          "ii"    => "II"
      );

      while (<DATA>) {
          my @Results = map {
              (defined ( $Exceptions{lc($_)} )) ? $Exceptions{lc($_)} : ucfirst ( lc ( $_ ) )
          } split ( /\s+/, $_ );
          # capitalize first word regardless
          substr($Results[0],0,1, uc substr($Results[0],0,1));
          print join ( " ", @Results ) . "\n";
      }

      __DATA__
      red hot chili peppers
      by the way
      zztop greatest hits vol. II
      the beatles
      the white album disc II

      __OUTPUT__
      Red Hot Chili Peppers
      By the Way
      ZZtop Greatest Hits Vol. II
      The Beatles
      The White Album Disc II

      You will have to expand %Exceptions as you find/think of them.

      __UPDATE__

      added code to capitalize first character of first word.

      --

      flounder
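
As a hedged sketch of how the exception table in flounder's reply could be wrapped in a reusable routine: the sub name `title_case` and the contents of `%special` are assumptions made for illustration, not something from the thread. Keys are lowercase, values are the spelling to emit, and the first word is capitalized regardless.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical exception table: lowercase key => spelling to emit.
my %special = (
    and   => 'and',
    the   => 'the',
    or    => 'or',
    ii    => 'II',
    zztop => 'ZZtop',
);

# title_case() is an illustrative name, not part of any module.
sub title_case {
    my ($line) = @_;
    my @words = split ' ', $line;
    return '' unless @words;

    my @out = map {
        my $key = lc;
        exists $special{$key} ? $special{$key} : ucfirst $key;
    } @words;

    # The first word is capitalized even when it is an exception.
    $out[0] = ucfirst $out[0];
    return join ' ', @out;
}

print title_case($_), "\n" for 'the white album disc ii', 'by the way';
```

Keeping the lookup keyed on the lowercased word means input such as "II", "ii" or "Ii" all resolve to the same entry.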

Re: RegExp Capitalization of Entry
by Sidhekin (Priest) on Jul 16, 2002 at 15:58 UTC

    ... a way to capitalize the names of every word in each entry except for the words such as: "and", "or", "the" etc.

    I have a feeling there has to be a module for this somewhere, but it is really not that hard, if you just have a list of words that should not be upcased.

    Well, there is also the question of what constitutes a word ... this is just one of many ways. Season to taste:

    print capitalize("red hot chili peppers\nby the way\n");

    {
        my %exception;
        sub capitalize {
            my $string = shift;
            %exception = map { $_ => 1 } qw(and or the a an etc) unless keys %exception;
            $string =~ s/(\w+)/$exception{$1}?$1:ucfirst($1)/ge;
            return $string;
        }
    }

    The Sidhekin
    print "Just another Perl ${\(trickster and hacker)},"
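
Sidhekin's remark about what constitutes a word can be illustrated with a small sketch. The helper `cap_words` and the `%skip` list are hypothetical names chosen here; the only point is that the pattern passed in decides where one word ends and the next begins.

```perl
use strict;
use warnings;

# Hypothetical list of words to leave untouched.
my %skip = map { $_ => 1 } qw(and or the a an);

# cap_words() is an illustrative helper; $word_re decides what counts as a word.
sub cap_words {
    my ($string, $word_re) = @_;
    $string =~ s/($word_re)/$skip{$1} ? $1 : ucfirst $1/ge;
    return $string;
}

# With \w+ the apostrophe splits "don't" into two words.
print cap_words("don't stop the music", qr/\w+/), "\n";

# With [\w']+ the apostrophe stays inside the word.
print cap_words("don't stop the music", qr/[\w']+/), "\n";
```

The first call gives "Don'T Stop the Music" and the second gives "Don't Stop the Music", which is the same effect the perlfaq entry describes.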

Re: RegExp Capitalization of Entry
by Abigail-II (Bishop) on Jul 16, 2002 at 16:03 UTC
    From the perlfaq which comes with Perl:
    How do I capitalize all the words on one line?

    To make the first letter of each word upper case:

        $line =~ s/\b(\w)/\U$1/g;

    This has the strange effect of turning "don't do it" into "Don'T Do It". Sometimes you might want this. Other times you might need a more thorough solution (Suggested by brian d. foy):

        $string =~ s/ (
                       (^\w)    #at the beginning of the line
                         |      # or
                       (\s\w)   #preceded by whitespace
                      )
                    /\U$1/xg;
        $string =~ s/([\w']+)/\u\L$1/g;

    To make the whole line upper case:

        $line = uc($line);

    To force each word to be lower case, with the first letter upper case:

        $line =~ s/(\w+)/\u\L$1/g;

    You can (and probably should) enable locale awareness of those characters by placing a "use locale" pragma in your program. See the perllocale manpage for endless details on locales.

    This is sometimes referred to as putting something into "title case", but that's not quite accurate. Consider the proper capitalization of the movie Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb, for example.

    How to exclude words like "the" and the like is left as an exercise for the reader.

    Abigail
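
One possible take on the exercise Abigail leaves open, offered as a sketch rather than a definitive answer: it reuses the perlfaq word pattern `[\w']+`, adds the `/e` modifier, and consults a stop-word hash. The sub name `faq_title` and the word list are assumptions, and the code assumes the string has no leading whitespace.

```perl
use strict;
use warnings;

# Hypothetical list of words to leave in lower case.
my %small = map { $_ => 1 } qw(a an and or the of on to);

# faq_title() is an illustrative name. $-[0] holds the offset where the
# current match started, so 0 means "first word of the string".
sub faq_title {
    my ($string) = @_;
    $string =~ s{([\w']+)}{
        my $w = lc $1;
        ( $-[0] > 0 && $small{$w} ) ? $w : ucfirst $w;
    }ge;
    return $string;
}

print faq_title("dr. strangelove or: how i learned to stop worrying and love the bomb"), "\n";
```

With this word list the movie title from the FAQ comes out as "Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb". Titles whose small words should be capitalized for other reasons still need manual handling.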

Re: RegExp Capitalization of Entry
by japhy (Canon) on Jul 16, 2002 at 16:06 UTC
    I'd do:
    ($str = lc $str) =~ s{
        (?: ^ | \b (?! (?:and|an?|the|o[rfn]) \b ) )
        (\w)
    }{\u$1}gx;
    This keeps "and", "an", "a", "the", "or", "of", and "on" in lowercase. Add to that as needed.
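
To make "add to that as needed" a little easier, the alternation inside the negative lookahead could be built from a list. This is only a sketch of that idea: `@keep_lower`, `$alt` and `lookahead_title` are hypothetical names, and the regex is otherwise the one shown above.

```perl
use strict;
use warnings;

# Hypothetical word list; extend as needed.
my @keep_lower = qw(and an a the or of on);

# quotemeta guards against regex metacharacters in list entries.
my $alt = join '|', map { quotemeta($_) } @keep_lower;

# lookahead_title() is an illustrative name, not from the thread.
sub lookahead_title {
    my ($str) = @_;
    ($str = lc $str) =~ s{
        (?: ^ | \b (?! (?:$alt) \b ) )   # string start, or a word that is not in the list
        (\w)
    }{\u$1}gx;
    return $str;
}

print lookahead_title($_), "\n" for 'BY THE WAY', 'the beatles', 'android on and on';
```

The trailing `\b` inside the lookahead is what keeps "android" from being treated as "and", so the order of the words in the list does not matter.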

    I still hate Perl's regex engine. It cannot possibly match BOL (beginning of line) after the first character, so why the hell does it try?

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: RegExp Capitalization of Entry
by jmcnamara (Monsignor) on Jul 16, 2002 at 16:07 UTC

    This should work for most cases (if you don't have an album by "The The"). ;-)
    #!/usr/bin/perl -w
    use strict;

    # List your exceptions here
    my @exceptions = qw(the and or);

    while (my $str = <DATA>) {
        print "Input: ", $str;

        # Substitute the text in quotes
        $str =~ s{(")([^"]+)(")}
                 {$1 . join('', map ucfirst, split /(\s+|[-])/, $2) . $3}eg;

        # lc the exceptions that don't start a title
        $str =~ s/ $_\b/ $_/gi for @exceptions;

        print "Output: ", $str, "\n";
    }

    __DATA__
    "red hot chili peppers" (group), "by the way" (album title).
    "the go-betweens" (group), "spring hill fair" (album title).
    "jonathan richman" (group), "i, jonathan" (album title).

    This prints:
    Input: "red hot chili peppers" (group), "by the way" (album title).
    Output: "Red Hot Chili Peppers" (group), "By the Way" (album title).

    Input: "the go-betweens" (group), "spring hill fair" (album title).
    Output: "The Go-Betweens" (group), "Spring Hill Fair" (album title).

    Input: "jonathan richman" (group), "i, jonathan" (album title).
    Output: "Jonathan Richman" (group), "I, Jonathan" (album title).

    --
    John.
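
The hyphen handling in jmcnamara's reply relies on a documented property of `split`: when the pattern contains capturing parentheses, the separators are returned along with the fields, so `join('')` puts the string back together unchanged apart from the capitalization. A minimal sketch of just that step, with `cap_pieces` as a hypothetical helper name:

```perl
use strict;
use warnings;

# cap_pieces() is an illustrative name. The capturing group in the split
# pattern keeps each run of whitespace and each hyphen as its own element.
sub cap_pieces {
    my ($text) = @_;
    return join '', map { ucfirst($_) } split /(\s+|-)/, $text;
}

print cap_pieces('the go-betweens'), "\n";
print cap_pieces('spring  hill fair'), "\n";   # double space is preserved
```

Calling `ucfirst` on a separator such as " " or "-" leaves it as it was, which is why no special casing is needed for those elements.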

Re: RegExp Capitalization of Entry
by insensate (Hermit) on Jul 16, 2002 at 16:13 UTC
    Here is a way to do it with one regex:
    while(<DATA>){
        @words=split;
        for(@words){
            /(?:(the|or|and)|\w+)         #Capture "the|or|and"
             (?(1)                        #Switch on captured value
               (?{print"$_ "})            #If there is a captured value just print
               |(?{print "\u\L$_ "}))     #If not convert first char uppercase
            /x;
        }
    }
    __DATA__
    red hot chili peppers, by the way
    Hope this helps,
    Jason
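
One thing to watch in the pattern above is that `(the|or|and)` is not anchored, so a word such as "theory" matches on its first three letters and is printed unchanged. A hedged sketch of the same per-word test with anchors, written without the `(?{...})` code blocks; `one_pass` is a hypothetical name and this is a swapped-in technique, not insensate's method:

```perl
use strict;
use warnings;

# one_pass() is an illustrative name. The ^...$ anchors make the test
# apply to the whole word rather than to a prefix of it.
sub one_pass {
    my ($line) = @_;
    return join ' ',
        map { /^(?:the|or|and)$/ ? $_ : "\u\L$_" }
        split ' ', $line;
}

print one_pass('theory of the other'), "\n";
```

Like the original, this leaves an exception word in lower case even when it starts the line.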
