RegExp Capitalization of Entry

by bladx (Chaplain)
on Jul 16, 2002 at 15:22 UTC ( #182107=perlquestion )
bladx has asked for the wisdom of the Perl Monks concerning the following question:

Hi minnasan,
I have a question about using regular expressions to filter music lists (group, album title).

What I am trying to do is this ... for example I have the music entry of: "red hot chili peppers" (group), "by the way" (album title).

What I am trying to figure out is how to convert an entry like that to: "Red Hot Chili Peppers" (group), "By the Way" (album title). That means capitalizing every word in each entry except for words such as "and", "or", "the", etc.

Is there a way I can do this in Perl? Thanks for any help.

Replies are listed 'Best First'.
Re: RegExp Capitalization of Entry
by talexb (Canon) on Jul 16, 2002 at 15:50 UTC
    My guess is that you want to use ucfirst and lc along with a map loop on the group name and album title.
    #!/usr/bin/perl -w

    # Title case except for some special words

    use strict;

    my %Exceptions = ( "and" => 1, "the" => 1, "or" => 1 );

    while (<DATA>) {
        my @Results = map { ( defined ( $Exceptions{ $_ } ) ) ? $_ : ucfirst ( lc ( $_ ) ) }
                      split ( /\s+/, $_ );
        print join ( " ", @Results ) . "\n";
    }

    __DATA__
    red hot chili peppers
    by the way

    --t. alex

    "Mud, mud, glorious mud. Nothing quite like it for cooling the blood!"
    --Michael Flanders and Donald Swann

      I've done something like this before. I've changed talexb's code a little to account for things that you want to keep in special case, like "II" as in "Greatest Hits Vol. II".
      # Title case except for some special words

      use strict;

      my %Exceptions = (
          "and"   => "and",
          "the"   => "the",
          "or"    => "or",
          "zztop" => "ZZtop",
          "ii"    => "II"
      );

      while (<DATA>) {
          my @Results = map { (defined ( $Exceptions{lc($_)})) ? $Exceptions{lc($_)} : ucfirst ( lc ( $_ ) ) }
                        split ( /\s+/, $_ );

          # capitalize first word regardless
          substr($Results[0],0,1, uc substr($Results[0],0,1));

          print join ( " ", @Results ) . "\n";
      }

      __DATA__
      red hot chili peppers
      by the way
      zztop
      greatest hits vol. II
      the beatles
      the white album disc II

      __OUTPUT__
      Red Hot Chili Peppers
      By the Way
      ZZtop
      Greatest Hits Vol. II
      The Beatles
      The White Album Disc II
      You will have to expand %Exceptions as you find/think of them.


      added code to capitalize first character of first word.



Re: RegExp Capitalization of Entry
by japhy (Canon) on Jul 16, 2002 at 16:06 UTC
    I'd do:
    ($str = lc $str) =~ s{
        (?: ^ | \b (?! (?:and|an?|the|o[rfn]) \b ) )
        (\w)
    }{\u$1}gx;
    This keeps "and", "an", "a", "the", "or", "of", and "on" in lowercase. Add to that as needed.
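
    To make "add to that as needed" concrete, here is a minimal sketch that builds the same alternation from a word list instead of hand-writing it. The sub name `title_case_re` and the contents of `@small_words` are assumptions for illustration, not part of the post above.

```perl
use strict;
use warnings;

# Assumed list of words to keep in lower case; extend as needed.
my @small_words = qw(and an a the or of on);

# Join the words into one alternation. The \b that follows it in the
# regex makes the order of the alternatives irrelevant.
my $small = join '|', map { quotemeta } @small_words;

# Hypothetical helper name, chosen for this sketch.
sub title_case_re {
    my ($str) = @_;
    $str = lc $str;
    # Upcase a word character at the start of the string, or at a word
    # boundary that does not begin one of the small words.
    $str =~ s{ (?: ^ | \b (?! (?:$small) \b ) ) (\w) }{\u$1}gx;
    return $str;
}

print title_case_re("RED HOT chili peppers"), "\n";   # Red Hot Chili Peppers
print title_case_re("by the way"), "\n";              # By the Way
print title_case_re("the best of the doors"), "\n";   # The Best of the Doors
```

    Because `^` matches without the lookahead, a small word that starts the string is still capitalized.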

    I still hate Perl's regex engine. It cannot possibly match BOL (beginning of line) after the first character, so why the hell does it try?

    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: RegExp Capitalization of Entry
by Abigail-II (Bishop) on Jul 16, 2002 at 16:03 UTC
    From the perlfaq which comes with Perl:
    How do I capitalize all the words on one line?

    To make the first letter of each word upper case:

        $line =~ s/\b(\w)/\U$1/g;

    This has the strange effect of turning "don't do it" into "Don'T Do It". Sometimes you might want this. Other times you might need a more thorough solution (Suggested by brian d. foy):

        $string =~ s/ (
                       (^\w)    #at the beginning of the line
                         |      # or
                       (\s\w)   #preceded by whitespace
                      )
                    /\U$1/xg;
        $string =~ s/([\w']+)/\u\L$1/g;

    To make the whole line upper case:

        $line = uc($line);

    To force each word to be lower case, with the first letter upper case:

        $line =~ s/(\w+)/\u\L$1/g;

    You can (and probably should) enable locale awareness of those characters by placing a "use locale" pragma in your program. See the perllocale manpage for endless details on locales.

    This is sometimes referred to as putting something into "title case", but that's not quite accurate. Consider the proper capitalization of the movie Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb, for example.
    How to exclude words such as "the" is left as an exercise for the reader.
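
    One possible way to approach that exercise is sketched below. It is not taken from perlfaq: the sub name `faq_title_case` and the word list in `%small` are assumptions. It reuses the FAQ's `[\w']+` word pattern and always capitalizes the first word.

```perl
use strict;
use warnings;

# Assumed list of words to leave in lower case; extend as needed.
my %small = map { $_ => 1 } qw(a an and or the of on to);

# Hypothetical helper name, not part of perlfaq.
sub faq_title_case {
    my ($line) = @_;
    my $first = 1;
    # [\w']+ treats "don't" as one word, so the letter after the
    # apostrophe is left alone.
    $line =~ s{([\w']+)}{
        my $word = lc $1;
        # Small words stay lower case unless they start the line.
        my $out = ( !$first && $small{$word} ) ? $word : ucfirst $word;
        $first = 0;
        $out;
    }ge;
    return $line;
}

print faq_title_case("don't do it"), "\n";   # Don't Do It
print faq_title_case("dr. strangelove or: how i learned to stop worrying and love the bomb"), "\n";
# Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb
```

    This still knows nothing about names or acronyms, so entries such as "II" would need a separate table of special cases.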


Re: RegExp Capitalization of Entry
by jmcnamara (Monsignor) on Jul 16, 2002 at 16:07 UTC

    This should work for most cases (if you don't have an album by "The The"). ;-)
    #!/usr/bin/perl -w

    use strict;

    # List your exceptions here
    my @exceptions = qw(the and or);

    while (my $str = <DATA>) {

        print "Input: ", $str;

        # Substitute the text in quotes
        $str =~ s{(")([^"]+)(")}
                 {$1 . join('', map ucfirst, split /(\s+|[-])/, $2) . $3}eg;

        # lc the exceptions that don't start a title
        $str =~ s/ $_\b/ $_/gi for @exceptions;

        print "Output: ", $str, "\n";
    }

    __DATA__
    "red hot chili peppers" (group), "by the way" (album title).
    "the go-betweens" (group), "spring hill fair" (album title).
    "jonathan richman" (group), "i, jonathan" (album title).
    This prints:
    Input: "red hot chili peppers" (group), "by the way" (album title).
    Output: "Red Hot Chili Peppers" (group), "By the Way" (album title).

    Input: "the go-betweens" (group), "spring hill fair" (album title).
    Output: "The Go-Betweens" (group), "Spring Hill Fair" (album title).

    Input: "jonathan richman" (group), "i, jonathan" (album title).
    Output: "Jonathan Richman" (group), "I, Jonathan" (album title).


Re: RegExp Capitalization of Entry
by Sidhekin (Priest) on Jul 16, 2002 at 15:58 UTC

    ... a way to capitalize the names of every word in each entry except for the words such as: "and", "or", "the" etc.

    I have a feeling there has to be a module for this somewhere, but it is really not that hard, if you just have a list of words that should not be upcased.

    Well, there is also the question of what constitutes a word ... this is just one of many ways. Season to taste:

    print capitalize("red hot chili peppers\nby the way\n");

    {
        my %exception;
        sub capitalize {
            my $string = shift;
            %exception = map{$_=>1}qw(and or the a an etc) unless keys %exception;
            $string =~ s/(\w+)/$exception{$1}?$1:ucfirst($1)/ge;
            return $string;
        }
    }
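
    To illustrate why the definition of a "word" matters, the following sketch runs the same substitution with three different word patterns. The helper name `capitalize_with` and the sample input are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: capitalize every match of the given word pattern.
sub capitalize_with {
    my ($pattern, $string) = @_;
    $string =~ s/($pattern)/ucfirst $1/ge;
    return $string;
}

my $input = "don't stop the go-betweens";

# \w+ splits at the apostrophe and at the hyphen.
print capitalize_with(qr/\w+/,    $input), "\n";   # Don'T Stop The Go-Betweens

# [\w']+ keeps "don't" together but still splits at the hyphen.
print capitalize_with(qr/[\w']+/, $input), "\n";   # Don't Stop The Go-Betweens

# \S+ treats every whitespace-separated chunk as one word.
print capitalize_with(qr/\S+/,    $input), "\n";   # Don't Stop The Go-betweens
```

    Which pattern is right depends on the data; the exception lookup can be added to any of them.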

    The Sidhekin
    print "Just another Perl ${\(trickster and hacker)},"

Re: RegExp Capitalization of Entry
by insensate (Hermit) on Jul 16, 2002 at 16:13 UTC
    Here is a way to do it with one regex:
    while(<DATA>){
        @words=split;
        for(@words){
            /(?:(the|or|and)|\w+)     #Capture "the|or|and"
             (?(1)                    #Switch on captured value
             (?{print"$_ "})          #If there is a captured value just print
             |(?{print "\u\L$_ "}))   #If not convert first char uppercase
            /x;
        }
    }
    __DATA__
    red hot chili peppers, by the way
    Hope this helps,

Node Type: perlquestion [id://182107]
Approved by talexb