bladx has asked for the wisdom of the Perl Monks concerning the following question:
Hi minnasan,
I have a question regarding regular expressions and using regexps to filter music lists (group, album title).
What I am trying to do is this ... for example I have the music entry of: "red hot chili peppers" (group), "by the way" (album title).
Now what I am trying to figure out is how to convert an entry like that to: "Red Hot Chili Peppers" (group), "By the Way" (album title). This means that there needs to be a way to capitalize the names of every word in each entry except for the words such as: "and", "or", "the" etc.
Is there a way I can do this in Perl? Thanks for any help.
Re: RegExp Capitalization of Entry
by talexb (Chancellor) on Jul 16, 2002 at 15:50 UTC
My guess is that you want to use ucfirst and lc along with a map loop on the group name and album title.
#!/usr/bin/perl -w
# Title case except for some special words
use strict;
my %Exceptions = ( "and" => 1, "the" => 1, "or" => 1 );
while (<DATA>)
{
    my @Results = map { ( defined ( $Exceptions{ $_ } ) ) ?
        $_ : ucfirst ( lc ( $_ ) ) } split ( /\s+/, $_ );
    print join ( " ", @Results ) . "\n";
}
__DATA__
red hot chili peppers
by the way
--t. alex
"Mud, mud, glorious mud. Nothing quite like it for cooling the blood!"
--Michael Flanders and Donald Swann
I've done something like this before. I've changed talexb's code a little to account for things that you want to keep in special case, like "II" as in "Greatest Hits Vol. II".
# Title case except for some special words
use strict;
my %Exceptions = (
"and" => "and",
"the" => "the",
"or" => "or",
"zztop" => "ZZtop",
"ii" => "II" );
while (<DATA>)
{
    my @Results = map { (defined ( $Exceptions{lc($_)})) ?
        $Exceptions{lc($_)} : ucfirst ( lc ( $_ ) ) } split ( /\s+/, $_ );
    # capitalize first word regardless
    substr($Results[0],0,1, uc substr($Results[0],0,1));
    print join ( " ", @Results ) . "\n";
}
__DATA__
red hot chili peppers
by the way
zztop
greatest hits vol. II
the beatles
the white album disc II
__OUTPUT__
Red Hot Chili Peppers
By the Way
ZZtop
Greatest Hits Vol. II
The Beatles
The White Album Disc II
You will have to expand %Exceptions as you find/think of them.
__UPDATE__
added code to capitalize first character of first word.
-- flounder
Re: RegExp Capitalization of Entry
by japhy (Canon) on Jul 16, 2002 at 16:06 UTC
($str = lc $str) =~ s{
(?: ^ | \b (?! (?:and|an?|the|o[rfn]) \b ) )
(\w)
}{\u$1}gx;
This keeps "and", "an", "a", "the", "or", "of", and "on" in lowercase. Add to that as needed.
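One way to make that list easier to extend is to build the alternation from an array. This is only a sketch: the @small list, the $small_re variable and the title_case sub are names made up for illustration, not part of the snippet above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical word list; extend it as needed.
my @small = qw(a an and the or of on);

# Join the words into one alternation, escaping any metacharacters.
my $small_re = join '|', map { quotemeta($_) } @small;

sub title_case {
    my ($str) = @_;
    $str = lc $str;
    # Upcase a word's first letter at the start of the string, or at any
    # word boundary that is not followed by one of the small words.
    $str =~ s{ (?: ^ | \b (?! (?:$small_re) \b ) ) (\w) }{\u$1}gx;
    return $str;
}

print title_case($_), "\n" for 'red hot chili peppers', 'by the way';
```

Like the original, this treats an apostrophe as a word boundary, so "don't" would still come out as "Don'T".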
I still hate Perl's regex engine. It cannot possibly match BOL (beginning of line) after the first character, so why the hell does it try?
_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
Re: RegExp Capitalization of Entry
by Abigail-II (Bishop) on Jul 16, 2002 at 16:03 UTC
From the perlfaq which comes with Perl:
How do I capitalize all the words on one line?
To make the first letter of each word upper case:
$line =~ s/\b(\w)/\U$1/g;
This has the strange effect of turning "don't do it"
into "Don'T Do It". Sometimes you might want this.
Other times you might need a more thorough solution
(Suggested by brian d. foy):
$string =~ s/ (
             (^\w)    #at the beginning of the line
               |      # or
             (\s\w)   #preceded by whitespace
               )
            /\U$1/xg;
$string =~ s/([\w']+)/\u\L$1/g;
To make the whole line upper case:
$line = uc($line);
To force each word to be lower case, with the first letter
upper case:
$line =~ s/(\w+)/\u\L$1/g;
You can (and probably should) enable locale awareness of
those characters by placing a "use locale" pragma in your
program. See the perllocale manpage for endless details
on locales.
This is sometimes referred to as putting something into
"title case", but that's not quite accurate. Consider the
proper capitalization of the movie Dr. Strangelove or: How
I Learned to Stop Worrying and Love the Bomb, for example.
How to exclude words like "the" and the like is
left as an exercise to the reader.
Abigail
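A minimal sketch of that exercise, assuming a hand-picked list of small words and the rule that the first word is always capitalized. The %small hash and the title_case sub are hypothetical names, and the word list is only a guess at what a style guide would want.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list of words to leave in lower case.
my %small = map { $_ => 1 } qw(a an and the or of on to in);

sub title_case {
    my ($line) = @_;
    my $seen = 0;    # number of words handled so far
    # Treat runs of word characters and apostrophes as one word.
    # The first word is always capitalized, later small words are not.
    $line =~ s{([\w']+)}{
        my $word = lc $1;
        ($seen++ && $small{$word}) ? $word : ucfirst $word;
    }ge;
    return $line;
}

print title_case("dr. strangelove or: how i learned to stop worrying and love the bomb"), "\n";
# prints: Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb
```

With that word list the Dr. Strangelove title comes out as in the FAQ text, but a title that ends in a small word would still need extra handling.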
Re: RegExp Capitalization of Entry
by jmcnamara (Monsignor) on Jul 16, 2002 at 16:07 UTC
This should work for most cases (if you don't have an album by "The The"). ;-)
#!/usr/bin/perl -w
use strict;
# List your exceptions here
my @exceptions = qw(the and or);
while (my $str = <DATA>) {
    print "Input: ", $str;
    # Substitute the text in quotes
    $str =~ s{(")([^"]+)(")}
             {$1 . join('', map ucfirst, split /(\s+|[-])/, $2) . $3}eg;
    # lc the exceptions that don't start a title
    $str =~ s/ $_\b/ $_/gi for @exceptions;
    print "Output: ", $str, "\n";
}
__DATA__
"red hot chili peppers" (group), "by the way" (album title).
"the go-betweens" (group), "spring hill fair" (album title).
"jonathan richman" (group), "i, jonathan" (album title).
This prints:
Input: "red hot chili peppers" (group), "by the way" (album title).
Output: "Red Hot Chili Peppers" (group), "By the Way" (album title).
Input: "the go-betweens" (group), "spring hill fair" (album title).
Output: "The Go-Betweens" (group), "Spring Hill Fair" (album title).
Input: "jonathan richman" (group), "i, jonathan" (album title).
Output: "Jonathan Richman" (group), "I, Jonathan" (album title).
--
John.
Re: RegExp Capitalization of Entry
by Sidhekin (Priest) on Jul 16, 2002 at 15:58 UTC
... a way to capitalize the names of every word in each entry except for the words such as: "and", "or", "the" etc.
I have a feeling there has to be a module for this
somewhere, but it is really not that hard, if you just
have a list of words that should not be upcased.
Well, there is also the question of what constitutes
a word ... this is just one of many ways. Season to taste:
print capitalize("red hot chili peppers\nby the way\n");
{
    my %exception;
    sub capitalize {
        my $string = shift;
        %exception = map{$_=>1}qw(and or the a an etc)
            unless keys %exception;
        $string =~ s/(\w+)/$exception{$1}?$1:ucfirst($1)/ge;
        return $string;
    }
}
The Sidhekin
print "Just another Perl ${\(trickster and hacker)},"
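To illustrate the question of what constitutes a word, here is a small sketch that applies the same upcasing substitution with three different word patterns. The cap_with sub and the sample string are made up for the comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Upcase the first character of everything that $word_re matches.
sub cap_with {
    my ($word_re, $text) = @_;
    $text =~ s/($word_re)/\u$1/g;
    return $text;
}

my $title = "don't stop (live)";
print cap_with(qr/\w+/,    $title), "\n";   # Don'T Stop (Live)
print cap_with(qr/[\w']+/, $title), "\n";   # Don't Stop (Live)
print cap_with(qr/\S+/,    $title), "\n";   # Don't Stop (live)
```

None of the three is right for every list, so the pattern is worth choosing against the actual data.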
Re: RegExp Capitalization of Entry
by insensate (Hermit) on Jul 16, 2002 at 16:13 UTC
Here is a way to do it with one regex:
while(<DATA>){
    @words=split;
    for(@words){
        /(?:(the|or|and)\b|\w+)    #Capture "the|or|and" as a whole word
         (?(1)                     #Switch on captured value
           (?{print"$_ "})         #If there is a captured value just print
          |(?{print "\u\L$_ "}))   #If not convert first char uppercase
        /x;
    }
    print "\n";
}
__DATA__
red hot chili peppers, by the way
Hope this helps, Jason