starX has asked for the wisdom of the Perl Monks concerning the following question:
Most esteemed monks, I seek your wisdom in finding a more elegant solution to a problem I'm working on. Given a string:
Enter Iago, Othello, and others
I want to extract "Iago" and "Othello" to a data structure. My solution is as follows:
use strict;
use warnings;
# I'm actually reading this from another source, but am hand coding
# the string here for demonstration purposes.
my $char_list = "Iago, Othello, and others";
my @words = split /\W/, $char_list;
my @entering_chars;    # declared outside the loop so the matches accumulate
foreach my $word (@words) {
    if ($word =~ m/[A-Z]\w+/) {
        push @entering_chars, $word;
    }
}
My present solution works, but it seems like I'm taking a lot of unnecessary steps to get there. If anyone would care to explain how to do this with a regex, or some other method less dependent on a loop, I would much appreciate it.
Update: correction. I'm not looking to capture "Enter," but it's also been split off from the string by the time I get here.
Re: Seeking a better way to do it
by BrowserUk (Patriarch) on Feb 01, 2013 at 00:39 UTC
@words = 'Enter Iago, Othello, and others' =~ m[(\b[A-Z]\w+\b)]g;;
print @words;;
Enter Iago Othello
Re: Seeking a better way to do it
by AnomalousMonk (Archbishop) on Feb 01, 2013 at 01:25 UTC
A few approaches, some already covered by others:
- The split approach produces a lot of 'noise' (even when using a more reasonable /\W+/ split pattern) that must be removed with further processing.
- A tricky approach is to try to figure out just what a 'player' is and define a regex to extract those substrings.
- Maybe the easiest and most reliable approach is to go to the dramatis personae list at the beginning of the play, look at all the players found there, and make a regex of that.
>perl -wMstrict -le
"my $char_list = 'Exit Cassio; Enter Iago, Othello, and others';
;;
my @words = split /\W+/, $char_list;
printf qq{'$_' } for @words;
print '';
;;
;;
my $not_player = qr{ (?! Enter | Exit) }xms;
my $player = qr{ \b $not_player [[:upper:]] [[:lower:]]+ }xms;
;;
my @players = $char_list =~ m{ $player }xmsg;
printf qq{'$_' } for @players;
print '';
;;
;;
my @dramatis_personae = qw(Cassio Iago Othello);
my ($character) =
map qr{ \b (?: $_) \b }xms,
join '|',
@dramatis_personae
;
;;
@players = $char_list =~ m{ $character }xmsg;
printf qq{'$_' } for @players;
"
'Exit' 'Cassio' 'Enter' 'Iago' 'Othello' 'and' 'others'
'Cassio' 'Iago' 'Othello'
'Cassio' 'Iago' 'Othello'
Re: Seeking a better way to do it
by vinoth.ree (Monsignor) on Feb 01, 2013 at 00:15 UTC
my $char_list = "Enter Iago, Othello, and others";
my @Word_List = grep { /[A-Z]\w+/ } split(/\W/, $char_list);
print "@Word_List\n";
Update: a simpler way, using a single regular expression:
my $char_list = "Enter Iago, Othello, and others";
my @Word_List;
@Word_List = ($char_list =~ /([A-Z]\w+)/g);
print "@Word_List\n";
This works assuming OP also wants to return 'Enter'. This is left out of the requirements, but I'm not sure if it is an oversight or not.
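If 'Enter' is not wanted, one option is to filter the captured words against a small list of stage directions. This is only a sketch: the %stage_direction hash and the words in it are assumptions for illustration and are not part of the original post.

```perl
use strict;
use warnings;

my $char_list = "Enter Iago, Othello, and others";

# Assumed set of stage-direction words to skip; extend as needed.
my %stage_direction = map { $_ => 1 } qw(Enter Exit Exeunt);

# Capture every capitalised word in list context, then drop the stage directions.
my @entering_chars = grep { !$stage_direction{$_} }
                     $char_list =~ /\b([A-Z]\w+)\b/g;

print "@entering_chars\n";    # prints "Iago Othello"
```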
Re: Seeking a better way to do it
by 7stud (Deacon) on Feb 01, 2013 at 00:37 UTC
use strict;
use warnings;
use 5.012; #for say()
my $text = "Enter Iago, Othello, and others";
while ($text =~ /
\s+ #A space one or more times
( #Start of $1
[^,]+ #Not a comma, one or more times
) #End of $1
, #A comma
/gxms #(g)lobal matching plus standard xms
) {
say $1;
}
--output:--
Iago
Othello
Given a string: Enter Iago, Othello, and others I want to extract "Iago" and "Othello" to a data structure.
No data structure :(. Reputation--.
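To get a data structure out of the same pattern, the captures can be pushed onto an array inside the loop, or the match can be run in list context. A minimal sketch; the array names @entering_chars and @same are chosen here for illustration.

```perl
use strict;
use warnings;

my $text = "Enter Iago, Othello, and others";

# Same pattern as above: whitespace, then a run of non-commas, then a comma.
my @entering_chars;
while ($text =~ / \s+ ( [^,]+ ) , /gx) {
    push @entering_chars, $1;    # keep each capture instead of printing it
}

# Equivalent one-step form: in list context, /g returns all captures at once.
my @same = $text =~ / \s+ ( [^,]+ ) , /gx;

print "@entering_chars\n";    # prints "Iago Othello"
```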
Re: Seeking a better way to do it
by frozenwithjoy (Priest) on Feb 01, 2013 at 00:20 UTC
What is your rule for extracting 'Iago' and 'Othello'? Do you also mean to extract 'Enter' (another word that starts w/ a cap)?
You said your solution works, but you write plit instead of split and I get other errors. Can you update it with functional code?
I think the OP meant 'Iago' and 'Othello', so maybe something similar to this:
#!/usr/bin/perl -l
use strict;
use warnings;
my ($char_list) = "Enter Iago, Othello, and others";
my (@words) = split( /\W/, $char_list, 0 );
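my @entering_chars;    # declared once so the matches accumulate
foreach my $word (@words) {
    # No /g needed: one boolean test per word is enough
    if ( $word =~ m/[F-Z]\w+/ ) {
        push @entering_chars, $word;
    }
}
print "@entering_chars";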
Ya, we really need more info. Limiting the first letter to F through Z is not very portable. If you already know the words you want (and choose the letters accordingly), you might as well name the words specifically.
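Naming the words specifically could look something like the following sketch. The @known_names list is hypothetical; in practice it would come from wherever the cast list is stored.

```perl
use strict;
use warnings;

my $char_list = "Enter Iago, Othello, and others";

# Hypothetical cast list, assumed for illustration.
my @known_names = qw(Cassio Iago Othello);

# Build one alternation; quotemeta guards against regex metacharacters in a name.
my $alternation = join '|', map { quotemeta($_) } @known_names;
my $name_re     = qr/\b(?:$alternation)\b/;

my @entering_chars = $char_list =~ /($name_re)/g;

print "@entering_chars\n";    # prints "Iago Othello"
```

Only names that actually occur in the string are returned, so 'Cassio' is absent from the result here.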
Re: Seeking a better way to do it
by starX (Chaplain) on Feb 01, 2013 at 06:08 UTC
Thanks, everyone, that was what I was looking for.