Frankly, I'm embarrassed, because I'm BFI'ing it, instead of doing things properly.
But here goes. Against my better judgement.
#
# WARNING WARNING WARNING WARNING
#
# USE AT YOUR OWN RISK.
#
# THIS IS A MASSIVE KLUDGE.
#
# YOU HAVE BEEN WARNED.
#
use strict;
use warnings;

my $in = <DATA>;
# ASSUME sentences end in a period and a space.
my @sentences = split '\. ', $in;
foreach( @sentences )
{
    # ASSUME these words are mostly useless
    # for our purposes...
    s/\b(with|a|of|the|in|just)\b//gi;
    # ASSUME phrases are comma-separated.
    my @phrases = split ',';
    my @subjects = ();
    my @descs = ();
    foreach ( @phrases )
    {
        s/^\s*//;      # trim leading spaces.
        s/\n//g;       # remove newline.
        s/\s{2,}/ /g;  # squeeze the gaps left by the removed words.
        # Well, do we have a subject, or a descriptor?
        # ASSUME subjects are capitalized (!!)
        push @subjects, $_ if /^[A-Z]/;
        # ASSUME descriptions are not.
        push @descs, $_ unless /^[A-Z]/;
    }
    # Print 'em all out.
    foreach my $subj ( @subjects )
    {
        my @subsub = ($subj);
        # ASSUME 'and' separates multiple subjects (!!)
        @subsub = split ' and ', $subj if $subj =~ /\band\b/;
        foreach my $ss (@subsub)
        {
            print "$ss: $_\n" foreach @descs;
        }
    }
}
__DATA__
With a population of more than 10.2 million, Seoul, the capital of South Korea, is the world's largest city in terms of population. Sao Paulo(Brazil), the world's second-largest city, has a population of just over ten million. Three other cities, Bombay(India), Jakarta(Indonesia) and Karachi(Pakistan), have grown to more than nine million people.
The output:
Seoul: population more than 10.2 million
Seoul: capital South Korea
Seoul: is world's largest city terms population
Sao Paulo(Brazil): world's second-largest city
Sao Paulo(Brazil): has population over ten million
Three other cities: have grown to more than nine million people.
Bombay(India): have grown to more than nine million people.
Jakarta(Indonesia): have grown to more than nine million people.
Karachi(Pakistan): have grown to more than nine million people.
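
If the pairs are wanted as data rather than printed lines, the same kludge can be wrapped in a subroutine. This is only a sketch under the same ASSUMEs as the script; the name extract_facts and the [subject, description] pair layout are made up for illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: same heuristics as the script, but it returns
# a list of [subject, description] pairs instead of printing them.
sub extract_facts {
    my ($text) = @_;
    my @facts;
    # ASSUME sentences end in a period and a space.
    for my $sentence ( split /\. /, $text ) {
        # Same "mostly useless" word list as the script.
        $sentence =~ s/\b(?:with|a|of|the|in|just)\b//gi;
        my ( @subjects, @descs );
        # ASSUME phrases are comma-separated.
        for my $phrase ( split /,/, $sentence ) {
            $phrase =~ s/^\s+|\s+$//g;    # trim both ends (also drops the newline)
            $phrase =~ s/\s{2,}/ /g;      # squeeze gaps left by removed words
            next unless length $phrase;
            if ( $phrase =~ /^[A-Z]/ ) {
                # ASSUME capitalized phrases are subjects, joined by ' and '.
                push @subjects, split / and /, $phrase;
            }
            else {
                push @descs, $phrase;
            }
        }
        # Pair every subject with every description in the sentence.
        for my $subj (@subjects) {
            push @facts, [ $subj, $_ ] for @descs;
        }
    }
    return @facts;
}

my @facts = extract_facts(
    "Seoul, the capital of South Korea, is big. "
  . "Bombay(India) and Karachi(Pakistan), have grown."
);
print "$_->[0]: $_->[1]\n" for @facts;
```

Returning pairs keeps the parsing separate from the reporting, so the caller can group by subject or filter descriptions without touching the regexes. It is still every bit as fragile as the original.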