Frankly, I'm embarrassed, because I'm BFI'ing it instead of doing things properly.
But here goes. Against my better judgement.
#
# WARNING WARNING WARNING WARNING
#
# USE AT YOUR OWN RISK.
#
# THIS IS A MASSIVE KLUDGE.
#
# YOU HAVE BEEN WARNED.
#
my $in = <DATA>;
# ASSUME sentences end in a period and a space.
my @sentences = split '\. ', $in;
foreach ( @sentences )
{
    # ASSUME these words are mostly useless
    # for our purposes...
    s/\b(with|a|of|the|in|just)\b//gi;
    # ASSUME phrases are comma-separated.
    my @phrases = split ',';
    my @subjects = ();
    my @descs = ();
    foreach ( @phrases )
    {
        s/^\s*//; # trim leading spaces.
        s/\n//g;  # remove newline.
        # Well, do we have a subject, or a descriptor?
        # ASSUME subjects are capitalized (!!)
        push @subjects, $_ if /^[A-Z]/;
        # ASSUME descriptions are not.
        push @descs, $_ unless /^[A-Z]/;
    }
    # Print 'em all out.
    foreach my $subj ( @subjects )
    {
        my @subsub = ($subj);
        # ASSUME 'and' separates multiple subjects (!!)
        @subsub = split ' and ', $subj if $subj =~ /\band\b/;
        foreach my $ss (@subsub)
        {
            print "$ss: $_\n" foreach @descs;
        }
    }
}
__DATA__
With a population of more than 10.2 million, Seoul, the capital of South Korea, is the world's largest city in terms of population. Sao Paulo(Brazil), the world's second-largest city, has a population of just over ten million. Three other cities, Bombay(India), Jakarta(Indonesia) and Karachi(Pakistan), have grown to more than nine million people.
The output:
Seoul: population more than 10.2 million
Seoul: capital South Korea
Seoul: is world's largest city terms population
Sao Paulo(Brazil): world's second-largest city
Sao Paulo(Brazil): has population over ten million
Three other cities: have grown to more than nine million people.
Bombay(India): have grown to more than nine million people.
Jakarta(Indonesia): have grown to more than nine million people.
Karachi(Pakistan): have grown to more than nine million people.
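
One wrinkle worth noting: deleting the filler words leaves the spaces on either side of them behind, so the raw output has doubled spaces in places (the listing above shows them collapsed). If that matters, something like the sketch below squeezes them out. The name tidy_phrase is my own invention for illustration, not part of the script above, and it is every bit as much of a kludge.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): strip the filler
# words, then squeeze whitespace runs and trim both ends.
sub tidy_phrase {
    my ($phrase) = @_;
    $phrase =~ s/\b(?:with|a|of|the|in|just)\b//gi;  # same word list as the script
    $phrase =~ s/\s+/ /g;                            # collapse runs of whitespace (newlines too)
    $phrase =~ s/^ | $//g;                           # trim one leading/trailing space
    return $phrase;
}

print tidy_phrase("With a population of more than 10.2 million"), "\n";
```

Calling it on each phrase inside the inner loop, in place of the two trimming substitutions, should give single-spaced output without touching the rest of the logic.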