Most of my work involves interface-building, and I've found that once the logic and presentation are reasonably together, the biggest factor in the success or failure of the system is the tone and clarity with which it addresses the user.

So I've become obsessed with writing UI scripts that use proper colloquial English. People react so much better to a page that tells them what's going on in a conversational way that I almost don't mind bloating the scripts with sentence-construction code and obfuscating the templates with grammatical conditionals.

Here's a very simple example, dug out of the middle of a script I'm updating at the moment. The end result is a sentence in the form:

We found 27 campaign updates and case studies relevant to children and young people, death penalty and the Americas.

except sometimes it's only one type of document, or no restriction at all, or only one keyword, only two results, and so on. There are dozens of permutations, and the proper way of describing the situation is different each time in small but vital ways. This excerpt is as close as I've come without spelling everything out:

use strict;
use warnings;

my @records = qw(1 2 3 4 5 6);
@input::id = qw(id1 id2 id3 id4);
@input::type = qw(document person);

print summarise(\@input::id, \@input::type, scalar(@records));
exit;

sub summarise {
    my ($ids, $types, $matches) = @_;
    my $sentence = 'We found ';
    $sentence .= ($matches || 'no') . ' ';

    if (@$types) {
        foreach my $i (0..$#$types) {
            if ($i && $i == $#$types) {
                $sentence .= ($matches > 1) ? ' and ' : ' or ';
            } elsif ($i) {
                $sentence .= ', ';
            }
            # document types are in the database
            # with singular and plural forms of their title
            # but i've skipped that part here
            $sentence .= qq|<a href="link">$$types[$i]</a>|;
        }
    } else {
        $sentence .= 'item';
        # "no items" and "2 items", but "1 item"
        $sentence .= 's' if ($matches != 1);
    }

    # with no keywords there is no restriction to describe
    return $sentence unless @$ids;

    $sentence .= ' relevant to ';
    $sentence .= 'both ' if (@$ids == 2);
    $sentence .= 'all of ' if (@$ids > 2);
    foreach my $i (0..$#$ids) {
        if ($i && $i == $#$ids) {
            $sentence .= ' and ';
        } elsif ($i) {
            $sentence .= ', ';
        }
        # keyword titles also looked up from the database really.
        $sentence .= qq|<a href="link">$$ids[$i]</a>|;
    }
    return $sentence;
}

If anyone is interested enough to make this more elegant - or just play golf with it - I'd be much obliged.
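As a starting point, here is a rough sketch of one direction, not a finished answer. The comma-and-conjunction logic appears twice in the excerpt, so it could move into a helper. The names join_list and quantifier are made up for this sketch, the link is the same placeholder as above, and the database lookups are still skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: join items as "a", "a and b" or "a, b and c".
sub join_list {
    my ($conjunction, @items) = @_;
    return '' unless @items;
    return $items[0] if @items == 1;
    my $last = pop @items;
    return join(', ', @items) . " $conjunction $last";
}

# Hypothetical helper: "both " for two items, "all of " for more, else nothing.
sub quantifier {
    my ($count) = @_;
    return $count == 2 ? 'both ' : $count > 2 ? 'all of ' : '';
}

sub summarise {
    my ($ids, $types, $matches) = @_;

    # placeholder link, as in the excerpt above
    my $link = sub { qq|<a href="link">$_[0]</a>| };

    # either the linked document types or a generic "item(s)"
    my $what = @$types
        ? join_list($matches > 1 ? 'and' : 'or', map { $link->($_) } @$types)
        : ($matches == 1 ? 'item' : 'items');

    my $sentence = 'We found ' . ($matches || 'no') . " $what";

    # only mention the keywords when there are some
    $sentence .= ' relevant to ' . quantifier(scalar @$ids)
               . join_list('and', map { $link->($_) } @$ids) if @$ids;

    return $sentence;
}

print summarise([qw(id1 id2 id3 id4)], [qw(document person)], 6), "\n";
```

The gain is mostly that each grammatical decision now lives in one place, which makes the remaining permutations easier to see.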

But my main question: is there a module or project that'll do some of this work for me? CPAN yields a lot of stemming and other mechanisms designed to make words more friendly to computers, but not much designed to make them more friendly to people.

If there isn't any such module, I'd like to start building one. I imagine something extensibly rule-based with a relatively small number of abstract construction mechanisms for common sentence forms, and a vocabulary of prepositions and articles and so on. Ideally swappable into languages other than English, one day. Any views about feasibility or functionality?
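To make the idea a little more concrete, here is a very rough sketch of the shape I have in mind, assuming the language-specific parts can be kept in a vocabulary table and the construction mechanisms written against that table. Everything here (%english, conjoin, counted, the keys of the hash) is invented for illustration and is not an existing module's interface.

```perl
use strict;
use warnings;

# Hypothetical vocabulary table: everything language-specific lives here,
# so another language could in principle supply its own hash.
my %english = (
    and        => 'and',
    or         => 'or',
    separator  => ', ',
    none       => 'no',
    # quantifier chosen by how many items a list holds
    quantifier => sub { $_[0] == 2 ? 'both ' : $_[0] > 2 ? 'all of ' : '' },
    # naive pluralisation rule; a real one would need an exceptions list
    plural     => sub { my ($noun, $n) = @_; $n == 1 ? $noun : $noun . 's' },
);

# One construction mechanism: a conjoined list ("a, b and c").
sub conjoin {
    my ($vocab, $conjunction, @items) = @_;
    return join($vocab->{separator}, @items) if @items < 2;
    my $last = pop @items;
    return join($vocab->{separator}, @items) . " $vocab->{$conjunction} $last";
}

# Another: a counted noun phrase ("no items", "1 item", "27 items").
sub counted {
    my ($vocab, $n, $noun) = @_;
    return ($n || $vocab->{none}) . ' ' . $vocab->{plural}->($noun, $n);
}

my @keywords = ('children and young people', 'death penalty', 'the Americas');
print 'We found ', counted(\%english, 27, 'campaign update'),
      ' relevant to ', $english{quantifier}->(scalar @keywords),
      conjoin(\%english, 'and', @keywords), ".\n";
```

Whether a flat table like this would survive contact with a language whose grammar differs much from English is exactly the feasibility question.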


In reply to natural language sentence construction by thpfft
