Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Perl can do it, take 1 (sentence generation)

by spurperl (Priest)
on Jun 17, 2005 at 14:54 UTC ( #467744=perlmeditation: print w/ replies, xml ) Need Help??

Fellow monks,

As I mentioned here, I want to practice taking good, elegant Lisp code and re-writing it in Perl. And the first opportunity for this came today.

While browsing Peter Norvig's website, I came upon this - a primer in Python for Lisp programmers. In the end, in a section called "Comparing Lisp and Python Programs", Norvig takes an excerpt from his Paradigms of Artificial Programming (one of the best programming book in the world, IMHO) - generation of random sentences from defined grammars, and implements a Python version for it.

So, I decided to employ Perl to compete in this task as well...

Here is an implementation I came up with (only generate, as generate_tree is the same idea):

use warnings; use strict; my %dict = ( sentence => [[qw/ noun_phrase verb_phrase /]], noun_phrase => [[qw/ Article Noun /]], verb_phrase => [[qw/ Verb noun_phrase /]], Article => [qw/ the a /], Noun => [qw/ man ball woman table/], Verb => [qw/ hit took saw liked /] ); sub rand_elt { return $_[rand scalar @_]; } sub listp { return ref($_[0]) eq "ARRAY"; } sub generate { my $phrase = shift; if (listp $phrase) { return [map {@{generate($_)}} @{$phrase}]; } elsif (exists $dict{$phrase}) { return generate(rand_elt(@{$dict{$phrase}})); } else { return [$phrase]; } } print join(' ', @{generate("sentence")}), "\n";

Overall I'm satisfied with the result - it's an elegant functional code that represents the solution in an intuitive recursive way, without too much overhead.

Some observations:

  1. Note that Lisp also doesn't have the random-elt function built-in, it's defined by Norvig elsewhere.
  2. In the attempt to "transliterate" the Lisp code, Perl succeeds nicely. Interestingly, the elegant mappend construct that's defined in the Lisp/Python versions is implemented trivially with Perl's native map.
  3. Perl also allows to avoid defining the rewrites the Lisp version defines, since it has easier hash access.
  4. In Lisp, all objects are pointers and the data they point to has a type. Thus, you can ask (listp foo). In Perl it doesn't work - arrays/lists and scalars are very different. Hence, in my implementation a "list" is an array ref, it allows to define $phrase as a scalar. The listp function I defined helps spot the difference. In general, Lisp's handling of lists is much cleaner, since Perl requires @{$..} sugar.

Generally, this code is "the Lisp way", or rather "the functional programming way". There may be more "perly" solution that are more succinct (I don't mean golf !), efficient and so on. You are welcome to propose other methods and / or observations.

P.S: implementations in Perl 6 will be also most welcome !

Comment on Perl can do it, take 1 (sentence generation)
Download Code
Re: Perl can do it, take 1 (sentence generation)
by blokhead (Monsignor) on Jun 17, 2005 at 15:14 UTC
    As you mention, Lisp does better with lists, for sure. But if there's one datatype where Perl outshines them all, it's with strings. So if you want to make it more Perlish, why not translate it from a list manipulation problem into a string manipulation problem? Let the great regex support do the heavy lifting:
    my %dict = ( SENTENCE => ["NP VP"], NP => ["ART NOUN"], VP => ["VERB NP"], ART => [qw/ the a /], NOUN => [qw/ man ball woman table /], VERB => [qw/ hit took saw liked /] ); sub rand_production { my $items = $dict{+shift}; return $items->[rand @$items]; } sub generate { local $_ = shift; my $nonterminal = join "|", map quotemeta, keys %dict; 1 while s/($nonterminal)/ rand_production($1) /e; return $_; } print generate("SENTENCE"), $/;

    blokhead

      Nice++, this is indeed a more Perlish way, certainly very different from the Lispish one.

      A few observations & questions:

      1. I dread +shift (and anything with a + to force scalar context, it just looks so kludgy...), I guess you could use $_[0] instead ?
      2. Why are you using quotemeta ?
        I dread +shift (and anything with a + to force scalar context, it just looks so kludgy...), I guess you could use $_0 instead ?

        Yes. Or add my ($var) = @_ then use $var.

        Why are you using quotemeta ?

        In case there are any special characters in the words being joined. If there was a "*" in the word, for example, it would need to be escaped because we're forming a regexp. That's exactly what quotemeta does.

        I dread +shift (and anything with a + to force scalar context, it just looks so kludgy...), I guess you could use $_[0] instead ?
        In this context, a lot of people prefer using "shift()" or the even more explicit "shift @_" instead.
        Why are you using quotemeta ?
        I consider it similar to the following Good Habits of 3-arg open and the list form of system:
        open my $fh, "<", "hard-coded-filename"; # instead of: open my $fh "<hard-coded-filename" system qw[somecommand -o -p -q]; # hard-coded arguments # instead of: system "somecommand -o -p -q"
        Even though these pairs of code are equivalent, the first of each pair is just a better habit.

        In my previous post, I'm constructing a string that will be interpreted as a regular expression, and I want to reinforce the fact that the keys of %dict should be matched as exact strings. If one of the keys had a period or a curly brace, or square brackets, these need to be escaped. So even though I know that all the keys that I put in %dict are safe to begin with, I do quotemeta anyway for extra B&D good karma.

        And to be extra safe, to construct a list of alternations, one should usually sort by decreasing length as well:

        join "|", map quotemeta, sort { length $b <=> length $a } keys %dict;
        Since the regex engine tries alternations from left to right, if any of the keys were a substring of another key, we would need to list the longer key first. Otherwise we would never match the longer key...

        Or in this case, since the keys are all \w characters, we could put \b's around ($nonterminal) to force it to match a maximal \w+ word.

        blokhead

        Just a nit, the unary + doesn't "force scalar context", it actually just causes perl to interpret the next thing as an expression, rather than say, a bareword or a block.

        (Nitpicking myself, what is a better word than "thing" when I say "interpret the next thing". Terminal? Token?)
Re: Perl can do it, take 1 (sentence generation)
by BrowserUk (Pope) on Jun 17, 2005 at 16:25 UTC

    You can do away with a lot of the punctuation by putting the dereference inside rand_elt() which avoids having it most everywhere else. By predeclaring generate(), you can avoid the need for parens on most of the function calls which cleans it up further.

    The final thing of using nested ternaries rather than if / then / else is probably a step too far and will be seen as golfing, but I think it actualy comes closer to the Lisp cond operator (function?).

    #! perl -slw use strict; my %dict = ( sentence => [ [ qw/ noun_phrase verb_phrase / ] ], noun_phrase => [ [ qw/ Article Noun / ] ], verb_phrase => [ [ qw/ Verb noun_phrase / ] ], Article => [ qw/ the a /], Noun => [ qw/ man ball woman table/ ], Verb => [ qw/ hit took saw liked / ] ); sub rand_elt { return @{ $_[ 0 ] }[ rand @{ $_[ 0 ] } ]; } sub listp { return ref $_[0] eq "ARRAY"; } sub generate; sub generate { my $phrase = shift; return listp( $phrase ) ? map{ generate $_ } @{ $phrase } :exists $dict{ $phrase } ? generate rand_elt $dict{ $phrase } +: $phrase; } print join ' ', generate "sentence";

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

      For the small minority that might care, cond is specified as a macro: cond Hyperspec entry.

      I believe that it is legal for a Lisp implementation to implement a macro as a special form as long as an equivalent macro version is also provided, but don't quote me on that.

      But is the ternary-using version really cleaner / clearer to understand than the if/elsif/else chain ?

      Lisp's cond macro is very nice in small functions, but I've seen Lisp function reaching the size of a page (and many have a big cond as their main bulk), it's very hard to follow their logic. I'm starting to think that though if/elsif/else adds a lot of syntax, it's in fact simpler to understand.

      Switches are simpler to follow than conds, because they talk about a single variable, not arbitrary conditions.

        Like I said, probably a step too far, but I was trying to stay true to the Lisp original. If the intent is to completely Perlify the algorithm, then you probably wouldn't use the listp() abstraction (and it certainly wouldn't be called that:).

        If Perl's if then elsif else construct worked like those in Haskell or SQL, you could do

        return if <cond> then <value> elsif <cond2> then <val2> else <val3>;

        The nearest I could get is the nested ternaries.

        I also tried to code the cond macro, and got pretty close. It follows all the right paths, but implementing rewrite is much harder, and the values don't filter up without that.

        #! perl -slw use strict; my %dict = ( sentence => [ [ qw/ noun_phrase verb_phrase / ] ], noun_phrase => [ [ qw/ Article Noun / ] ], verb_phrase => [ [ qw/ Verb noun_phrase / ] ], Article => [ qw/ the a /], Noun => [ qw/ man ball woman table/ ], Verb => [ qw/ hit took saw liked / ] ); sub rand_elt { return @{ $_[ 0 ] }[ rand @{ $_[ 0 ] } ]; } sub listp { return ref $_[0] eq "ARRAY"; } sub cond; sub cond { return shift->() || cond @_; } sub generate; sub generate { my $phrase = shift; warn $phrase.$/; return cond sub{ warn 'A'.$/; return listp( $phrase ) ? map{ generate $_ } @{ $phrase } : () }, sub{ warn 'B'.$/; return exists $dict{ $phrase } ? generate rand_elt $dict{ $phrase } : () }, sub{ warn 'C'.$/; return 1 ? $phrase : () }; }

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Re: Perl can do it, take 1 (sentence generation)
by eric256 (Parson) on Jun 19, 2005 at 16:22 UTC

    Here is my perl6 port of your code. I had to work around a thing with given/when...unless they fixed it and i didn't notice yet.

    use v6; my %dict = ( sentence => [[qw/ noun_phrase verb_phrase /]], noun_phrase => [[qw/ Article Noun /]], verb_phrase => [[qw/ Verb noun_phrase /]], Article => [qw/ the a /], Noun => [qw/ man ball woman table/], Verb => [qw/ hit took saw liked /] ); sub generate ($phrase) { given $phrase { when .does(List) { return $phrase.map:{generate($_)} } if (defined %dict.{$_}) { return generate( %dict.{$_}.pick) } else { return [$phrase] } } } sub gen_sentence () { generate("sentence").join(" ").say }; gen_sentence for 1..5;

    ___________
    Eric Hodges
      Can you please explain what .does(List) means ? I understand intuitively that it asks whether $phrase is a list, but how does it work, exactly ?

        As far as I know that is all there is to tell. ;) The . by itself acts on the topic ($_) So that is the same as $_.does(List). My understanding is that does is checking the roles of the object it is called on. Evnetualy (when pugs gets more mature) we will be able to do. when List {} and it will DWIM, hopefully. BTW the given $phrase makes $phrase the current topic.


        ___________
        Eric Hodges
      Here is my version of the generate sub. Won't probably run on Pugs right now.
      use v6; sub generate { when List { .map:{ generate $_ } } when %dict { generate %dict{$_}.pick } default { [ $phrase ] } }
      For some reason I'm assuming that $_ is aliased to the first argument of a sub when there is no signature specified. If that's wrong, please tell me!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://467744]
Approved by Steve_p
Front-paged by Steve_p
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (10)
As of 2014-10-30 13:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (208 votes), past polls