in reply to Perl can do it, take 1 (sentence generation)

As you mention, Lisp does better with lists, for sure. But if there's one datatype where Perl outshines them all, it's with strings. So if you want to make it more Perlish, why not translate it from a list manipulation problem into a string manipulation problem? Let the great regex support do the heavy lifting:
my %dict = (
    SENTENCE => ["NP VP"],
    NP       => ["ART NOUN"],
    VP       => ["VERB NP"],
    ART      => [qw/ the a /],
    NOUN     => [qw/ man ball woman table /],
    VERB     => [qw/ hit took saw liked /]
);

sub rand_production {
    my $items = $dict{+shift};
    return $items->[rand @$items];
}

sub generate {
    local $_ = shift;
    my $nonterminal = join "|", map quotemeta, keys %dict;
    1 while s/($nonterminal)/ rand_production($1) /e;
    return $_;
}

print generate("SENTENCE"), $/;


Re^2: Perl can do it, take 1 (sentence generation)
by spurperl (Priest) on Jun 17, 2005 at 15:23 UTC
    Nice++, this is indeed a more Perlish way, certainly very different from the Lispish one.

    A few observations & questions:

    1. I dread +shift (and anything with a + to force scalar context, it just looks so kludgy...), I guess you could use $_[0] instead ?
    2. Why are you using quotemeta ?
      I dread +shift (and anything with a + to force scalar context, it just looks so kludgy...), I guess you could use $_[0] instead ?
      In this context, a lot of people prefer using "shift()" or the even more explicit "shift @_" instead.
      Why are you using quotemeta ?
      I consider it similar to the following Good Habits of 3-arg open and the list form of system:
      open my $fh, "<", "hard-coded-filename";
      # instead of: open my $fh, "<hard-coded-filename"

      system qw[somecommand -o -p -q];  # hard-coded arguments
      # instead of: system "somecommand -o -p -q"
      Even though these pairs of code are equivalent, the first of each pair is just a better habit.

      In my previous post, I'm constructing a string that will be interpreted as a regular expression, and I want to reinforce the fact that the keys of %dict should be matched as exact strings. If one of the keys contained a period, a curly brace, or square brackets, those characters would need to be escaped. So even though I know that all the keys I put in %dict are safe to begin with, I apply quotemeta anyway for extra B&D good karma.
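
As a rough illustration of the point above, here is a minimal sketch of what can go wrong without quotemeta. The key "N.P" and the test string "NXP" are hypothetical and do not appear in the original grammar; they are chosen only because the period is a regex metacharacter.

```perl
use strict;
use warnings;

# Hypothetical keys (not from the original %dict): one contains a
# regex metacharacter to show what quotemeta protects against.
my @keys = ("N.P", "VP");

my $raw    = join "|", @keys;                 # N.P|VP   (the dot matches any character)
my $quoted = join "|", map quotemeta, @keys;  # N\.P|VP  (the dot is literal)

my $text = "NXP";

# Anchor both patterns so that only a whole-string match counts.
my $raw_hit    = $text =~ /^(?:$raw)$/    ? 1 : 0;
my $quoted_hit = $text =~ /^(?:$quoted)$/ ? 1 : 0;

print "raw: $raw_hit, quoted: $quoted_hit\n";   # raw: 1, quoted: 0
```

The unescaped pattern accepts "NXP" even though no such key exists, while the escaped pattern only accepts the literal string "N.P".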

      And to be extra safe, to construct a list of alternations, one should usually sort by decreasing length as well:

      join "|", map quotemeta, sort { length $b <=> length $a } keys %dict;
      Since the regex engine tries alternations from left to right, if any of the keys were a substring of another key, we would need to list the longer key first. Otherwise we would never match the longer key...

      Or in this case, since the keys are all \w characters, we could put \b's around ($nonterminal) to force it to match a maximal \w+ word.
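
To make the ordering issue concrete, the following sketch compares the three approaches described above. The keys "N", "NP", and "NOUN" are hypothetical, picked so that one key is a prefix of another; a fixed array stands in for keys %dict, whose order is not guaranteed.

```perl
use strict;
use warnings;

# Hypothetical keys where shorter keys are prefixes of longer ones.
my @keys = ("N", "NP", "NOUN");

# Alternation in the given order versus longest-first order.
my $naive  = join "|", map quotemeta, @keys;
my $sorted = join "|", map quotemeta, sort { length $b <=> length $a } @keys;

my $text = "NOUN";

# The engine tries alternatives left to right and takes the first that succeeds.
my ($naive_match)   = $text =~ /($naive)/;       # "N" wins before "NOUN" is tried
my ($sorted_match)  = $text =~ /($sorted)/;      # "NOUN" is tried first
my ($bounded_match) = $text =~ /\b($naive)\b/;   # \b forces a whole-word match

print "$naive_match $sorted_match $bounded_match\n";   # N NOUN NOUN
```

The \b variant only helps when every key consists of \w characters, as noted above; sorting by decreasing length works for arbitrary keys.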


        Re shift(), why do most people prefer it? I mean, from a pure readability-based view... Isn't $dict{$_[0]} clearer to understand as "the values stored in %dict with the first argument as the key" than $dict{+shift}?

        When you read those two statements, doesn't the first one's meaning spring into your head more naturally?

        All in all, Perl's @_ and its $_[N] elements are burdensome in some cases, and don't look good whatever you do. I guess that Perl 6's argument declarations will solve many of these problems.

        Re quotemeta, thanks, I understand it's a good habit. You make as few assumptions as possible about the input, which is nice.

      I dread +shift (and anything with a + to force scalar context, it just looks so kludgy...), I guess you could use $_[0] instead ?

      Yes. Or add my ($var) = @_ then use $var.

      Why are you using quotemeta ?

      In case there are any special characters in the words being joined. If there was a "*" in the word, for example, it would need to be escaped because we're forming a regexp. That's exactly what quotemeta does.

      Just a nit: the unary + doesn't "force scalar context". It just causes perl to interpret the next thing as an expression rather than, say, a bareword or a block.
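
A small sketch may help show why the + matters inside a hash subscript. The %dict here is a cut-down, hypothetical stand-in for the one in the original post, with an extra "shift" key added purely to expose the bareword case; the sub names are likewise made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical hash: the "shift" entry exists only to reveal the bareword lookup.
my %dict = ( ART => "article", "shift" => "the literal key 'shift'" );

sub with_plus   { return $dict{+shift} }     # + makes shift an expression: key is the argument
sub with_parens { return $dict{shift()} }    # explicit call, same effect
sub with_index  { return $dict{$_[0]} }      # reads the argument without removing it from @_
sub with_named  { my ($key) = @_; return $dict{$key} }   # copy the argument to a named variable
sub with_bare   { return $dict{shift} }      # bareword: the key is the string "shift"

print with_plus("ART"),   "\n";   # article
print with_parens("ART"), "\n";   # article
print with_index("ART"),  "\n";   # article
print with_named("ART"),  "\n";   # article
print with_bare("ART"),   "\n";   # the literal key 'shift'
```

The first four forms look up the caller's argument; only the bare form ignores it, because a simple identifier inside the braces is treated as a quoted string.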

      (Nitpicking myself, what is a better word than "thing" when I say "interpret the next thing". Terminal? Token?)