
making a markovian "mad lib"

by Aldebaran (Hermit)
on Mar 20, 2019 at 22:32 UTC ( #1231512=perlquestion )
Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I'm trying something new that I've been bouncing around in my head since bliako posted his markovian meditation n-dimensional statistical analysis of DNA sequences (or text, or ...). bliako has all these interesting and abstruse scripts that are right on the edge of what I can replicate. I was able to do so once with the Markovian Frankentext of that thread but could not do it again. I want to reach for something much simpler, indeed as simple as I can imagine it as an SSCCE. With that as an introduction, I'll put a statement of the problem in readmore tags:

In 1972, we could order things called "Mad Libs", which were text templates for writing stories. I would like to create one with stochastic inputs. The templates are read as files using Path::Tiny, after which they undergo the substitutions using Text::Template. The lexicographic order of the file names provides a basic cause/effect continuum that makes sense. I have parts of this written, hopefully well enough that I can describe what I seek.

I worked up an example on github. It is 2.8k now and I will say that it is less than 50 k and hope never to have to update this line. These programs can generate a lot of story pretty quickly, so I'll do my part to keep bandwidth to modem levels. I present output:

$ pwd
/home/bob/Downloads/Markov-master
$ cx 4.markov.pl #alias for chmod +x
$ ./4.markov.pl markov/
sub_dir is markov/
path1 is /home/bob/Downloads/Markov-master
path2 is /home/bob/Downloads/Markov-master/markov
out_file is /home/bob/Downloads/Markov-master/markov/my_data/20-03-2019-14-27-36.txt.txt
-------system out---------
I, al debaran, have endured many adventures in my life, but I really thought it was all prettymuch random. That the universe were deterministic never really occured to me.
I love to throw the toy with the dog! When I throw with my left hand, I try to follow through with extension. Sometimes I repeat with the undef hand. You never know when the toy will take a big bounce, so I keep it low. It went south to the fence. Then my pretty pitty retrieved it at a gallop.
.7 of the time I go to the gym. I perform stretching for my health.
----------------
[
  ["protaganist", "al debaran", "the narrator", "JAPH", "gilligan"],
  ["trials", "adventures", "Bedraengnisse"],
  ["ball", "toy", "object"],
  ["orientation", "left", "right"],
  ["dog", "my pretty pitty"],
  ["num1", ".7", ".3", "50 percent"],
  ["activity", "stretching", "swimming"],
  ["non_orientation", "undef"],
  ["direction", "south to the fence", "west", "yonder"],
]
$

then source:

#!/usr/bin/perl -w
use 5.011;
use Path::Tiny;
use utf8;
use open OUT => ':utf8';
use Data::Dump;
use Text::Template;
use POSIX qw(strftime);
binmode STDOUT, 'utf8';

my ($sub_dir) = $ARGV[0];
say "sub_dir is $sub_dir";
my $path1 = Path::Tiny->cwd;
say "path1 is $path1";
my $path2 = path( $path1, $sub_dir );
say "path2 is $path2";

# create an output file
my $munge = strftime( "%d-%m-%Y-%H-%M-%S\.txt", localtime );
my $out_file = $path2->child( 'my_data', "$munge" )->touchpath;
say "out_file is $out_file";

## populate hash
my %vars = (
    protaganist     => 'al debaran',
    trials          => 'adventures',
    ball            => 'toy',
    orientation     => 'left',
    dog             => 'my pretty pitty',
    num1            => '.7',
    activity        => 'stretching',
    non_orientation => 'undef',
    direction       => 'south to the fence',
);
my $rvars = \%vars;

my @pfaden = $path2->children(qr/\.txt$/);
@pfaden = sort @pfaden;

#say "paths are @pfaden ";
for my $file (@pfaden) {

    #say "default is $file";
    my $template = Text::Template->new(
        ENCODING => 'utf8',
        SOURCE   => $file,
    ) or die "Couldn't construct template: $!";
    my $result = $template->fill_in( HASH => $rvars );
    $out_file->append_utf8($result);
}
say "-------system out---------";
system("cat $out_file");
say "----------------";

my $data = [
    [ 'protaganist', 'al debaran', 'the narrator', 'JAPH', 'gilligan' ],
    [ 'trials', 'adventures', 'Bedraengnisse' ],
    [ 'ball', 'toy', 'object' ],
    [ 'orientation', 'left', 'right' ],
    [ 'dog', 'my pretty pitty' ],
    [ 'num1', '.7', '.3', '50 percent' ],
    [ 'activity', 'stretching', 'swimming' ],
    [ 'non_orientation', 'undef' ],
    [ 'direction', 'south to the fence', 'west', 'yonder' ]
];
dd $data;
__END__

So, I'm getting good intermediary results. What I want to do is map the values from $data to populate %vars instead. The first column holds the keys of the hash. Those that follow are the possible values. For starters, I would like to make them equiprobable, so a mapping function would get a pseudorandom number on the unit interval and multiply it by the cardinality of the possibilities. I haven't done that in Perl, so I'm fishing for ways to do that.

Also, I'm looking for a little logic that returns the complement of left and right in 'orientation'. I've got ways to do these things, they just all look like leftover Fortran and C. I'm looking for Perl solutions. Maybe I need to represent these data entirely differently....
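One possible way to get the complement of 'orientation' is a plain lookup hash, sketched below. The names %complement and @sides are made up for illustration and are not part of the posted script; the idea is only that non_orientation is derived from whichever orientation was drawn.

```perl
use strict;
use warnings;

# Hypothetical lookup table: each orientation maps to its opposite.
my %complement = ( left => 'right', right => 'left' );

# Draw one orientation with equal probability, then look up the other.
# "rand @sides" evaluates @sides in scalar context, i.e. its element count,
# and the fractional index is truncated to an integer.
my @sides           = sort keys %complement;
my $orientation     = $sides[ rand @sides ];
my $non_orientation = $complement{$orientation};

print "orientation=$orientation non_orientation=$non_orientation\n";
```

The two values could then be stored in %vars under the keys orientation and non_orientation before the templates are filled in.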

Thanks for your comment

Replies are listed 'Best First'.
Re: making a markovian "mad lib"
by trippledubs (Chaplain) on Mar 21, 2019 at 02:38 UTC

    I'll have a go

    for (@{$data}) {
        my $key          = $_->[0];
        my $last_element = $#{$_};
        my $random       = $_->[rand($last_element) + 1];
        $vars{$key} = $random;
    }
    A more concise way:
    %vars = map { $_->[0],$_->[rand($#{$_}) + 1] } @{$data};
    You could represent like:
    $story = {
        protaganist => [ 'al debaran', 'narrator', 'japh', 'gilligan' ],
        dog         => [ 'my pretty pitty' ],
    };
    or
    data/protaganist.txt
    al debaran
    the narrator
    japh

    data/trials.txt
    adventures
    bedraengisse
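A minimal sketch of how the hash-of-arrays layout above might be used to fill the template variables. The helper name pick_vars is an assumption for illustration; the data is copied from the example in this reply.

```perl
use strict;
use warnings;

# Hash-of-arrays layout: key => list of candidate values.
my $story = {
    protaganist => [ 'al debaran', 'narrator', 'japh', 'gilligan' ],
    dog         => [ 'my pretty pitty' ],
};

# pick_vars is a hypothetical helper: it returns a hash reference with one
# equiprobable choice per key.
sub pick_vars {
    my ($choices) = @_;
    my %picked;
    for my $key ( keys %$choices ) {
        my $list = $choices->{$key};
        # rand @$list yields a number in [0, count); the index is truncated.
        $picked{$key} = $list->[ rand @$list ];
    }
    return \%picked;
}

my $vars = pick_vars($story);
print "$_ => $vars->{$_}\n" for sort keys %$vars;
```

With this layout there is no key in the first slot to skip, so the "+ 1" offset from the array-of-arrays version is not needed. The returned reference could be passed as the HASH argument to fill_in.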
      I'll have a go

      Your response seems to work, thank you very much. Indeed, it helped me to understand the syntax involved. I used the more compact version and got the behaviours I was looking for. I'll put output and source between readmore tags.

      So I'm happy with this as an intermediate result but started puzzling on what I was coding towards. I wanted to throw the ball to the dog and have that repeat through probabilistic methods. The repeated events were hard to extricate causally. If I tried to represent the scenario with object oriented methods, what would the "objects" be? The story, the things in the story, the state of the things in the story, the Animals as with _Intermediate Perl_? Wherein does it show Markovian nature?

      After wrestling with that for a few days whilst distracted by the US div I college men's basketball tournament, I decided, wait a sec, this tournament with its "bracketology" is the type of creature I'm looking for. So I'm shifting the story from being about the dog chasing the ball to ten guys on a court chasing a ball like their lives depend on it. I'm redefining the problem here a bit and would like to change the subject to reflect that. It doesn't take long for code to get longer, so I will use readmore tags:

      Also, I'm fishing for any way to re-imagine this problem. I'm gonna have to create a dozen hodge-podge arrays, and the tournament is always a 2**6 thing.*

      Thanks for your comment,

      *except for "first four" games

        The "markovian principle" is nothing more than the following simple tenet:

        next state depends (= is influenced) only on current state

        The context is a random process which outputs symbols (or, equivalently, is in a "state"), one after another. For example, the process is "weather" during a single hour (i.e. 1 hour = 1 weather state), with 3 states: "rain", "sunny", "dry-cloudy". And the outcome of this process is something like "rain"->"rain"->"sunny"->"dry-cloudy" ...

        If that was a "markovian process" then we could describe it simply by a "transition matrix" which is a convenient way to convey the information of what the probability of "next" state is given "current" state in the form of a 2d array often called a stochastic matrix:

                    rain  sunny  dry-cloudy
        rain        0.6   0.2    0.2
        sunny       0.3   0.5    0.2
        dry-cloudy  0.4   0.3    0.3

        In the above array, the row represents the current state and the column the next state, e.g. the probability of rain given that it is raining now is 0.6, the probability of dry-cloudy given that it is sunny now is 0.2, etc. The important property of the above matrix is that the probabilities of all the possible events from a current state must sum to 1: all rows sum to 1. Why? If we are now in the current state of "rain" we have 3 possible outcomes, and their probabilities must sum to one because one of them will happen with absolute certainty (as far as our model goes).
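As a rough illustration, the weather matrix can be held in a hash of hashes and the row-sum property checked directly. The names %transition and row_sum are assumptions made for this sketch; the numbers are the ones from the table above.

```perl
use strict;
use warnings;

# Outer key = current state, inner key = next state.
my %transition = (
    'rain'       => { 'rain' => 0.6, 'sunny' => 0.2, 'dry-cloudy' => 0.2 },
    'sunny'      => { 'rain' => 0.3, 'sunny' => 0.5, 'dry-cloudy' => 0.2 },
    'dry-cloudy' => { 'rain' => 0.4, 'sunny' => 0.3, 'dry-cloudy' => 0.3 },
);

# row_sum is a hypothetical helper: it adds up one row's probabilities.
sub row_sum {
    my ($row) = @_;
    my $sum = 0;
    $sum += $_ for values %$row;
    return $sum;
}

# Each row must sum to 1; a small tolerance allows for floating-point error.
for my $state ( sort keys %transition ) {
    my $sum = row_sum( $transition{$state} );
    die "row '$state' sums to $sum, not 1\n" if abs( $sum - 1 ) > 1e-9;
}
print "all rows sum to 1\n";
```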

        Similarly, one can use multi-dimensional arrays in order to model a random process whose next state depends on n-previous states. It will not be "markovian" but we can use the same tools.

        So, the most important thing so far, forgetting about markov property etc, is that a random process outputs from a finite set of symbols with a probability depending on n-previous symbols and that can be modeled using a transition matrix as described above. In this matrix all the probabilities of the possible events from a current state must sum to 1.

        Another useful tool (equivalent to the transition matrix) in modelling or visualising such random processes (of finite number of events) is a Graph, see the diagram here: Markov_chain.

        Graph or transition matrix, once you have built one by observing the weather for long enough and estimating the transition probabilities, you can use it to simulate your random process and make it produce random symbols.

        The Graph or matrix can also be constructed by hand from imagination. Feed that information to your simulator in order to run the random process. That's probably how a computer game would calculate the weather on Mars.
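A small simulator along those lines might look like the sketch below. It is only one way to do it, under the assumption that the matrix is stored as a hash of hashes; next_state, @states and %transition are illustrative names, and the probabilities are the weather figures from the table earlier in this reply.

```perl
use strict;
use warnings;

my @states     = ( 'rain', 'sunny', 'dry-cloudy' );
my %transition = (
    'rain'       => { 'rain' => 0.6, 'sunny' => 0.2, 'dry-cloudy' => 0.2 },
    'sunny'      => { 'rain' => 0.3, 'sunny' => 0.5, 'dry-cloudy' => 0.2 },
    'dry-cloudy' => { 'rain' => 0.4, 'sunny' => 0.3, 'dry-cloudy' => 0.3 },
);

# next_state is a hypothetical helper: draw u in [0,1) and walk the
# cumulative probabilities of the current state's row until u is covered.
sub next_state {
    my ($current) = @_;
    my $u   = rand();
    my $acc = 0;
    for my $candidate (@states) {
        $acc += $transition{$current}{$candidate};
        return $candidate if $u < $acc;
    }
    return $states[-1];    # guard against floating-point rounding
}

# Run the process for ten hours, starting from rain.
my $state = 'rain';
my @chain = ($state);
for ( 1 .. 9 ) {
    $state = next_state($state);
    push @chain, $state;
}
print join( ' -> ', @chain ), "\n";
```

Only the current state is passed to next_state, which is exactly the "next state depends only on current state" tenet.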

        I cannot help you with your basketball model because I am not acquainted at all with these "big dances", "east", "west" etc., but perhaps you can rubber-duck it in more general terms.

        bw, bliako

        So I'm happy with this as an intermediate result but started puzzling on what I was coding towards. I wanted to throw the ball to the dog and have that repeat through probabilistic methods. The repeated events were hard to extricate causally. If I tried to represent the scenario with object oriented methods, what would the "objects" be? The story, the things in the story, the state of the things in the story, the Animals as with _Intermediate Perl_? Wherein does it show Markovian nature?

        I didn't see any probability calculations in the script of yours that I looked at. You asked for equiprobable. Bliako uses a corpus to build up probability tables; you used a template with some variables. So, from my very limited understanding (now less limited after reading through n-dimensional statistical analysis of DNA sequences (or text, or ...)), you need some input to build up those probability tables. If I understand correctly: for some input, some text probably follows some other text. Then for some other (or the same) input, you replace the text based on those probabilities. In sports, some team beats some other team with some frequency.

        So if you want to utilize Bliako's methods you need a corpus that relates to the problem, like previous scores. You are using rankings (seeds), but that will just give you what you already have. It's not going to increase the accuracy if ranked 1 probably beats ranked 2. You need a corpus with scores or something. Then you might be able to bet the spread.

        Bliako's solution aims to be more general in that, the n-gram is configurable, as well as what separates the n-grams. You have to have an input that matches that criteria or hammer it into that. Team X vs Team Y will not be read the same as Team Y vs Team X. And it's not going to account for degrees, only the probability of a sequence. Degrees like, if team X beats Y by a whole lot. Or what is the average point difference, things like that. So if you had a corpus like

        Virginia: 40 Duke 60 (Duke)
        Virginia: 50 Duke 70 (Duke)
        Virginia: 80 Duke 70 (Virginia)

        With n-gram=2 and separator being 'space and then some digits' it follows that (Duke) succeeds Virginia: Duke 2/3 of the time. Those are the kind of probabilities Bliako is using, pretty sure.
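To make that count concrete, here is a rough sketch that tokenizes the three-line corpus above and tallies what follows each pair of tokens. This is not Bliako's code; the split pattern and the names %count and %total are assumptions chosen to match the description "space and then some digits".

```perl
use strict;
use warnings;

my @corpus = (
    'Virginia: 40 Duke 60 (Duke)',
    'Virginia: 50 Duke 70 (Duke)',
    'Virginia: 80 Duke 70 (Virginia)',
);

my %count;    # $count{context}{successor} = number of times seen
my %total;    # $total{context}            = number of times the context was seen

for my $line (@corpus) {
    # Separator: whitespace followed by digits (and any trailing whitespace).
    my @tokens = split /\s+\d+\s*/, $line;

    # Context is a pair of adjacent tokens; the successor is the token after it.
    for my $i ( 0 .. $#tokens - 2 ) {
        my $context = join ' ', @tokens[ $i, $i + 1 ];
        $count{$context}{ $tokens[ $i + 2 ] }++;
        $total{$context}++;
    }
}

for my $context ( sort keys %count ) {
    for my $next ( sort keys %{ $count{$context} } ) {
        printf "%s -> %s: %d/%d\n",
            $context, $next, $count{$context}{$next}, $total{$context};
    }
}
```

For this corpus the context "Virginia: Duke" is followed by "(Duke)" in 2 of 3 lines and by "(Virginia)" in 1 of 3, which matches the 2/3 figure above. The scores themselves are thrown away by the separator, so margins of victory play no part.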

Re: making a markovian "mad lib"
by Your Mother (Bishop) on Mar 20, 2019 at 23:18 UTC

      Do you have the script that generated that?

        I looked and I found the very sparse + heavily commented out remains of what it probably was. I was also playing with String::Markov. The code I have left doesn't produce anything close to what I posted. So… :P I do enjoy this stuff and I think what you're doing is awesome. I wish I had more time to play with Perl for fun. I did actually do a news reader just wrapping the OS X command line tool say a couple days ago. I might clean that up to post.

Node Type: perlquestion [id://1231512]
Approved by stevieb
Front-paged by Corion