PerlMonks  

bracketology was Re^2: making a markovian "mad lib"

by Aldebaran (Deacon)
on Mar 22, 2019 at 23:57 UTC (#1231580)


in reply to Re: making a markovian "mad lib"
in thread making a markovian "mad lib"

I'll have a go

Your response seems to work, thank you very much. Indeed, it helped me to understand the syntax involved. I used the more compact version and got the behaviours I was looking for. I'll put output and source between readmore tags.

Abridged output:

    $ ./6.markov.pl markov
    sub_dir is markov
    path1 is /home/bob/Documents/meditations
    path2 is /home/bob/Documents/meditations/markov
    out_file is /home/bob/Documents/meditations/markov/my_data/22-03-2019-13-37-35/22-03-2019-13-37-35.1.txt
    -------system out---------
    I, the narrator, have endured many adventures in my life, but I really thought it was all prettymuch random. That the universe were deterministic never really occured to me.
    I love to throw the toy with the dog! When I throw with my left hand, I try to follow through with extension. Sometimes I repeat with the undef hand. You never know when the toy will take a big bounce, so I keep it low. It went south to the fence. Then my pretty pitty retrieved it at a gallop.
    .3 of the time I go to the gym. I perform swimming for my health.
    ----------------
    out_file is /home/bob/Documents/meditations/markov/my_data/22-03-2019-13-37-35/22-03-2019-13-37-35.2.txt
    -------system out---------
    I, JAPH, have endured many Bedraengnisse in my life, but I really thought it was all prettymuch random. That the universe were deterministic never really occured to me.
    I love to throw the toy with the dog! When I throw with my right hand, I try to follow through with extension. Sometimes I repeat with the undef hand. You never know when the toy will take a big bounce, so I keep it low. It went west. Then my pretty pitty retrieved it at a gallop.
    50 percent of the time I go to the gym. I perform swimming for my health.

...this plus 28 similar stories. Source:

    #!/usr/bin/perl -w
    use 5.011;
    use Path::Tiny;
    use utf8;
    use open OUT => ':utf8';
    use Data::Dump;
    use Text::Template;
    use POSIX qw(strftime);
    binmode STDOUT, 'utf8';

    my ($sub_dir) = $ARGV[0];
    say "sub_dir is $sub_dir";
    my $path1 = Path::Tiny->cwd;
    say "path1 is $path1";
    my $path2 = path( $path1, $sub_dir );
    say "path2 is $path2";

    ## populate hash
    my $data = [
        [ 'protaganist', 'al debaran', 'the narrator', 'JAPH', 'gilligan' ],
        [ 'trials', 'adventures', 'Bedraengnisse' ],
        [ 'ball', 'toy', 'object' ],
        [ 'orientation', 'left', 'right' ],
        [ 'dog', 'my pretty pitty' ],
        [ 'num1', '.7', '.3', '50 percent' ],
        [ 'activity', 'stretching', 'swimming' ],
        [ 'non_orientation', 'undef' ],
        [ 'direction', 'south to the fence', 'west', 'yonder' ]
    ];

    #dd $data;

    ## main loop

    # set trials
    my $trials = 30;
    my $dummy  = 1;

    while ( $trials > 0 ) {

        # create an output file
        my $first_second = strftime( "%d-%m-%Y-%H-%M-%S", localtime );
        my $out_file = $path2->child( 'my_data', "$first_second",
            "$first_second\.$dummy.txt" )->touchpath;
        say "out_file is $out_file";

        my %vars = map { $_->[0], $_->[ rand( $#{$_} ) + 1 ] } @{$data};
        my $rvars = \%vars;

        my @pfaden = $path2->children(qr/\.txt$/);
        @pfaden = sort @pfaden;

        #say "paths are @pfaden ";

        for my $file (@pfaden) {

            #say "default is $file";
            my $template = Text::Template->new(
                ENCODING => 'utf8',
                SOURCE   => $file,
            ) or die "Couldn't construct template: $!";

            my $result = $template->fill_in( HASH => $rvars );
            $out_file->append_utf8($result);
        }
        say "-------system out---------";
        system("cat $out_file");
        say "----------------";
        $trials -= 1;
        $dummy  += 1;

    }    # end while condition

    __END__

So I'm happy with this as an intermediate result but started puzzling on what I was coding towards. I wanted to throw the ball to the dog and have that repeat through probabilistic methods. The repeated events were hard to extricate causally. If I tried to represent the scenario with object oriented methods, what would the "objects" be? The story, the things in the story, the state of the things in the story, the Animals as with _Intermediate Perl_? Wherein does it show Markovian nature?

After wrestling with that for a few days whilst distracted by the US div I college men's basketball tournament, I decided, wait a sec, this tournament with its "bracketology" is the type of creature I'm looking for. So I'm shifting the story from being about the dog chasing the ball to ten guys on a court chasing a ball like their lives depend on it. I'm redefining the problem here a bit and would like to change the subject to reflect that. It doesn't take long for code to get longer, so I will use readmore tags:

I intend to derive each team's probability of winning from its ranking, so I have munged the ranking into the team's name. So far I have only 1/4 of the teams, and I still lack the ability to simulate game play. A printable version of what I'm imitating is here. First output:

    $ ./3.mm.pl hoops
    sub_dir is hoops
    path1 is /home/bob/Documents/meditations
    path2 is /home/bob/Documents/meditations/hoops
    out_file is /home/bob/Documents/meditations/hoops/my_data/22-03-2019-16-09-08/22-03-2019-16-09-08.1.txt
    pairs are 1.duke vs 16.ndST 8.vcu vs 9.ucf 5.msST vs 12.lib 4.vaTech vs 13.stlouis 6.maryland vs 11.belmont 3.lsu vs 14.yale 7.louisville vs 10.mn 2.miST vs 15.bradley
    pairs are 1.duke vs 16.ndST
    matched
    1 duke 16 ndST
    {
      event       => "the Big Dance",
      protaganist => "al debaran",
      ref_east    => [
                       "1.duke",
                       "16.ndST",
                       "8.vcu",
                       "9.ucf",
                       "5.msST",
                       "12.lib",
                       "4.vaTech",
                       "13.stlouis",
                       "6.maryland",
                       "11.belmont",
                       "3.lsu",
                       "14.yale",
                       "7.louisville",
                       "10.mn",
                       "2.miST",
                       "15.bradley",
                     ],
      region      => "south",
    }
    -------system out---------
    It is the Big Dance again, and I, al debaran, wanted to make some predictions. I pick to win in this round of the south. Their cardinality is . looks particularly likely to win, while the underdog.
    The current state of the Big Dance is .
    ----------------
    out_file is /home/bob/Documents/meditations/hoops/my_data/22-03-2019-16-09-09/22-03-2019-16-09-09.2.txt
    pairs are 1.duke vs 16.ndST 8.vcu vs 9.ucf 5.msST vs 12.lib 4.vaTech vs 13.stlouis 6.maryland vs 11.belmont 3.lsu vs 14.yale 7.louisville vs 10.mn 2.miST vs 15.bradley
    pairs are 1.duke vs 16.ndST
    matched
    1 duke 16 ndST
    {
      event       => "hoops, baby",
      protaganist => "JAPH",
      ref_east    => [
                       "1.duke",
                       "16.ndST",
                       "8.vcu",
                       "9.ucf",
                       "5.msST",
                       "12.lib",
                       "4.vaTech",
                       "13.stlouis",
                       "6.maryland",
                       "11.belmont",
                       "3.lsu",
                       "14.yale",
                       "7.louisville",
                       "10.mn",
                       "2.miST",
                       "15.bradley",
                     ],
      region      => "south",
    }
    -------system out---------
    It is hoops, baby again, and I, JAPH, wanted to make some predictions. I pick to win in this round of the south. Their cardinality is . looks particularly likely to win, while the underdog.
    The current state of hoops, baby is .
    ----------------
    $

It looks like I pair them off properly, but I only get one team to match the ensuing regex. I kind of get lost there, which is not a good sign when you're writing your own code. Here is the current script; the entire workspace with scripts and data files is still on github. (Now 6.1 kb)

    #!/usr/bin/perl -w
    use 5.011;
    use Path::Tiny;
    use utf8;
    use open OUT => ':utf8';
    use Data::Dump;
    use Text::Template;
    use POSIX qw(strftime);
    binmode STDOUT, 'utf8';

    my ($sub_dir) = $ARGV[0];
    say "sub_dir is $sub_dir";
    my $path1 = Path::Tiny->cwd;
    say "path1 is $path1";
    my $path2 = path( $path1, $sub_dir );
    say "path2 is $path2";

    ## populate hash
    my $data = [
      [ 'protaganist', 'al debaran', 'the narrator', 'JAPH', 'Dick Vitale' ],
      [ 'event', 'March Madness', 'the Big Dance', 'hoops, baby' ],
      [ 'region', 'east', 'west', 'south', 'midwest' ]
    ];

    #dd $data;

    ## main loop

    # set trials
    my $trials = 2;
    my $dummy  = 1;

    while ( $trials > 0 ) {

      # create an output file
      my $first_second = strftime( "%d-%m-%Y-%H-%M-%S", localtime );
      my $out_file =
        $path2->child( 'my_data', "$first_second", "$first_second\.$dummy.txt" )
        ->touchpath;
      say "out_file is $out_file";

      # stochastic input of appositives
      my %vars = map { $_->[0], $_->[ rand( $#{$_} ) + 1 ] } @{$data};
      my $rvars = \%vars;

      $rvars = pop_brackets($rvars);
      $rvars = calc_winners($rvars);
      dd $rvars;

      my @pfade = $path2->children(qr/\.txt$/);
      @pfade = sort @pfade;

      #say "paths are @pfade";

      for my $file (@pfade) {

        #say "default is $file";
        my $template = Text::Template->new(
          ENCODING => 'utf8',
          SOURCE   => $file,
        ) or die "Couldn't construct template: $!";

        my $result = $template->fill_in( HASH => $rvars );
        $out_file->append_utf8($result);
      }
      say "-------system out---------";
      system("cat $out_file");
      say "----------------";
      $trials -= 1;
      $dummy  += 1;

    }    # end while condition

    sub pop_brackets {
      my $rvars = shift;
      my %vars  = %$rvars;

      my @east = qw(1.duke 16.ndST 8.vcu 9.ucf 5.msST 12.lib 4.vaTech 13.stlouis
        6.maryland 11.belmont 3.lsu 14.yale 7.louisville 10.mn 2.miST 15.bradley);

      $vars{ref_east} = \@east;
      return \%vars;
    }

    sub calc_winners {
      use 5.016;
      use warnings;

      my $rvars   = shift;
      my %vars    = %$rvars;
      my $new_ref = $vars{ref_east};
      my @east    = @$new_ref;

      #say "east is @east";
      my @pairs;
      while (@east) {
        my $first = shift @east;
        my $next  = shift @east;
        push @pairs, "$first vs $next";
      }
      say "pairs are @pairs";
      my @winners = play_game(@pairs);

      return \%vars;    # end calc_winners
    }

    sub play_game {
      use 5.016;
      use warnings;

      my @pairs = shift;
      say "pairs are @pairs";
      my @winners;
      for my $line (@pairs) {
        if ( $line =~ /^(\d+)\.(\w+) vs (\d+)\.(\w+)$/ ) {
          say "matched";
          say "$1 $2 $3 $4";
          my $denominator   = $1 + $3;
          my $ratio         = $3 / $denominator;
          my $random_number = rand();
          if ( $random_number < $ratio ) {
            push @winners, "$1.$2";
          }
          else {
            push @winners, "$3.$4";
          }
        }
      }
      return @winners;
    }    # end play_game

    __END__
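
One likely reason that only the first pairing reaches the regex: inside play_game(), the line my @pairs = shift; removes just the first element of @_, so the array ends up holding a single string. A minimal sketch of the difference follows; first_only and all_args are hypothetical helper names used for illustration, not part of the script.

```perl
use strict;
use warnings;

# shift returns only the first element of @_, so @pairs gets one entry
sub first_only { my @pairs = shift; return scalar @pairs }

# assigning @_ copies every argument into @pairs
sub all_args { my @pairs = @_; return scalar @pairs }

my @pairs = ( '1.duke vs 16.ndST', '8.vcu vs 9.ucf', '5.msST vs 12.lib' );
printf "shift sees %d pair(s), \@_ sees %d\n",
  first_only(@pairs), all_args(@pairs);
```

If that is indeed the cause, changing the line to my @pairs = @_; should let all eight pairings through to the loop.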

Also, I'm fishing for any way to re-imagine this problem. I'm gonna have to create a dozen hodge-podge arrays, and the tournament is always a 2**6 thing.*

Thanks for your comment,

*except for "first four" games

Re: bracketology was Re^2: making a markovian "mad lib"
by bliako (Prior) on Mar 23, 2019 at 11:39 UTC

    The "markovian principle" is nothing more than the following simple tenet:

    next state depends (= is influenced) only on current state

    The context is a random process which outputs symbols (or, put differently, occupies a "state"), one after another. For example, the process is "weather" during a single hour (i.e. 1 hour = 1 weather state), with 3 states: "rain", "sunny", "dry-cloudy". The outcome of this process is something like "rain"->"rain"->"sunny"->"dry-cloudy" ...

    If that were a "markovian process", then we could describe it simply by a "transition matrix", often called a stochastic matrix: a 2d array that gives the probability of the "next" state given the "current" state:

                    rain   sunny  dry-cloudy
        rain        0.6    0.2    0.2
        sunny       0.3    0.5    0.2
        dry-cloudy  0.4    0.3    0.3

    In the above array, row represents current state and column the next state, e.g. the probability of rain when now is raining is 0.6, the prob of dry-cloudy when now is sunny is 0.2 etc. The important property of the above matrix is that the probabilities of all the possible events from a current state must sum to 1: all rows sum to 1. Why? If we are now in current state of "rain" we have 3 possible outcomes. And so their probabilities must sum to one because one of them will happen with absolute certainty (as far as our model goes).
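
    As a minimal sketch, the weather matrix can be held in a hash of hashes (outer key = current state, inner key = next state) and the row-sum property checked directly. The helper name rows_sum_to_one and the tolerance 1e-9 are assumptions made for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Weather transition matrix: outer key = current state, inner key = next state
my %transition = (
    'rain'       => { 'rain' => 0.6, 'sunny' => 0.2, 'dry-cloudy' => 0.2 },
    'sunny'      => { 'rain' => 0.3, 'sunny' => 0.5, 'dry-cloudy' => 0.2 },
    'dry-cloudy' => { 'rain' => 0.4, 'sunny' => 0.3, 'dry-cloudy' => 0.3 },
);

# Hypothetical helper: true if every row sums to 1 within a small tolerance
sub rows_sum_to_one {
    my ($matrix) = @_;
    for my $state ( keys %$matrix ) {
        my $total = sum( values %{ $matrix->{$state} } );
        return 0 if abs( $total - 1 ) > 1e-9;
    }
    return 1;
}

print rows_sum_to_one( \%transition ) ? "rows ok\n" : "bad row\n";
```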

    Similarly, one can use multi-dimensional arrays in order to model a random process whose next state depends on n-previous states. It will not be "markovian" but we can use the same tools.

    So, the most important thing so far, forgetting about markov property etc, is that a random process outputs from a finite set of symbols with a probability depending on n-previous symbols and that can be modeled using a transition matrix as described above. In this matrix all the probabilities of the possible events from a current state must sum to 1.

    Another useful tool (equivalent to the transition matrix) in modelling or visualising such random processes (of finite number of events) is a Graph, see the diagram here: Markov_chain.

    Graph or transition matrix, once you have built one by observing the weather for long enough to estimate the transition probabilities, you can use it to simulate your random process and make it produce random symbols.
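
    A sketch of such a simulation, assuming the weather matrix is stored as a hash of hashes: draw a uniform random number and walk the cumulative probabilities of the current row until the number is covered. The names next_state and simulate are illustrative, and the optional third argument to next_state exists only so a fixed number can be supplied for testing.

```perl
use strict;
use warnings;

# Weather transition matrix: outer key = current state, inner key = next state
my %transition = (
    'rain'       => { 'rain' => 0.6, 'sunny' => 0.2, 'dry-cloudy' => 0.2 },
    'sunny'      => { 'rain' => 0.3, 'sunny' => 0.5, 'dry-cloudy' => 0.2 },
    'dry-cloudy' => { 'rain' => 0.4, 'sunny' => 0.3, 'dry-cloudy' => 0.3 },
);

# Choose the next state by accumulating the probabilities of the current row
sub next_state {
    my ( $matrix, $current, $r ) = @_;
    $r = rand() unless defined $r;    # uniform number in [0, 1)
    my $row        = $matrix->{$current};
    my $cumulative = 0;
    my $last;
    for my $state ( sort keys %$row ) {    # sorted for a repeatable order
        $cumulative += $row->{$state};
        $last = $state;
        return $state if $r < $cumulative;
    }
    return $last;    # guard against floating-point round-off
}

# Produce the start state followed by $steps simulated states
sub simulate {
    my ( $matrix, $start, $steps ) = @_;
    my @chain = ($start);
    push @chain, next_state( $matrix, $chain[-1] ) for 1 .. $steps;
    return @chain;
}

print join( '->', simulate( \%transition, 'rain', 10 ) ), "\n";
```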

    The Graph or matrix can also be constructed by hand from imagination. Feed that information to your simulator in order to run the random process. That's probably how a computer game would calculate the weather on Mars.

    I cannot help you with your basketball model because I am not acquainted at all with these "big dances", "east", "west" etc., but perhaps you can rubber-duck it in more general terms.

    bw, bliako

Re: bracketology was Re^2: making a markovian "mad lib"
by trippledubs (Deacon) on Mar 25, 2019 at 18:27 UTC

    So I'm happy with this as an intermediate result but started puzzling on what I was coding towards. I wanted to throw the ball to the dog and have that repeat through probabilistic methods. The repeated events were hard to extricate causally. If I tried to represent the scenario with object oriented methods, what would the "objects" be? The story, the things in the story, the state of the things in the story, the Animals as with _Intermediate Perl_? Wherein does it show Markovian nature?

    I didn't see any probability calculations in the script of yours that I looked at. You asked for equiprobable. Bliako uses a corpus to build up probability tables; you used a template with some variables. So, from my very limited understanding (now less limited after reading through n-dimensional statistical analysis of DNA sequences (or text, or ...)), you need some input to build up those probability tables. If I understand correctly: for some input, some text probably follows some other text. Then for some other (or the same) input, you replace the text based on those probabilities. In sports, some team beats some other team with some frequency.

    So if you want to utilize Bliako's methods you need a corpus that relates, like previous scores. You are using rankings (seeds), but that will just give you what you already have. It's not going to increase the accuracy if ranked 1 probably beats ranked 2. You need a corpus with scores or something. Then you might be able to bet the spread.

    Bliako's solution aims to be more general in that the n-gram is configurable, as well as what separates the n-grams. You have to have an input that matches those criteria or hammer it into that form. Team X vs Team Y will not be read the same as Team Y vs Team X. And it's not going to account for degrees, only the probability of a sequence. Degrees like whether team X beats Y by a whole lot, or what the average point difference is, things like that. So if you had a corpus like

        Virginia: 40 Duke 60 (Duke)
        Virginia: 50 Duke 70 (Duke)
        Virginia: 80 Duke 70 (Virginia)

    With n-gram=2 and the separator being 'space and then some digits', it follows that "(Duke)" succeeds "Virginia: Duke" 2/3 of the time. Those are the kind of probabilities Bliako is using, pretty sure.
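
    A rough sketch of that count, not Bliako's actual code: split each corpus line on 'space, digits, optional space', treat each pair of adjacent tokens as the 2-gram, and tally which token follows it. The function name follow_probabilities and the exact split pattern are assumptions.

```perl
use strict;
use warnings;

my @corpus = (
    'Virginia: 40 Duke 60 (Duke)',
    'Virginia: 50 Duke 70 (Duke)',
    'Virginia: 80 Duke 70 (Virginia)',
);

# For every 2-gram of tokens, estimate the probability of each following token
sub follow_probabilities {
    my @lines = @_;
    my ( %count, %total );
    for my $line (@lines) {
        my @tokens = split /\s+\d+\s*/, $line;    # separator: space + digits
        for my $i ( 0 .. $#tokens - 2 ) {
            my $gram = join ' ', @tokens[ $i, $i + 1 ];
            $count{$gram}{ $tokens[ $i + 2 ] }++;
            $total{$gram}++;
        }
    }
    my %prob;
    for my $gram ( keys %count ) {
        for my $next ( keys %{ $count{$gram} } ) {
            $prob{$gram}{$next} = $count{$gram}{$next} / $total{$gram};
        }
    }
    return \%prob;
}

my $prob = follow_probabilities(@corpus);

# prints: P((Duke) | Virginia: Duke) = 0.667
printf "P((Duke) | Virginia: Duke) = %.3f\n",
  $prob->{'Virginia: Duke'}{'(Duke)'};
```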

      I didn't see any probability calculations in your script that I looked at. You asked for equiprobable.

      I did. That was meant to start things off, to get on the proverbial scoreboard. They still work for the naive use of appositives, which remains a component of this output. The equiprobable outcomes do sum to one, so we aren't too far afield of bliako's development with cumulative probability. I've included another way to generate the probabilities now. The working version of what I have now follows with abridged output and source:

      Thanks for your comments,

        Ok I sent you a pull request, here's some changes I would make in the syntax

        • default options in case user does not specify
        • hash slice
        • ternary operator
        • eliminate superfluous variables

        diff --git a/7.mm.pl b/7.mm.pl
        old mode 100644
        new mode 100755
        index 84efc24..7db8750
        --- a/7.mm.pl
        +++ b/7.mm.pl
        @@ -8,7 +8,7 @@ use Text::Template;
         use POSIX qw(strftime);
         binmode STDOUT, 'utf8';
        -my ($sub_dir) = $ARGV[0];
        +my ($sub_dir) = $ARGV[0] || 'out';
         say "sub_dir is $sub_dir";
         my $path1 = Path::Tiny->cwd;
         say "path1 is $path1";
        @@ -61,14 +61,11 @@ while ( $trials > 0 ) {
           my %vars = map { $_->[0], $_->[ rand( $#{$_} ) + 1 ] } @{$data};
           # further stochastic output from "playing" the games
        -  $vars{"winners"}=$string_sieger;
        -  $vars{"cardinality"}=$anzahl;
        -  $vars{"region"}=$r;
        +  @vars{qw/winners cardinality region/} = ($string_sieger,$anzahl,$r);
           my $rvars = \%vars; #important
        -  my @pfade = $path2->children(qr/\.txt$/);
        -  @pfade = sort @pfade;
        +  my @pfade = sort $path2->children(qr/\.txt$/);
           #say "paths are @pfade";
        @@ -80,8 +77,7 @@ while ( $trials > 0 ) {
               SOURCE => $file,
             ) or die "Couldn't construct template: $!";
        -    my $result = $template->fill_in( HASH => $rvars );
        -    $out_file->append_utf8($result);
        +    $out_file->append_utf8($template->fill_in( HASH => $rvars));
           }
           say "-------system out---------";
           system("cat $out_file");
        @@ -167,14 +163,7 @@ sub play_game {
               my $denominator = $1 + $3;
               my $ratio = $3 / $denominator;
               say "ratio was $ratio";
        -      my $random_number = rand();
        -      if ( $random_number < $ratio ) {
        -        push @winners, "$1.$2";
        -      }
        -      else {
        -        push @winners, "$3.$4";
        -      }
        -
        +      push @winners, rand() < $ratio ? "$1.$2" : "$3.$4"
             }
           }

        None of those really change anything, just make it "cleaner". What's better, a doctor that cures with more medicine or less?

        The data structure you use is a string '$rank.$team', that's not good! The rank should be a property of the team and division. And really if you think object oriented you have teams, divisions, games, lots of directions to go. You need a really flexible win_predictor() function or class, because it's likely to grow a lot. And then the templating is almost a totally separate thing, which is also going to change.
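
        To make that concrete, here is one possible shape, offered as a sketch rather than a finished design: each team is a hash with name and seed fields, and win_predictor() works on those fields directly. The field names, the win_probability helper, and the optional random-number argument are assumptions; the ratio itself is the one play_game() already uses.

```perl
use strict;
use warnings;

# Each team is a hash; the seed is a field instead of part of the name
my @east = (
    { name => 'duke', seed => 1 },
    { name => 'ndST', seed => 16 },
    { name => 'vcu',  seed => 8 },
    { name => 'ucf',  seed => 9 },
);

# Same ratio as play_game(): opponent's seed over the sum of both seeds,
# so the lower-numbered (better) seed gets the larger share
sub win_probability {
    my ( $team, $opponent ) = @_;
    return $opponent->{seed} / ( $team->{seed} + $opponent->{seed} );
}

# Return whichever team wins; $r may be supplied to make a run repeatable
sub win_predictor {
    my ( $team, $opponent, $r ) = @_;
    $r = rand() unless defined $r;
    return $r < win_probability( $team, $opponent ) ? $team : $opponent;
}

my $winner = win_predictor( @east[ 0, 1 ] );
print "$winner->{name} ($winner->{seed}) advances\n";
```

        Keeping the prediction in its own function means a different model (scores, point spreads) could later replace win_probability without touching the templating.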

        What this shows is that I have too much repetition of the introduction and the summary. I also don't have any mechanisms for going beyond round one. (Gladly taking suggestions on how I might do that.)

        In play_game(), you have to decouple the data to do the prediction, which is bad. But it's almost recursive already. If you don't change the data structure, you need to put the winners back in the form the sub expects and just call play_game(\@winners). I guess bye rounds will make that a little trickier.
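
        A sketch of that loop, keeping the 'seed.name' strings: pair adjacent entries, play the round, and feed the winners back in until one team is left. make_pairs and play_region are hypothetical names, play_game here is a trimmed copy that takes a flat list rather than a reference, and bye rounds and the "first four" games are not handled.

```perl
use strict;
use warnings;

# Pair adjacent entries into "A vs B" strings, the form play_game() parses
sub make_pairs {
    my @teams = @_;
    my @pairs;
    while ( @teams >= 2 ) {
        my ( $first, $next ) = splice @teams, 0, 2;
        push @pairs, "$first vs $next";
    }
    return @pairs;
}

# Trimmed version of play_game(): one winner per pairing, weighted by seed
sub play_game {
    my @pairs = @_;
    my @winners;
    for my $line (@pairs) {
        if ( $line =~ /^(\d+)\.(\w+) vs (\d+)\.(\w+)$/ ) {
            my $ratio = $3 / ( $1 + $3 );
            push @winners, rand() < $ratio ? "$1.$2" : "$3.$4";
        }
    }
    return @winners;
}

# Play round after round; returns one array ref of winners per round
sub play_region {
    my @teams = @_;
    my @rounds;
    while ( @teams > 1 ) {
        @teams = play_game( make_pairs(@teams) );
        push @rounds, [@teams];
    }
    return @rounds;
}

my @east = qw(1.duke 16.ndST 8.vcu 9.ucf 5.msST 12.lib 4.vaTech 13.stlouis
  6.maryland 11.belmont 3.lsu 14.yale 7.louisville 10.mn 2.miST 15.bradley);

my @rounds = play_region(@east);
my $n      = 1;
print "round ", $n++, ": @$_\n" for @rounds;
```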
