http://www.perlmonks.org?node_id=1231668


in reply to bracketology was Re^2: making a markovian "mad lib"
in thread making a markovian "mad lib"

So I'm happy with this as an intermediate result but started puzzling on what I was coding towards. I wanted to throw the ball to the dog and have that repeat through probabilistic methods. The repeated events were hard to extricate causally. If I tried to represent the scenario with object oriented methods, what would the "objects" be? The story, the things in the story, the state of the things in the story, the Animals as with _Intermediate Perl_? Wherein does it show Markovian nature?

I didn't see any probability calculations in your script that I looked at. You asked for equiprobable. Bliako uses a corpus to build up probability tables; you used a template with some variables. So, from my very limited understanding (now somewhat less limited after reading through n-dimensional statistical analysis of DNA sequences (or text, or ...)), you need some input to build up those probability tables. If I understand correctly: for some input, some text probably follows some other text. Then for some other (or the same) input, you replace the text based on those probabilities. In sports, some team beats some other team with some frequency.

So if you want to use Bliako's methods you need a corpus that relates, like previous scores. You are using rankings (seeds), but that will just give you what you already have: knowing that rank 1 probably beats rank 2 is not going to increase the accuracy. You need a corpus with scores or something similar. Then you might be able to bet the spread.
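To make "a corpus with scores" concrete, here is a minimal sketch that turns past results into a win frequency and an average point margin. The helper name `head_to_head`, the record layout, and the scores themselves are all invented for illustration; none of it comes from the script or from Bliako's code.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise past meetings between two teams.
# Each result is [ team_a, score_a, team_b, score_b ]. The order in which
# the two teams are listed does not matter.
sub head_to_head {
    my ( $team, $opponent, @results ) = @_;
    my ( $games, $wins, $margin ) = ( 0, 0, 0 );
    for my $game (@results) {
        my %score = ( $game->[0] => $game->[1], $game->[2] => $game->[3] );
        next unless exists $score{$team} && exists $score{$opponent};
        $games++;
        $wins++ if $score{$team} > $score{$opponent};
        $margin += $score{$team} - $score{$opponent};
    }
    return unless $games;    # no shared history, nothing to report
    return {
        games      => $games,
        win_rate   => $wins / $games,
        avg_margin => $margin / $games,
    };
}

# Made-up scores, purely illustrative.
my @past = (
    [ 'Team X', 70, 'Team Y', 60 ],
    [ 'Team Y', 55, 'Team X', 65 ],
    [ 'Team X', 58, 'Team Y', 62 ],
);

my $summary = head_to_head( 'Team X', 'Team Y', @past );
printf "Team X won %.0f%% of %d games, average margin %+.2f\n",
    100 * $summary->{win_rate}, $summary->{games}, $summary->{avg_margin};
```

The average margin is the part that seeds alone cannot give you, and it is what a spread bet would need.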

Bliako's solution aims to be more general, in that the n-gram size is configurable, as is what separates the n-grams. Your input has to match those criteria, or you have to hammer it into that shape. Team X vs Team Y will not be read the same as Team Y vs Team X. And it's not going to account for degrees, only the probability of a sequence. By degrees I mean things like whether team X beats Y by a whole lot, or what the average point difference is. So if you had a corpus like

Virginia: 40 Duke 60 (Duke)
Virginia: 50 Duke 70 (Duke)
Virginia: 80 Duke 70 (Virginia)

With n-gram=2 and the separator being 'a space and then some digits', it follows that (Duke) succeeds "Virginia: Duke" 2/3 of the time. Those are the kind of probabilities Bliako is using, I'm pretty sure.
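The counting behind that 2/3 can be sketched by hand. This is not Bliako's module, just a hand-rolled illustration; it assumes each game sits on its own line and that the separator is the regex `/\s+\d+\s*/`. The sub name `successor_probabilities` is made up.

```perl
use strict;
use warnings;

# Hypothetical helper, not bliako's code: for every n-gram, count which token
# follows it, then normalise the counts into probabilities.
# Records are split on "whitespace followed by digits".
sub successor_probabilities {
    my ( $n, @records ) = @_;
    my %count;
    for my $record (@records) {
        my @tokens = grep { length } split /\s+\d+\s*/, $record;
        for my $i ( 0 .. $#tokens - $n ) {
            my $gram = join ' ', @tokens[ $i .. $i + $n - 1 ];
            $count{$gram}{ $tokens[ $i + $n ] }++;
        }
    }
    my %prob;
    for my $gram ( keys %count ) {
        my $total = 0;
        $total += $_ for values %{ $count{$gram} };
        for my $next ( keys %{ $count{$gram} } ) {
            $prob{$gram}{$next} = $count{$gram}{$next} / $total;
        }
    }
    return \%prob;
}

my @corpus = (
    'Virginia: 40 Duke 60 (Duke)',
    'Virginia: 50 Duke 70 (Duke)',
    'Virginia: 80 Duke 70 (Virginia)',
);

my $prob = successor_probabilities( 2, @corpus );
for my $next ( sort keys %{ $prob->{'Virginia: Duke'} } ) {
    printf "%-12s follows 'Virginia: Duke' with p = %.3f\n",
        $next, $prob->{'Virginia: Duke'}{$next};
}
```

Because the scores are consumed by the separator, they play no part in the table: only the sequence of names survives, which is why degrees such as point difference are lost.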

Re^2: bracketology was Re^2: making a markovian "mad lib"
by Aldebaran (Curate) on Mar 28, 2019 at 22:59 UTC
    I didn't see any probability calculations in your script that I looked at. You asked for equiprobable.

    I did. That was meant to start things off, to get on the proverbial scoreboard. They still work for the naive use of appositives, which remains a component of this output. The equiprobable outcomes do sum to one, so we aren't too far afield of bliako's development with cumulative probability. I've included another way to generate the probabilities now. The working version of what I have now follows with abridged output and source:
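The link between equiprobable outcomes and cumulative probability can be sketched as follows. This is an assumption-laden illustration, not the code from either script: the sub name `pick_by_cumulative` and the outcome names are invented, and the optional `$roll` argument exists only so the choice can be made repeatable.

```perl
use strict;
use warnings;

# Hypothetical helper: choose an outcome by walking a running (cumulative)
# total of the probabilities until it exceeds the random roll.
# $roll defaults to rand(); pass it explicitly for a repeatable choice.
sub pick_by_cumulative {
    my ( $probs, $roll ) = @_;
    $roll = rand() unless defined $roll;
    my @outcomes   = sort keys %$probs;
    my $cumulative = 0;
    for my $outcome (@outcomes) {
        $cumulative += $probs->{$outcome};
        return $outcome if $roll < $cumulative;
    }
    return $outcomes[-1];    # guard against floating-point shortfall
}

# Equiprobable case: every outcome gets 1/N, so the table sums to one.
my @choices      = qw(alpha beta gamma delta);
my $p            = 1 / @choices;
my %equiprobable = map { $_ => $p } @choices;

print "picked: ", pick_by_cumulative( \%equiprobable ), "\n";
```

Swapping the `1/N` values for frequencies counted from a corpus leaves the selection code unchanged, which is the sense in which the equiprobable version is a special case.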

    Thanks for your comments,

      OK, I sent you a pull request. Here are some changes I would make to the syntax:

      • default options in case user does not specify
      • hash slice
      • ternary operator
      • eliminate superfluous variables
      diff --git a/7.mm.pl b/7.mm.pl
      old mode 100644
      new mode 100755
      index 84efc24..7db8750
      --- a/7.mm.pl
      +++ b/7.mm.pl
      @@ -8,7 +8,7 @@ use Text::Template;
       use POSIX qw(strftime);
       binmode STDOUT, 'utf8';
       
      -my ($sub_dir) = $ARGV[0];
      +my ($sub_dir) = $ARGV[0] || 'out';
       say "sub_dir is $sub_dir";
       my $path1 = Path::Tiny->cwd;
       say "path1 is $path1";
      @@ -61,14 +61,11 @@ while ( $trials > 0 ) {
         my %vars = map { $_->[0], $_->[ rand( $#{$_} ) + 1 ] } @{$data};
       
         # further stochastic output from "playing" the games
      -  $vars{"winners"}=$string_sieger;
      -  $vars{"cardinality"}=$anzahl;
      -  $vars{"region"}=$r;
      +  @vars{qw/winners cardinality region/} = ($string_sieger,$anzahl,$r);
       
         my $rvars = \%vars;    #important
       
      -  my @pfade = $path2->children(qr/\.txt$/);
      -  @pfade = sort @pfade;
      +  my @pfade = sort $path2->children(qr/\.txt$/);
       
         #say "paths are @pfade";
       
      @@ -80,8 +77,7 @@ while ( $trials > 0 ) {
             SOURCE => $file,
           ) or die "Couldn't construct template: $!";
       
      -    my $result = $template->fill_in( HASH => $rvars );
      -    $out_file->append_utf8($result);
      +    $out_file->append_utf8($template->fill_in( HASH => $rvars));
         }
         say "-------system out---------";
         system("cat $out_file");
      @@ -167,14 +163,7 @@ sub play_game {
           my $denominator = $1 + $3;
           my $ratio = $3 / $denominator;
           say "ratio was $ratio";
      -    my $random_number = rand();
      -    if ( $random_number < $ratio ) {
      -      push @winners, "$1.$2";
      -    }
      -    else {
      -      push @winners, "$3.$4";
      -    }
      -
      +    push @winners, rand() < $ratio ? "$1.$2" : "$3.$4"
         }
       }

      None of those really change anything, just make it "cleaner". What's better, a doctor that cures with more medicine or less?

      The data structure you use is a string '$rank.$team', that's not good! The rank should be a property of the team and division. And really if you think object oriented you have teams, divisions, games, lots of directions to go. You need a really flexible win_predictor() function or class, because it's likely to grow a lot. And then the templating is almost a totally separate thing, which is also going to change.
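As a rough sketch of that direction, a team can be a hash with the seed as one of its properties, and the prediction can live in its own sub. The field names, the team names, and the `win_probability` / `predict_winner` subs are hypothetical; the only thing taken from the script is the seed ratio used in play_game(), where the first team wins with probability seed_b / (seed_a + seed_b).

```perl
use strict;
use warnings;

# Hypothetical team records: the seed is a property of the team,
# not something glued onto its name.
my %team = (
    x => { name => 'Team X', seed => 1,  region => 'east' },
    y => { name => 'Team Y', seed => 16, region => 'east' },
);

# Probability that $team_a beats $team_b, using the same seed ratio as
# play_game(). This is the piece that is likely to grow, so it stands alone.
sub win_probability {
    my ( $team_a, $team_b ) = @_;
    return $team_b->{seed} / ( $team_a->{seed} + $team_b->{seed} );
}

# Returns the winning team record. $roll defaults to rand(); passing it
# explicitly makes the outcome repeatable.
sub predict_winner {
    my ( $team_a, $team_b, $roll ) = @_;
    $roll = rand() unless defined $roll;
    return $roll < win_probability( $team_a, $team_b ) ? $team_a : $team_b;
}

my $winner = predict_winner( $team{x}, $team{y} );
printf "%s (seed %d, %s) advances\n", @{$winner}{qw/name seed region/};
```

With the prediction isolated like this, replacing the seed ratio with something driven by past scores would not touch the templating code at all.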

      What this shows is that I have too much repetition of the introduction and the summary. I also don't have any mechanism for going beyond round one. (Gladly taking suggestions on how I might do that.)

      In play_game(), you have to take the data apart to do the prediction, which is bad. But it's almost recursive already. If you don't change the data structure, you need to put the winners back into the form the sub expects and just call play_game(\@winners). I guess bye rounds will make that a little trickier.
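One way this could look, keeping the '$rank.$team' strings, is sketched below. `play_round` is a hypothetical stand-in for play_game(), whose real input format is not shown here; it assumes an array ref of "seed.team" strings in which neighbours play each other, and it hands an unpaired team a bye. The `$roll` code ref is an added assumption so the randomness can be replaced when testing.

```perl
use strict;
use warnings;

# Hypothetical stand-in for play_game(): pairs neighbouring "seed.team"
# strings and returns the winners in the same form, so the result can be
# fed straight back in for the next round.
sub play_round {
    my ( $field, $roll ) = @_;
    $roll ||= sub { rand() };
    my @teams = @$field;
    my @winners;
    while ( @teams >= 2 ) {
        my ( $first, $second ) = splice @teams, 0, 2;
        my ($seed_a) = $first  =~ /^(\d+)\./;
        my ($seed_b) = $second =~ /^(\d+)\./;
        my $ratio = $seed_b / ( $seed_a + $seed_b );
        push @winners, $roll->() < $ratio ? $first : $second;
    }
    push @winners, @teams;    # an unpaired team gets a bye
    return \@winners;
}

# Keep playing rounds until a single team is left; returns every round's
# winners so a template can report on each of them.
sub play_tournament {
    my ( $field, $roll ) = @_;
    my @rounds;
    while ( @$field > 1 ) {
        $field = play_round( $field, $roll );
        push @rounds, $field;
    }
    return \@rounds;
}

my $rounds = play_tournament( [ '1.A', '16.B', '8.C', '9.D', '5.E' ] );
my $number = 0;
print "round ", ++$number, ": @$_\n" for @$rounds;
```

In this sketch the bye always goes to whoever is last in the list, which a real bracket would want to decide by seed instead.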