http://www.perlmonks.org?node_id=1231668


in reply to bracketology was Re^2: making a markovian "mad lib"
in thread making a markovian "mad lib"

So I'm happy with this as an intermediate result but started puzzling on what I was coding towards. I wanted to throw the ball to the dog and have that repeat through probabilistic methods. The repeated events were hard to extricate causally. If I tried to represent the scenario with object oriented methods, what would the "objects" be? The story, the things in the story, the state of the things in the story, the Animals as with _Intermediate Perl_? Wherein does it show Markovian nature?

I didn't see any probability calculations in the script of yours that I looked at; you asked for equiprobable outcomes. Bliako uses a corpus to build up probability tables, whereas you used a template with some variables. From my very limited understanding (now a little less limited after reading through n-dimensional statistical analysis of DNA sequences, or text, or ...), you need some input to build up those probability tables. If I understand correctly: given some input, some text probably follows some other text. Then, for some other (or the same) input, you replace the text based on those probabilities. In sports, some team beats some other team with some frequency.
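To make that concrete, here's a minimal sketch of what I mean by "build up a probability table" (my own illustration of a first-order table, not Bliako's actual code): count which word follows which, then divide by totals.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: build P(next word | current word) from a tiny corpus.
sub build_table {
    my ($text) = @_;
    my @words = split /\s+/, $text;
    my (%count, %total);
    for my $i (0 .. $#words - 1) {
        $count{ $words[$i] }{ $words[ $i + 1 ] }++;   # pair counts
        $total{ $words[$i] }++;                       # how often word leads a pair
    }
    my %prob;
    for my $a (keys %count) {
        $prob{$a}{$_} = $count{$a}{$_} / $total{$a} for keys %{ $count{$a} };
    }
    return \%prob;
}

my $table = build_table("the dog chased the ball the dog slept");
printf "P(dog|the) = %.2f\n", $table->{the}{dog};   # "dog" follows "the" 2 times out of 3
```

From a table like that you can walk the chain: start with a word, pick a successor according to its probabilities, repeat.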

So if you want to utilize Bliako's methods, you need a corpus that relates to your problem, like previous scores. You are using rankings (seeds), but that will just give you back what you already have: it won't increase the accuracy if rank 1 probably beats rank 2. You need a corpus with scores or something similar. Then you might be able to bet the spread.

Bliako's solution aims to be more general in that the n-gram size is configurable, as well as what separates the n-grams. You have to have an input that matches those criteria, or hammer it into that shape. "Team X vs Team Y" will not be read the same as "Team Y vs Team X". And it's not going to account for degrees, only the probability of a sequence. Degrees like team X beating team Y by a whole lot, or the average point difference, things like that. So if you had a corpus like

Virginia: 40 Duke 60 (Duke)
Virginia: 50 Duke 70 (Duke)
Virginia: 80 Duke 70 (Virginia)

With n-gram=2 and the separator being 'space and then some digits', it follows that (Duke) succeeds "Virginia: ... Duke" 2/3 of the time. Those are the kinds of probabilities Bliako is using, pretty sure.
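Here's a rough sketch of that counting, as I read it (my own illustration, not Bliako's actual script): tally how often each team shows up as the parenthesized winner, and the win frequencies fall out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three-line corpus from above.
my @corpus = (
    'Virginia: 40 Duke 60 (Duke)',
    'Virginia: 50 Duke 70 (Duke)',
    'Virginia: 80 Duke 70 (Virginia)',
);

my (%wins, $games);
for my $line (@corpus) {
    # Grab the winner from the trailing parenthesized field.
    if ( $line =~ /\(([^)]+)\)\s*$/ ) {
        $wins{$1}++;
        $games++;
    }
}
printf "P(%s wins) = %d/%d\n", $_, $wins{$_}, $games for sort keys %wins;
# P(Duke wins) = 2/3
# P(Virginia wins) = 1/3
```

With more lines in the corpus you'd get less lopsided probabilities, and with scores kept around you could start estimating point spreads too.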