http://www.perlmonks.org?node_id=125130

thealienz1 has asked for the wisdom of the Perl Monks concerning the following question:

"Alrighty people... I must say I am stumped. Yet again." -Eeyore type voice

I am trying to write a program that implements the Markov algorithm and found the examples on perlmonks very useful, but I have a bit of a dilemma. I do not want my sentence to be completely random. For example, I want to be able to say that the sentences should be generated with certain key words in them.

For instance, the sentence must contain the words "flying" and "toaster". And they must appear in the sentence at some random fixed point.

rndwrd1 rndwrd2 toaster rndwrd3 flying rndwrd4

Where rndwrd# is a random word that the Markov algorithm finds in relation to "flying" or "toaster". Now keep in mind that "flying toaster" can appear together as one group of words. So, you can see that I have no f*cking clue what to do. I do not understand the Markov algorithm well enough to go backwards and forwards through the list.

Can anyone help me out?

Justin Archie

Replies are listed 'Best First'.
Re: help with a new type of Markov
by dws (Chancellor) on Nov 14, 2001 at 02:20 UTC
    I am trying to write a program that implements the Markov algorithm ... (two words) must appear in the sentence at some random fixed point.

    Depending on what you mean by "random fixed point", you may have moved out-of-bounds with respect to the Markov algorithm.

    For the benefit of those following along, the traditional Markov random-text generator samples text to (conceptually) build a big sparse matrix. Each cell in the matrix represents the probability of transitioning from one state to the next, where "state" is a tuple of the N adjacent tokens. For example, by sampling

    the quick red fox jumped over the lazy perl hacker
    with N=1, the matrix will show that the probability of transitioning from (the) to (quick) is 50%. By sampling some larger text with N=2, the matrix might show that the probability of transitioning from (Perl is) to (is neat) is 90%, and to (is hard) is 10%.

    The transition matrix is easier to implement using a tree, but conceptually it's just a sparse matrix.
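    In Perl that tree is naturally a hash of hashes. A minimal sketch for N=1, built from the sample sentence above (the `%next` name is my own):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample the text: split into tokens and count adjacent pairs.
my @words = split ' ', 'the quick red fox jumped over the lazy perl hacker';

my %next;    # $next{$from}{$to} = number of times $to followed $from
for my $i ( 0 .. $#words - 1 ) {
    $next{ $words[$i] }{ $words[ $i + 1 ] }++;
}

# 'the' is followed once by 'quick' and once by 'lazy': 50% each.
print "the -> $_ ($next{the}{$_})\n" for sort keys %{ $next{the} };
```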

    To generate random text, we just roll the dice and choose a legal transition.
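    One way to roll those dice in Perl, sketched against the same tiny sample (not any real corpus): expand each successor into a pool by its count, then index the pool at random, so frequent transitions win more often.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the N=1 transition table, as before.
my @words = split ' ', 'the quick red fox jumped over the lazy perl hacker';
my %next;
$next{ $words[$_] }{ $words[ $_ + 1 ] }++ for 0 .. $#words - 1;

# Roll the dice: pick a successor with probability proportional to its count.
sub pick_next {
    my ($word) = @_;
    my $succ = $next{$word} or return;    # dead end: no legal transition
    my @pool = map { ($_) x $succ->{$_} } keys %$succ;
    return $pool[ rand @pool ];
}

# Walk forward from 'the', with a length cap in case we cycle.
my $word = 'the';
my @out  = ($word);
while ( defined( $word = pick_next($word) ) && @out < 20 ) {
    push @out, $word;
}
print "@out\n";
```

    With this sample, the only dice roll happens at "the" (quick or lazy, 50/50); everything else has exactly one successor, and the walk ends when it reaches "hacker" or hits the cap.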

    Your problem complicates this considerably. By saying that you want two words A and B to appear in the sentence at random points, you require an algorithm that finds a legal set of transitions that first gets you to A (or B), and then eventually gets you to B (or A), and then eventually terminates with the end of a sentence. You could do this with a tree search, but there's no guarantee that it will terminate unless a valid sequence of paths exists in your source text. And depending on how you record sentence start and sentence end, there might not be.

    You might try this. Start with A and run the algorithm backwards until you find a valid sentence start. To do this (assuming you're using a tree), you'll need to build a second tree to do backwards transitions. Then run the algorithm forward, building a tree breadth-first (so that you can terminate at some arbitrary depth), looking for B. If you find it, run the algorithm forward until you find the end of the sentence (again using a breadth-first search).
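    The backwards tree has the same shape as the forward one; you just record each edge the other way round while sampling. A sketch on the same toy sentence (helper names mine):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = split ' ', 'the quick red fox jumped over the lazy perl hacker';

my ( %fwd, %bwd );
for my $i ( 0 .. $#words - 1 ) {
    $fwd{ $words[$i] }{ $words[ $i + 1 ] }++;    # forward transitions
    $bwd{ $words[ $i + 1 ] }{ $words[$i] }++;    # same edges, reversed
}

# Walk backwards from a keyword toward a sentence start.
sub pick_prev {
    my ($word) = @_;
    my $pred = $bwd{$word} or return;    # no predecessor recorded
    my @pool = map { ($_) x $pred->{$_} } keys %$pred;
    return $pool[ rand @pool ];
}

my $word  = 'lazy';
my @front = ($word);
while ( defined( $word = pick_prev($word) ) && @front < 20 ) {
    unshift @front, $word;
}
print "@front\n";
```

    Note that even this tiny corpus demonstrates the termination worry above: walking back from "lazy" enters the cycle the <- over <- jumped <- fox <- red <- quick <- the, and only the length cap stops it. A real implementation would record an explicit start-of-sentence marker and stop there.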

    And I recommend that you don't try to hold your breath while the algorithm runs. :)