I need some help with the Markov chain algorithm for the following question: use a Markov chain algorithm to write a program that analyzes your current publication's texts and generates random text that uses phrases in a manner similar to the input text.

You ask how this works and she explains:

Find some body of text (in our case, text files) that you want to imitate. For every pair of consecutive words that occurs in the text, keep track of each word that can follow that pair. So, for every pair of words, you would know a) which words followed that pair AND b) with what probability each of those words follows the pair. (See examples below.)
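One way to hold this information is a hash of hashes keyed by the word pair, with a count for each following word. The sketch below is only an illustration: the name build_table and the choice to join the pair with a single space are assumptions, not part of the assignment text.

```perl
use strict;
use warnings;

# Build a hash of hashes: $table->{"w1 w2"}{$next} = number of times
# $next followed the pair "w1 w2". build_table is a hypothetical name.
sub build_table {
    my ($text) = @_;
    my @words = split ' ', $text;   # anything that is not whitespace is a word
    my %table;
    for my $i (0 .. $#words - 2) {
        my $pair = "$words[$i] $words[$i + 1]";
        $table{$pair}{ $words[$i + 2] }++;
    }
    return \%table;
}

my $table = build_table("He didn't come in a plane. He didn't come in a Jeep.");
for my $pair (sort keys %$table) {
    for my $next (sort keys %{ $table->{$pair} }) {
        print "$pair -> $next: $table->{$pair}{$next}\n";
    }
}
```

The probability of a following word is its count divided by the sum of the counts stored under the same pair, so only the counts need to be kept.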

Using the information gathered in the previous step, start with a pair of two consecutive words ($word_one and $word_two) that occur in the text and print those two words. Then randomly choose the next word ($next_word) according to the probability that it would follow those two words, and print that word. Now use the second word ($word_two) and the new word ($next_word) as your two consecutive words, and repeat this process until you have generated the amount of text you want or hit a word pair that has no next word.
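The generation step might be sketched like this, assuming the pair table is a hash of hashes of counts keyed by "word_one word_two". The names pick_next and babble_words are hypothetical, and the table is written out by hand from the example further down so that the block runs on its own.

```perl
use strict;
use warnings;

# Pick a following word with probability proportional to its count.
# $followers is a hashref of { word => count }.
sub pick_next {
    my ($followers) = @_;
    my $total = 0;
    $total += $_ for values %$followers;
    my $roll = rand($total);                 # 0 <= $roll < $total
    for my $word (sort keys %$followers) {
        $roll -= $followers->{$word};
        return $word if $roll < 0;
    }
    return;                                  # only reached if $followers is empty
}

# Return the starting pair plus generated words, up to $max_words in total,
# stopping early at a pair that has no next word.
sub babble_words {
    my ($table, $word_one, $word_two, $max_words) = @_;
    my @out = ($word_one, $word_two);
    while (@out < $max_words) {
        my $followers = $table->{"$word_one $word_two"} or last;   # dead end
        my $next_word = pick_next($followers);
        defined $next_word or last;
        push @out, $next_word;
        ($word_one, $word_two) = ($word_two, $next_word);
    }
    return @out;
}

my %table = (
    "He didn't"    => { 'come'    => 3 },
    "Jeep. He"     => { "didn't"  => 1 },
    "Of a"         => { 'high'    => 1 },
    "a Jeep."      => { 'He'      => 1 },
    "a high"       => { 'jumping' => 1 },
    "a plane."     => { 'He'      => 1 },
    "a pouch"      => { 'Of'      => 1 },
    "come in"      => { 'a'       => 3 },
    "didn't come"  => { 'in'      => 3 },
    "high jumping" => { 'Voveep.' => 1 },
    "in a"         => { 'Jeep.' => 1, 'pouch' => 1, 'plane.' => 1 },
    "plane. He"    => { "didn't"  => 1 },
    "pouch Of"     => { 'a'       => 1 },
);

print join(' ', babble_words(\%table, 'He', "didn't", 20)), "\n";
```

The output varies from run to run because the word after "in a" is chosen at random. A run ends early whenever it reaches "high jumping Voveep.", since the pair "jumping Voveep." has no next word.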

Let us look at an example from The New Testament According to Dr. Seuss:

He didn't come in a plane.
He didn't come in a Jeep.
He didn't come in a pouch
Of a high jumping Voveep.
If we analyze the text, we see the following word pairs, each listed with the words that follow it:
He didn't
    come [3, 100.0%]
Jeep. He
    didn't [1, 100.0%]
Of a
    high [1, 100.0%]
a Jeep.
    He [1, 100.0%]
a high
    jumping [1, 100.0%]
a plane.
    He [1, 100.0%]
a pouch
    Of [1, 100.0%]
come in
    a [3, 100.0%]
didn't come
    in [3, 100.0%]
high jumping
    Voveep. [1, 100.0%]
in a
    Jeep. [1, 33.3%]
    pouch [1, 33.3%]
    plane. [1, 33.3%]
plane. He
    didn't [1, 100.0%]
pouch Of
    a [1, 100.0%]
We can see that the word pair "He didn't" occurred three times, each time followed by the word "come" (at 100% probability). The word pair "in a" also occurred three times, followed by "Jeep.", "pouch", or "plane." (each with a 33.3% probability). Note that punctuation stays attached to the words.

Your task is to write a program called babble that will read text from <> and apply the Markov chain algorithm to generate random text that reads like the input text.

Your program will also take three options (you are advised to use Getopt::Long qw(GetOptions) but you may use other methods if you insist):

--words (the total number of words to generate)
--paragraphs (the number of words per paragraph)
--show_pairs (show the word pairs and frequencies as in the example above, which is sorted alphabetically by word pairs, then by decreasing frequency for the next words). If --show_pairs is given as an option, your program should not do any babbling, just output the table and exit.
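A minimal sketch of the option handling with Getopt::Long follows. The default values, the usage message, and the name parse_options are assumptions, since the assignment does not specify them. The sketch uses GetOptionsFromArray, which is exported by the same module and behaves like GetOptions on an explicit array, only so that the parsing can be exercised without touching the real command line.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse the three options out of the given array ref (modified in place).
# Whatever remains in the array afterwards is treated as file names.
sub parse_options {
    my ($argv) = @_;
    my %opt = (
        words      => 100,   # assumed default
        paragraphs => 25,    # assumed default
        show_pairs => 0,
    );
    GetOptionsFromArray(
        $argv,
        'words=i'      => \$opt{words},
        'paragraphs=i' => \$opt{paragraphs},
        'show_pairs'   => \$opt{show_pairs},
    ) or die "Usage: $0 [--words N] [--paragraphs N] [--show_pairs] [file ...]\n";
    return \%opt;
}

my $opt = parse_options(\@ARGV);   # leaves only file names in @ARGV for <>
print "words=$opt->{words} paragraphs=$opt->{paragraphs} show_pairs=$opt->{show_pairs}\n";
```

Calling GetOptions with the same option specification has the same effect on @ARGV directly, so a later while (<>) loop reads only the remaining file names.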

You are advised to implement --show_pairs first. This will require designing a data structure to store the "word pair" to "next word" mappings (when you hear "map", you might think "hash" or "hashref") and then writing a subroutine to load/build this data structure from the input text. Don't worry about capitalization and punctuation -- you can treat anything that's not whitespace as word characters (i.e., @words = split() is a perfectly acceptable construct to use to get your words). Once you have --show_pairs working, you should be able to do something like this:

(Thu Oct 18 21:54:27): ~/e13cvs/users/solutions/hw3/
austin@elmo 21 $ ./babble --show_pairs data/example.txt
He didn't
    come [3, 100.0%]
Jeep. He
    didn't [1, 100.0%]
Of a
    high [1, 100.0%]
a Jeep.
    He [1, 100.0%]
a high
    jumping [1, 100.0%]
a plane.
    He [1, 100.0%]
a pouch
    Of [1, 100.0%]
come in
    a [3, 100.0%]
didn't come
    in [3, 100.0%]
high jumping
    Voveep. [1, 100.0%]
in a
    Jeep. [1, 33.3%]
    pouch [1, 33.3%]
    plane. [1, 33.3%]
plane. He
    didn't [1, 100.0%]
pouch Of
    a [1, 100.0%]
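The table output could be produced along these lines, assuming the pairs are stored as a hash of hashes of counts. The name format_pairs, the indentation, and the alphabetical tie-break among equally frequent next words are assumptions. The sample output lists ties in a different order, so do not rely on tie order matching it.

```perl
use strict;
use warnings;

# Return the table as a list of lines: pairs in string sort order, then
# next words by decreasing count (ties broken alphabetically, an assumption).
sub format_pairs {
    my ($table) = @_;
    my @lines;
    for my $pair (sort keys %$table) {
        my $followers = $table->{$pair};
        my $total = 0;
        $total += $_ for values %$followers;
        push @lines, $pair;
        my @ordered = sort {
            $followers->{$b} <=> $followers->{$a} or $a cmp $b
        } keys %$followers;
        for my $next (@ordered) {
            my $count = $followers->{$next};
            push @lines, sprintf("    %s [%d, %.1f%%]", $next, $count, 100 * $count / $total);
        }
    }
    return @lines;
}

my %sample = (
    "He didn't" => { 'come' => 3 },
    "in a"      => { 'Jeep.' => 1, 'pouch' => 1, 'plane.' => 1 },
);
print "$_\n" for format_pairs(\%sample);
```

The default string sort puts capitalized pairs such as "He didn't" and "Jeep. He" ahead of lowercase ones, which matches the order of the pairs in the sample output.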
Edit by dws to rescue formatting

In reply to Re: Re: Re: Markov Chain Program by Anonymous Monk
in thread Markov Chain Program by sacked
