Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re: Markov Chain Program

by dws (Chancellor)
on Feb 02, 2001 at 04:10 UTC ( [id://55914]=note: print w/replies, xml ) Need Help??


in reply to Markov Chain Program

To improve of this, you may have to rethink your data structures. I've had good results by weighting the random choice of the next token based on its frequency of occurence after the prior n tokens (where n==2 in your code).

One approach would be to extend your data structure along the lines of

%wordslist = ( 'blessed is' => ['the', 47, 'a', 23, 'vroom', 1], ... );
and then to use the frequency counts to weight the random choice.

For an object-oriented approach that builds n-deep trees, take a look at babelize.

Replies are listed 'Best First'.
(sacked: frequency counts) Re: Re: Markov Chain Program
by sacked (Hermit) on Feb 02, 2001 at 22:57 UTC
    Thanks, I like the idea of using frequency counts very much. I turned to The Perl Cookbook for some help, and the results are below. Note that the default value for $maxwords is much lower for this version to prevent the program from taking a very long time to complete, as the subroutine weighted_suffix() often needs to loop many times when there is a low number of suffixes for a given prefix.
    #!/usr/bin/perl -w # # based on kernighan and pike's markov chain generator # in _the practice of programming_, chapter 3 # (http://cm.bell-labs.com/cm/cs/tpop/markov.pl) # use strict; my @words; # words on a line my %wordlist; # key: prefix, value: anon hash (k: suffix, # v: frequency) my $pref_len = shift @ARGV || 2; my $maxwords = shift @ARGV || 100; my $entries = 0; # build word list # # 'Blessed is the man that walketh not in the counsel' # %wordlist = ( 'blessed is' => { 'the' => 1, }, # 'is the' => { 'man' => 1, }, # 'the man' => { 'that'=> 1, }, # ); # while (<>) { my $suf; push @words, split; while ( @words > $pref_len ) { # build prefix of $pref_len words # join(' ', @array) is faster than qq(@array) or "@array" # my $pref = join(' ', @words[0..($pref_len-1)]); # add suffix to list # $suf = $words[$pref_len]; $wordlist{$pref}{$suf}++; shift @words; # next word on this line $entries++; } } # change frequency count to a percentage # (with help from pcb, recipe 2.10) # foreach my $href ( values %wordlist ) { foreach ( values %$href ) { $_ /= $entries; } } # starting point # my $pref = (keys %wordlist)[rand keys %wordlist]; print "$pref"; # dump out listings # for (0..($maxwords-1)) { last unless (exists $wordlist{$pref}); my $suf = weighted_suffix(); print ' '. $suf; print "\n" if ( $_ % 10 == 0); # skip past first word in prefix # $pref =~ s/^[^ ]+ (.+)$/$1 $suf/; } exit; # from pcb (recipe 2.10) # sub weighted_suffix { my ($suf,$weight,$rand); while (1) { $rand = rand; while ( ($suf,$weight) = each %{ $wordlist{$pref} } ) { return $suf if ($rand -= $weight) < 0; } } }

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://55914]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others scrutinizing the Monastery: (5)
As of 2024-04-16 05:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found