Category: | Text Processing |
Author/Contact Info | sacked |
Description: | Taking the suggestion from Kernighan and Pike's The Practice of Programming, I wrote another version of their Markov Chain program (chapter 3) that allows for different length prefixes. It works best with shorter prefixes, as they are more likely to occur in the text than longer ones. Please offer any tips for improving/enhancing this script. Thanks! |
#!/usr/bin/perl -w # # based on kernighan and pike's markov chain program # in _the practice of programming_, chapter 3 # (http://cm.bell-labs.com/cm/cs/tpop/markov.pl) # # Usage: markov.pl [prefix length] [max words to print] <infile use strict; my @words; # a line of text my %wordlist; # key: prefix, value: array of suffixes my $pref_len = shift @ARGV || 2; my $maxwords = shift @ARGV || 10000; # build word list # # 'Blessed is the man that walketh not in the counsel' # %wordlist = ( 'blessed is' => [ 'the', ], # 'is the' => [ 'man', ], # 'the man' => [ 'that', ], # ); while (<>) { push @words, split; while ( @words > $pref_len ) { # build prefix of $pref_len words # join(' ', @array) is faster than qq(@array) or "@array" # my $pref = join(' ', @words[0..($pref_len-1)]); # add suffix to list # push @{ $wordlist{$pref} }, $words[$pref_len]; shift @words; # next word on this line } } # starting point # my $pref = (keys %wordlist)[rand keys %wordlist]; print "$pref"; # dump out listings # for (0..($maxwords-1)) { last if not defined($wordlist{$pref}); my $index = rand @{ $wordlist{$pref} }; my $suf = $wordlist{$pref}[$index]; print ' '. $suf; print "\n" if ( $_ % 10 == 0); # skip past first word in prefix # #$pref = (split(' ', $pref))[1..$pref_len-1] . ' ' . $suf; $pref =~ s/^[^ ]+ (.+)$/$1 $suf/; } __END__ two good samples generated from the book of psalms: "For, lo, thine enemies, and the horn of David the son of thine house hath eaten me up" "His mouth is full of troubles" |
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: Markov Chain Program
by dws (Chancellor) on Feb 02, 2001 at 04:10 UTC | |
by sacked (Hermit) on Feb 02, 2001 at 22:57 UTC | |
Re: Markov Chain Program
by extremely (Priest) on Feb 02, 2001 at 04:13 UTC | |
by Anonymous Monk on Aug 29, 2001 at 19:34 UTC | |
by Anonymous Monk on Nov 14, 2002 at 05:28 UTC |
Back to
Code Catacombs