This small script, apart from being mostly useless, generates credible sentences from a source text.
The idea is to keep track of all the possible words
following a given group of words. Once this hash of arrays
has been constructed, we pick a random starting point and
construct the new text by looking for words that may
follow the preceding group.
The input text should be large enough to have a broad
vocabulary and sentences range. You can find good samples on
the Project Gutenberg site.
You can also try to create interesting mixes of texts, by feeding two novels in the input file!
Have fun!
#!/usr/bin/perl -w
#
# transcramble - random text generator
# author: Briac Pilpré
#
# v.0.1 : initial release.
# v.0.2 : Added Storable support and Getopt::Mixed
# for a nicer interface.
#
# idea 'borrowed' from : http://www.eblong.com/zarf/markov/
use strict;
use locale;
use Storable;
use Getopt::Mixed;
use vars qw($VERSION
$opt_g $opt_w $opt_s
$opt_i $opt_h $opt_v
);
$VERSION = 0.2;
Getopt::Mixed::getOptions(
'g=i group>g w=i words>w s=s store>s i=s input h help>h v verbose>
+v'
);
$|++;
# usage: transcramble group words stored_file > output
if ($opt_h){
print <<"_HELP_";
transcrambler v$VERSION
usage: transcramble -g 3 -w 500 -i stored_chains > output
usage: transcramble -g 2 -w 400 -s chains_to_store < input > outpu
+t
-g --group : How to group words (1, 2 or 3)
-w --words : Number of words to generate
-i --input : Input file generated by the -s switch
-s --store : Name of the file where the chains
generated are to be stored
-h --help : Display this help message and exit
_HELP_
exit 0;
}
# How to group words in the text
# 1: Group words one by one, makes nonsenses
# 2: List every word following each preceding couple of words
# 3: List every word following each preceding triplet of words
# (make the text really like the original)
my $group = $opt_g || 2;
# Number of words to spew
my $words = $opt_w || 1000;
# Stored chains file
my $in_file = $opt_i;
# File to store the generated chains (can't be used with -i though)
my $out_file = $opt_i ? undef : $opt_s;
# Frequency list creation
my ( %tuple, @remains );
if ($in_file){
print STDERR "Retrievieng the chains from $in_file:" if $opt_v;
%tuple = %{ retrieve($in_file) };
print STDERR " OK\n" if $opt_v;
}
else {
print STDERR "Creating the chains. This may take a while\n" if $op
+t_v;
while (<>) {
# Split the current line for every 'word', including punctuati
+on
my @w = split ( /[^\w\n,.!"?;’'-]+/, $_ );
# If there was some words left of the previous line, add them
@w = ( @remains, @w );
while (@w) {
# If we have enough words to fill the group
if ( scalar @w >= $group + 1 ) {
# Shove the next word in the word array
push @{ $tuple{ join ( ' ', @w[ 0 .. $group - 1 ] ) }
+}, $w[$group];
# And move to the next word
shift (@w);
}
else {
# At the end of the line, we keep the remaining words.
@remains = @w;
undef @w;
}
}
}
print STDERR "Saving chains to $out_file:" if $opt_v;
store \%tuple, $out_file;
print STDERR " OK\n" if $opt_v;
}
# Used to pick a random key in the hash
my $lh = scalar keys %tuple;
print STDERR "Done constucting the tuples. $lh tuples\n";
# Select a random key to begin
my $key = ( keys %tuple )[ int rand($lh) ];
while ( $words-- ) {
my @first = split (/ /, $key );
shift @first;
# Pick a word at random in the possible following words
my $last = @{ $tuple{$key} }[ int rand( @{ $tuple{$key} } ) ];
print " $last";
# And see if we can continue with the sentence or
if ( defined $tuple{ join ( ' ', @first, $last ) } ) {
$key = join ( ' ', @first, $last );
}
# Start a new one by picking a new starting group.
else {
$key = ( keys %tuple )[ int rand($lh) ];
}
}
print STDERR "\nTranscrambling done.\n\n" if $opt_v;
<kbd>--
my $ OeufMayo = new PerlMonger::Paris({http => ' paris.mongueurs.net'});</kbd>
Re: Transcramble - Random text generator
by larsen (Parson) on Jul 09, 2001 at 15:30 UTC
|
Very nice :)
And very interesting, making some considerations.
Grouping words one by one makes nonsense, but we have a text
with the same stathistical properties of the original text.
Grouping word two by two of three by three is the same, but
we don't lost meaning (well, better, we don't lost too much
meaning).
Using letters instead of words as units, I think (this is only
a conjecture) we could "mimic" the sound of the original language,
realizing a new kind of your Acme::Translator :). A kind
that could be difficult to obtain even making Acme::Translator
having Sex with Transcramble - Random text generator (brrr....).
Even more interesting experiments could be done using music
instead of texts. Using something from the MIDI suite
we could import a melody and use it to generate other similar
(in the sense described by your program) melodies.
Well, just a thought, but could drive to interesting projects
(I've always been interested in generative arts). | [reply] |
(tye)Re: Transcramble - Random text generator
by tye (Sage) on Jul 10, 2001 at 09:15 UTC
|
| [reply] |
|
#!/usr/bin/perl
# Usage travesty [files]
# First analyse the input.
while(<>) {
# handle article headers and ">>>" quotes.
next if /^\./;
next if /^From / .. /^$/;
next if /^Path: / .. /^$/;
s/^\W+//;
# Do each word.
push(@ary,split(' '));
while ($#ary > 1) {
$a = $p;
$p = $n;
$w = shift(@ary);
$n = $num{$w};
if ($n eq '') {
push(@word,$w);
$n = pack('S',$#word);
$num{$w} = $n;
}
$lookup{$a . $p} .= $n;
}
}
# Now spew out the words, based on the frequencies. If there
# is more than one possibility to choose from choose one
# randomly.
for (;;) {
$n = $lookup{$a . $b};
($foo,$n) = each(lookup) if $n eq ''; # A bootstrap.
$n = substr($n,int(rand(length($n))) & 0177776,2);
$a = $p;
$p = $n;
($w) = unpack('S',$n);
$w = $word[$w];
# See if word fits on line;
$col += length($w) +1;
if ($col >= 65) {
$col = 0;
print "\n";
}
else {
print ' ';
}
print $w;
# Paragraph every 10 sentences or so.
if ($w =~ /\.$/) {
if (rand() < .1) {
print "\n";
$col = 80;
}
}
}
p.s. These little gems are the reason I keep that old pink book around. | [reply] [d/l] |
|
I tried this using a large text file from Writing of the Early Church Father", at Christian Classics Ethereal Library. The results were pretty amazing! Nice that you kept the "old book" around.
| [reply] |
Re: Transcramble - Random text generator
by thraxil (Prior) on Jul 10, 2001 at 00:09 UTC
|
cool.
i did a very similar thing with markov chains on my weblog. whenever someone posts an entry, it updates the frequency count table (it all runs off a postgres db). then, from that, it generates a random post at every time the page is loaded. the idea was that it would be sort of a "hive-mind" of all the people who post there.
when i was writing it i had an idea that i still haven't had time to follow through on, but that your code is probably better suited for. i wanted to create a bot that would go through the discussions on slashdot, and for each thread, pick the highest rated posts, generate a frequency table from that and then post a random message to that discussion. then i wanted to watch how those posts get moderated. i'm pretty sure the markov script could get some pretty good karma.
anders pearson
| [reply] |
Re: Transcramble - Random text generator
by John M. Dlugosz (Monsignor) on Jul 10, 2001 at 02:48 UTC
|
Many years ago, I saw a program called Bable (or something spelled similarly to that) which did this based on sample files. I think it used a Markov chain of two previous words, and punctuation was part of the word.
The best use of it was when some crazy posts started appearing in the discussion group I hung out in. We couldn't figure out at first if this person was totally psycotic or just wrote so badly that nobody could figure out what she was trying to say! After a few attempts to communicate with her didn't work out, I used that program to mimic her writing style and replied to one. It drew a lengthy reply! We seem to have carried on a conversation, though I have no idea what it meant. It was fascenating in that the program included her pet tag-phrases and other affectations, as well as the overall flavor.
—John | [reply] |
Re: Transcramble - Random text generator
by beretboy (Chaplain) on Jul 12, 2001 at 16:28 UTC
|
You may want to checkout Games::Dissociate
"Sanity is the playground of the unimaginative"
-Unknown | [reply] [d/l] |
|
|