Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

This small script, apart from being mostly useless, generates credible sentences from a source text.

The idea is to keep track of all the possible words following a given group of words. Once this hash of arrays has been constructed, we pick a random starting point and construct the new text by looking for words that may follow the preceding group.

The input text should be large enough to have a broad vocabulary and sentences range. You can find good samples on the Project Gutenberg site.
You can also try to create interesting mixes of texts, by feeding two novels in the input file!

Have fun!

#!/usr/bin/perl -w # # transcramble - random text generator # author: Briac Pilpré # # v.0.1 : initial release. # v.0.2 : Added Storable support and Getopt::Mixed # for a nicer interface. # # idea 'borrowed' from : http://www.eblong.com/zarf/markov/ use strict; use locale; use Storable; use Getopt::Mixed; use vars qw($VERSION $opt_g $opt_w $opt_s $opt_i $opt_h $opt_v ); $VERSION = 0.2; Getopt::Mixed::getOptions( 'g=i group>g w=i words>w s=s store>s i=s input h help>h v verbose> +v' ); $|++; # usage: transcramble group words stored_file > output if ($opt_h){ print <<"_HELP_"; transcrambler v$VERSION usage: transcramble -g 3 -w 500 -i stored_chains > output usage: transcramble -g 2 -w 400 -s chains_to_store < input > outpu +t -g --group : How to group words (1, 2 or 3) -w --words : Number of words to generate -i --input : Input file generated by the -s switch -s --store : Name of the file where the chains generated are to be stored -h --help : Display this help message and exit _HELP_ exit 0; } # How to group words in the text # 1: Group words one by one, makes nonsenses # 2: List every word following each preceding couple of words # 3: List every word following each preceding triplet of words # (make the text really like the original) my $group = $opt_g || 2; # Number of words to spew my $words = $opt_w || 1000; # Stored chains file my $in_file = $opt_i; # File to store the generated chains (can't be used with -i though) my $out_file = $opt_i ? undef : $opt_s; # Frequency list creation my ( %tuple, @remains ); if ($in_file){ print STDERR "Retrievieng the chains from $in_file:" if $opt_v; %tuple = %{ retrieve($in_file) }; print STDERR " OK\n" if $opt_v; } else { print STDERR "Creating the chains. This may take a while\n" if $op +t_v; while (<>) { # Split the current line for every 'word', including punctuati +on my @w = split ( /[^\w\n,.!"?;’'-]+/, $_ ); # If there was some words left of the previous line, add them @w = ( @remains, @w ); while (@w) { # If we have enough words to fill the group if ( scalar @w >= $group + 1 ) { # Shove the next word in the word array push @{ $tuple{ join ( ' ', @w[ 0 .. $group - 1 ] ) } +}, $w[$group]; # And move to the next word shift (@w); } else { # At the end of the line, we keep the remaining words. @remains = @w; undef @w; } } } print STDERR "Saving chains to $out_file:" if $opt_v; store \%tuple, $out_file; print STDERR " OK\n" if $opt_v; } # Used to pick a random key in the hash my $lh = scalar keys %tuple; print STDERR "Done constucting the tuples. $lh tuples\n"; # Select a random key to begin my $key = ( keys %tuple )[ int rand($lh) ]; while ( $words-- ) { my @first = split (/ /, $key ); shift @first; # Pick a word at random in the possible following words my $last = @{ $tuple{$key} }[ int rand( @{ $tuple{$key} } ) ]; print " $last"; # And see if we can continue with the sentence or if ( defined $tuple{ join ( ' ', @first, $last ) } ) { $key = join ( ' ', @first, $last ); } # Start a new one by picking a new starting group. else { $key = ( keys %tuple )[ int rand($lh) ]; } } print STDERR "\nTranscrambling done.\n\n" if $opt_v;
<kbd>--
my $OeufMayo = new PerlMonger::Paris({http => 'paris.mongueurs.net'});</kbd>

In reply to Transcramble - Random text generator by OeufMayo

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others wandering the Monastery: (11)
    As of 2014-11-25 22:23 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      My preferred Perl binaries come from:














      Results (160 votes), past polls