"be consistent" PerlMonks

### Random data generation.

by BrowserUk (Pope)
 on Jun 26, 2010 at 02:13 UTC Need Help??
BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

I want to generate strings of length N, draw from a set of M chars, but with no more than two of any given char consecutively.

Examples:

```set ABCDEF;
length 12;

Valid:
ABCDEFABCDEF
AABBCCDDEEFF

Invalid:
AAABCDEFABCD

I don't know whether this is tougher than it first appears, or I'm just having a bad head day. but my attempts so far involve cumbersome loops with the setting and resetting of myriad flags.

Any thoughts on a clean way to do this?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: Random data generation.
by ikegami (Pope) on Jun 26, 2010 at 02:37 UTC

This can be solved simply by viewing the set as a circular list. Just pick one of the next N-1 characters.

```# No two repeated characters

my @set = qw( a b c d e f );
my \$len = 12;

my \$s = '';
my \$i = int(rand(@set));
for (1..\$len) {
\$s .= \$set[\$i];
\$i = ( \$i + int(rand(\$#set)) + 1 ) % @set;
}

print("\$s\n");
```dbadfbabfbcd

Update: Fixed bugs, then I realized I misread the question. On the plus side, the solution can easily be adapted to solve the actual problem.

```# No more than two repeated characters

my @set = qw( a b c d e f );
my \$len = 12;

my \$s = '';
my \$i;
for (1..\$len) {
if (\$s =~ /(.)\1\z/) {
\$i = ( \$i + int(rand(\$#set)) + 1 ) % @set;
} else {
\$i = int(rand(@set));
}
\$s .= \$set[\$i];
}

print("\$s\n");
```aadbadfafbfb
dbfaefababea
efeabbcebbee
Re: Random data generation.
by jwkrahn (Monsignor) on Jun 26, 2010 at 12:41 UTC
```\$ perl -le'
use List::Util qw/ shuffle /;
my @chars = qw/ A B C D E F /;
my \$length = 12;
my \$repeats = 2;

my @set = shuffle( ( @chars ) x \$repeats );
die "Error!\n" if @set < \$length;

print join "", @set[ 0 .. \$length - 1 ];
'

That's the one. For the life of me I don't see how it works yet, but it does. Many thanks.

As there are only two of each character in the set which is shuffled ((@chars) x 2), there can of course also only occur two in a row at maximum.

OTOH, this technique doesn't allow more than two of any character in the result (like AACBAFFDCAEC), so it might introduce an unwanted bias... (you didn't explicitly say whether this is desired or not, and it cannot be inferred from your valid samples).

It's not random. Not even close. For example, you'll never have three of the same character anywhere in the string.

I don't know if it matters, but it only works if \$len <= @set*2

Re: Random data generation.
by AnomalousMonk (Chancellor) on Jun 26, 2010 at 05:57 UTC

ikegami's is a general solution for all cases, but would it be sufficient in non-extreme cases (i.e., not trying to generate 100-character strings from just two different characters and with no more than two sequential repeats) to just generate bunches of strings and throw away all that are non-compliant?

```>perl -wMstrict -le
"use List::Util qw(shuffle);
my @set = qw(a b c d e f);
my \$len = 12;
my \$needed = shift;
my @out;
while (@out < \$needed) {
my \$str = join '', shuffle((@set) x \$len);
push @out, grep !m{ (.)\1\1 }xms, \$str =~ m{ .{\$len} }xmsg;
}
@out = @out[-\$needed .. -1];
m{ (.)\1\1 }xms and die qq{3+ same: '\$_'} for @out;
printf qq{%3d: '\$out[\$_]' \n}, 1+\$_ for 0 .. \$#out;
" 5
1: 'becaecdbbedc'
2: 'afccbecefdca'
3: 'fbffecacfeaa'
4: 'ddbacddfafca'

You have it backwards. The smaller the set and the longer the string, the higher the chance of producing something that needs to be thrown away. What you call the non-extreme case is where your algorithm has the most problems.

The real problem is that your solution isn't random. The chance of picking a characters at a certain position is affected by the characters picked in previous positions.

The smaller the set and the longer the string, the higher the chance of producing something that needs to be thrown away.

I thought that that, in essence, was implied by what I wrote.

What you call the non-extreme case is where your algorithm has the most problems.

I take 'problems' to mean the generation of strings that don't meet the max-repeated-characters requirement and so must be thrown away.

So, if my algorithm was trying to generate strings of length 3 from a character set of 100 characters (my idea of a very 'non-extreme' case), which in the totally random case (OK, OK, my algorithm is only approximately random – but see below) would result about 1 in a million times in a string of three identical characters that needed to be discarded, this would be more problematic than trying to generate strings of, say, length 100 from a set of three characters? I don't understand.

The real problem is that your solution isn't random.

I agree it isn't completely random, but, as others have noted, it wasn't clear from the OP and subsequent discussion that complete randomness was required. I was aiming for something quick and dirty that would satisfy the max-repeated-characters requirement while still being sort of random-ish.

Re: Random data generation.
by bluescreen (Friar) on Jun 26, 2010 at 02:59 UTC

Is this what you looking for?

```#!/usr/bin/perl
use strict;

my @set = ('A', 'B', 'C', 'D', 'E', 'F');
my \$length = 12;

my \$result;

my \$last_char;
while(length(\$result) < \$length) {
my \$char = \$set[int(rand()*(scalar(@set)-1))];
next if ( \$last_char eq \$char);
\$result .= \$char;
\$last_char = \$char;
}

print \$result;

Your approach could be made to work by simply checking the last two characters:

```my @set = qw( A B C D E F );
my \$len = 12;
my \$result = '';

while (length(\$result) < \$len) {
my \$char = \$set[ rand @set ];
next if substr(\$result, -2) eq \$char x 2;
\$result .= \$char;
}
print \$result;

Also, there's no need to use an extra variable (like \$last_char), as \$result already stores all the previous characters.

Absolutely it was a quick-&-dirty script, probably to much :D, I knew that instead of \$last_char I could have used substr(), but premature optimization made me do that.

Anyway this sort of social coding brought an interesting result

You made the same error I did originally, you attempted to solve "no repeating characters" rather than "no more than two repeating characters".

Furthermore, your code never uses the last character of the set.

Not quite. I have to allow two consecutive chars, but not tripled or more...

Re: Random data generation.
by salva (Abbot) on Jun 26, 2010 at 15:40 UTC
```my @set = split //, "ABCDEF";
my \$len = 12;
my \$max_reps = 2;

say gen(\@set, \$len, \$max_reps) for (1..20);

sub gen {
my (\$set, \$len, \$max_reps) = @_;
my \$out = '';
my \$last_ix = -1;
my \$reps = 0;
for (1..\$len) {
my \$ix;
if (\$reps >= \$max_reps) {
\$ix = int rand(@\$set - 1);
\$ix++ if \$ix >= \$last_ix;
}
else {
\$ix = int rand(@\$set);
}
if (\$ix == \$last_ix) {
\$reps++;
}
else {
\$last_ix = \$ix;
\$reps = 1;
}
\$out .= \$set->[\$ix];
}
\$out;
}

Yep. That's pretty much exactly what I came up with for my first attempt. Though I think I needed one more flag.

Re: Random data generation.
by johngg (Abbot) on Jun 26, 2010 at 23:12 UTC

This seems to work, if I've understood correctly.

```use strict;
use warnings;

use 5.010;

say genString( 12, ( q{a} .. q{f} ) );

say genString( 20, ( q{a} .. q{c} ) );

say genString( 20, ( q{a} .. q{b} ) );

sub genString
{
my( \$len, @set ) = @_;
my \$string = q{};
for ( 1 .. \$len )
{
if ( \$string =~ m{(.)\1\$} )
{
my \$let;
do { \$let = \$set[ int rand @set ] } until \$let ne \$1;
\$string .= \$let;
}
else
{
\$string .= \$set[ int rand @set ];
}
}
return \$string;
}

The output.

```bceffcedefda
bccbbcaabbccacabbcaa
abaabbaabaabababbabb

You would need to add a test to prevent an endless loop if given a set with only one member.

I hope this is of interest.

Cheers,

JohnGG

A working solution, but like others that invoke the regex engine, it proves slower, especially for longer strings, than a flags & conditions solution. (Eg. salva's 846681).

```[16:16:52.15] c:\test>junk -I=3 ABCDEF 12
Rate   x0   x3   x8   x4   x6   x5   x7
x0  9284/s   --  -7% -16% -33% -38% -38% -47%
x3  9946/s   7%   -- -10% -29% -33% -34% -44%   ikegami
x8 11091/s  19%  12%   -- -20% -26% -26% -37%   johngg
x4 13951/s  50%  40%  26%   --  -7%  -7% -21%   ikegami variant
x6 14952/s  61%  50%  35%   7%   --  -0% -15%
x5 14964/s  61%  50%  35%   7%   0%   -- -15%
x7 17647/s  90%  77%  59%  26%  18%  18%   --   salva

[16:17:13.48] c:\test>junk -I=3 ABCDEF 120
Rate   x0   x3   x4   x8   x6   x5   x7
x0  279/s   --  -8% -10% -78% -83% -83% -91%
x3  302/s   8%   --  -3% -76% -81% -81% -90%
x4  310/s  11%   3%   -- -75% -81% -81% -89%
x8 1250/s 348% 314% 303%   -- -22% -22% -58%
x6 1595/s 472% 428% 414%  28%   --  -0% -46%
x5 1597/s 472% 429% 415%  28%   0%   -- -46%
x7 2951/s 958% 877% 851% 136%  85%  85%   --

[16:17:30.86] c:\test>junk -I=3 ABCDEF 1200
Rate    x0    x4    x3    x8    x6    x5    x7
x0 3.38/s    --   -1%   -3%  -97%  -98%  -98%  -99%
x4 3.42/s    1%    --   -1%  -97%  -98%  -98%  -99%
x3 3.46/s    3%    1%    --  -97%  -98%  -98%  -99%
x8  121/s 3478% 3433% 3388%    --  -24%  -24%  -63%
x6  159/s 4608% 4548% 4489%   32%    --   -1%  -51%
x5  160/s 4633% 4573% 4513%   32%    1%    --  -51%
x7  324/s 9499% 9377% 9256%  168%  104%  103%    --

As you can see. salva's solution wins easily on performance.

I wonder if you've overlooked my solution, or if there's anything wrong with it.

Not only is it simpler (both less lines of code and conceptually easier to understand, IMHO), it's also consistently faster than what you found to be the fastest:

```\$ ./846784.pl
Rate salva almut
salva  881/s    --  -15%
almut 1037/s   18%    --

Re: Random data generation.
by Khen1950fx (Canon) on Jun 26, 2010 at 04:01 UTC
I tried it with Test::LectroTest::Generator.
```#!/usr/bin/perl

use strict;
use warnings;
use Test::LectroTest::Generator
qw(:common :combinators);

my \$str_gen = String( charset => "a-f", length => [12,] );

for my \$string_guidance (1) {
my \$i = \$str_gen->generate(\$string_guidance);
print "\$i ";
}
print "\n";
Update: Trying to get a work-around for the constraint, I kept coming across warnings that the length cannot exceed the number of chars in the set. If I do length => [6,], then the constraint is automatic, and it works.

Is that in any way constrained to having no more than two consecutive repeats?

Re: Random data generation.
by davies (Parson) on Jun 26, 2010 at 16:45 UTC
This is actually a classic algorithm used for things like dealing a pack of cards. If you are dealing bridge hands, for example, you want 52 different cards with no repeats. Some games use multiple packs and/or cut down packs (Bézique uses a pack of 2 decks, A down to 7). You therefore set up a pack with all the possible cards, deal one at random, move the last card to the position from which the card was taken & reduce the pack by one. So, for your problem:
```use strict;
use warnings;

my \$nRepeats = 2;
my @sSet = qw (a b c d e f);
my \$nLength = 12;

my @sAll;
for (1..\$nRepeats) {
push (@sAll, @sSet);
}

if (\$nLength - 1 > \$#sAll) {die "Don't be silly"}

my \$sString;
for (1..\$nLength) {
my \$i = int(rand(@sAll));
\$sString .= \$sAll[\$i];
\$sAll[\$i] = \$sAll[-1];
pop(@sAll);
}

print "\$sString \n";
Now, I'm a beginner. So if anyone sees anything that could be improved (apart from Hungarian notation - it helps me conceptualise strong typing), I'd love to know.

Regards,

John Davies

Update: Re-reading your question, if AABAABAABAAB is valid, my code won't do what you want, but I'm not clear on whether that's what you mean.
Re: Random data generation.
by Limbic~Region (Chancellor) on Jun 27, 2010 at 16:30 UTC
BrowserUk,
I haven't read any of the other solutions but this is the first idea that popped into my head. I haven't really tested it nor is it without at least 1 assumption.
```#!/usr/bin/perl
use strict;
use warnings;

my @M = 'a' .. 'z';
my \$N = 42; # Assumes N is > 2

my (\$minus_1, \$minus_2) = (\$M[rand @M], \$M[rand @M]);
my \$str = \$minus_1 . \$minus_2;

for (1 .. \$N - 2) {
my \$idx = rand @M;
\$idx = (\$idx + (int(rand \$#M) + 1)) % @M if \$minus_1 eq \$minus_2 &
+& \$M[\$idx] eq \$minus_2;
\$str .= \$M[\$idx];
(\$minus_1, \$minus_2) = (\$minus_2, \$M[\$idx]);
}
print "\$str\n";
Update: Now after reading the solutions, I realize mine is almost identical to ikegami's. I chose to use scalars to hold the value of the last two chars instead of the regex, but very similar indeed.

Update 2: I cleaned up the code a bit though the algorithm stayed the same. I also fixed the bugs as shown below.

Cheers - L~R

Unless I've broken something by subroutinising it, there is a flaw in your algorithm:

```#!perl -slw
use strict;
use List::Util qw/ shuffle /;

sub x {
my( \$N, @M ) = @_;
my (\$minus_1, \$minus_2, \$str) = (('') x 3);

\$minus_1 = \$M[rand @M];
\$minus_2 = \$M[rand @M];

\$str = \$minus_1 . \$minus_2;

for (1 .. \$N - 2) {
my \$next_char;
my \$idx = int(rand @M);
if (\$minus_1 eq \$minus_2) {
\$idx = (\$idx + int(rand @M) + 1) % @M;
}
\$str .= \$next_char = \$M[\$idx];
(\$minus_1, \$minus_2) = (\$minus_2, \$next_char);
}
return \$str;
}

for( 1 .. 10 ) {
my \$x = x( 12, qw[ A B C D E F ] );
print \$x, \$x =~ m[(.)\1\1] ? ' Fail' : ' Ok';
}

__END__
[17:39:11.61] c:\test>junk3
CECBAABFCEFE Ok
CDDCFDAABEBF Ok
DAABAAFDCCCE Fail
CEDCDDFCEEFA Ok
FCBECCCEAFEB Fail
BrowserUk,
Yes, there were at least 2 bugs. Try this instead.
```#!perl -slw
use strict;
use List::Util qw/ shuffle /;

sub x {
my( \$N, @M ) = @_;
my (\$minus_1, \$minus_2) = (\$M[rand @M], \$M[rand @M]);
my \$str = \$minus_1 . \$minus_2;

for (1 .. \$N - 2) {
my \$idx = rand @M;
\$idx = (\$idx + (int(rand \$#M) + 1)) % @M if \$minus_1 eq \$minus
+_2 && \$M[\$idx] eq \$minus_2;
\$str .= \$M[\$idx];
(\$minus_1, \$minus_2) = (\$minus_2, \$M[\$idx]);
}
return \$str;
}

for( 1 .. 100 ) {
my \$x = x( 12, qw[ A B C D E F ] );
print \$x, \$x =~ m[(.)\1\1] ? ' Fail' : ' Ok';
}

Cheers - L~R

Re: Random data generation.
by Khen1950fx (Canon) on Jun 27, 2010 at 12:03 UTC
I tried a variation on a card shuffle.
```#!/usr/bin/perl

use strict;
use warnings;

my @random = (qw/a b c d e f/) x 2;
foreach my \$i (reverse 0..\$#random) {
my \$r = int rand (\$i+1);
@random[\$i, \$r] = @random[\$r, \$i] unless (\$i == \$r);
}
print @random, "\n";

This suffers the same limitation as jwkrahn (that initially had me fooled), in that:

1. it can only deal with strings up to twice the length of the input set;

Whilst I gave 2 & 12 as an examples of M & N, I thought it was clear that other values were possible.

2. it will only generate a small fraction of the possible legal values;

Even at length 12, it won't generate abbabbabbabb.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Random data generation.
by Sewi (Friar) on Jun 28, 2010 at 08:10 UTC
```my @chars = (A..F);
my @strings;
while (\$#strings < 5) {
my \$random = join('',map { \$chars[int(rand(scalar(@chars)))]; } (1..1
+2));
\$random =~ /(.)\1{2}/ and next;
push @strings, \$random;
}

Generating the entire string before you check becomes highly inefficient for long strings, because the likelihood to have a violation approaches 1.0 as strings get longer, so you'll throw away almost every sample you generated...

Re: Random data generation.
by davies (Parson) on Jun 29, 2010 at 12:18 UTC
Another go, seeing that my previous attempt didn't do what you wanted. I don't see anyone else trying this algorithm, but it may have been done by someone who can write better Perl than I can.
```use strict;
use warnings;

my \$nRepeats = 2;
my @sSet = qw (a b c d e f);
my \$nLength = 12;

my %sSets;
for my \$nKey (0..((@sSet) ** \$nRepeats - 1)) {
my \$sKey = "";
for (1..\$nRepeats) {
\$sKey .= \$sSet[(\$nKey / ((\$#sSet + 1) ** (\$_ - 1))) % (\$#sSet
++ 1)];
}
\$sSets{\$sKey} = [@sSet];
}

for (0..\$#sSet) {
splice(@{\$sSets{\$sSet[\$_] x \$nRepeats}}, \$_, 1);
}

my \$sString;
my @sChr;
for (1..\$nRepeats) {
my \$sRnd = \$sSet[int(rand(@sSet))];
\$sString .= \$sRnd;
push(@sChr, \$sRnd);
}

for (\$nRepeats + 1 .. \$nLength) {
my \$sKey;
for (@sChr) {
\$sKey .= \$_;
}
my \$sRnd = @{\$sSets{\$sKey}}[int(rand(@{\$sSets{\$sKey}}))];
\$sString .= \$sRnd;
push(@sChr, \$sRnd);
shift(@sChr);
}

print "\$sString \n";
The algorithm was developed on one of the hot nights when you couldn't sleep, either. The Perl has taken me 2 days to write with all the other distractions life affords. I learned some useful stuff writing it (thanks, Tye, for helping me in CB), so I'm putting it up even though you have your solution!

Regards,

John Davies

Create A New User
Node Status?
node history
Node Type: perlquestion [id://846630]
Approved by ikegami
Front-paged by toolic
help
Chatterbox?
 [Corion]: A good daystart everybody!

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (7)
As of 2018-01-23 08:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
How did you see in the new year?

Results (241 votes). Check out past polls.

Notices?