Re: Random data generation.
by ikegami (Patriarch) on Jun 26, 2010 at 02:37 UTC
|
This can be solved simply by viewing the set as a circular list. Just pick one of the next N-1 characters.
# No two repeated characters
my @set = qw( a b c d e f );
my $len = 12;
my $s = '';
my $i = int(rand(@set));
for (1..$len) {
$s .= $set[$i];
$i = ( $i + int(rand($#set)) + 1 ) % @set;
}
print("$s\n");
dbadfbabfbcd
deabfadaeafb
edbaceabacad
Update: Fixed bugs, then I realized I misread the question. On the plus side, the solution can easily be adapted to solve the actual problem.
# No more than two repeated characters
my @set = qw( a b c d e f );
my $len = 12;
my $s = '';
my $i;
for (1..$len) {
if ($s =~ /(.)\1\z/) {
$i = ( $i + int(rand($#set)) + 1 ) % @set;
} else {
$i = int(rand(@set));
}
$s .= $set[$i];
}
print("$s\n");
aadbadfafbfb
dbfaefababea
efeabbcebbee
| [reply] [d/l] [select] |
Re: Random data generation.
by jwkrahn (Abbot) on Jun 26, 2010 at 12:41 UTC
|
$ perl -le'
use List::Util qw/ shuffle /;
my @chars = qw/ A B C D E F /;
my $length = 12;
my $repeats = 2;
my @set = shuffle( ( @chars ) x $repeats );
die "Error!\n" if @set < $length;
print join "", @set[ 0 .. $length - 1 ];
'
FCBECFADABED
| [reply] [d/l] |
|
| [reply] |
|
As there are only two of each character in the set which is shuffled ((@chars) x 2), there can of course also only occur two in a row at maximum.
OTOH, this technique doesn't allow more than two of any character in the result (like AACBAFFDCAEC), so it might introduce an unwanted bias... (you didn't explicitly say whether this is desired or not, and it cannot be inferred from your valid samples).
| [reply] [d/l] [select] |
|
|
|
|
It's not random. Not even close. For example, you'll never have three of the same character anywhere in the string.
I don't know if it matters, but it only works if $len <= @set*2
| [reply] [d/l] |
Re: Random data generation.
by AnomalousMonk (Archbishop) on Jun 26, 2010 at 05:57 UTC
|
ikegami's is a general solution for all cases, but would it be sufficient in non-extreme cases (i.e., not trying to generate 100-character strings from just two different characters and with no more than two sequential repeats) to just generate bunches of strings and throw away all that are non-compliant?
>perl -wMstrict -le
"use List::Util qw(shuffle);
my @set = qw(a b c d e f);
my $len = 12;
my $needed = shift;
my @out;
while (@out < $needed) {
my $str = join '', shuffle((@set) x $len);
push @out, grep !m{ (.)\1\1 }xms, $str =~ m{ .{$len} }xmsg;
}
@out = @out[-$needed .. -1];
m{ (.)\1\1 }xms and die qq{3+ same: '$_'} for @out;
printf qq{%3d: '$out[$_]' \n}, 1+$_ for 0 .. $#out;
" 5
1: 'becaecdbbedc'
2: 'afccbecefdca'
3: 'fbffecacfeaa'
4: 'ddbacddfafca'
5: 'fbebdabadedc'
| [reply] [d/l] |
|
You have it backwards. The smaller the set and the longer the string, the higher the chance of producing something that needs to be thrown away. What you call the non-extreme case is where your algorithm has the most problems.
The real problem is that your solution isn't random. The chance of picking a characters at a certain position is affected by the characters picked in previous positions.
| [reply] |
|
The smaller the set and the longer the string, the higher the chance of producing something that needs to be thrown away.
I thought that that, in essence, was implied by what I wrote.
What you call the non-extreme case is where your algorithm has the most problems.
I take 'problems' to mean the generation of strings that don't meet the max-repeated-characters requirement and so must be thrown away.
So, if my algorithm was trying to generate strings of length 3 from a character set of 100 characters (my idea of a very 'non-extreme' case), which in the totally random case (OK, OK, my algorithm is only approximately random – but see below) would result about 1 in a million times in a string of three identical characters that needed to be discarded, this would be more problematic than trying to generate strings of, say, length 100 from a set of three characters? I don't understand.
The real problem is that your solution isn't random.
I agree it isn't completely random, but, as others have noted, it wasn't clear from the OP and subsequent discussion that complete randomness was required. I was aiming for something quick and dirty that would satisfy the max-repeated-characters requirement while still being sort of random-ish.
| [reply] |
|
|
| [reply] |
Re: Random data generation.
by bluescreen (Friar) on Jun 26, 2010 at 02:59 UTC
|
#!/usr/bin/perl
use strict;
my @set = ('A', 'B', 'C', 'D', 'E', 'F');
my $length = 12;
my $result;
my $last_char;
while(length($result) < $length) {
my $char = $set[int(rand()*(scalar(@set)-1))];
next if ( $last_char eq $char);
$result .= $char;
$last_char = $char;
}
print $result;
| [reply] [d/l] |
|
my @set = qw( A B C D E F );
my $len = 12;
my $result = '';
while (length($result) < $len) {
my $char = $set[ rand @set ];
next if substr($result, -2) eq $char x 2;
$result .= $char;
}
print $result;
Also, there's no need to use an extra variable (like $last_char), as $result already stores all the previous characters. | [reply] [d/l] [select] |
|
Absolutely it was a quick-&-dirty script, probably to much :D, I knew that instead of $last_char I could have used substr(), but premature optimization made me do that.
Anyway this sort of social coding brought an interesting result
| [reply] |
|
You made the same error I did originally, you attempted to solve "no repeating characters" rather than "no more than two repeating characters".
Furthermore, your code never uses the last character of the set.
| [reply] |
|
| [reply] |
Re: Random data generation.
by salva (Canon) on Jun 26, 2010 at 15:40 UTC
|
my @set = split //, "ABCDEF";
my $len = 12;
my $max_reps = 2;
say gen(\@set, $len, $max_reps) for (1..20);
sub gen {
my ($set, $len, $max_reps) = @_;
my $out = '';
my $last_ix = -1;
my $reps = 0;
for (1..$len) {
my $ix;
if ($reps >= $max_reps) {
$ix = int rand(@$set - 1);
$ix++ if $ix >= $last_ix;
}
else {
$ix = int rand(@$set);
}
if ($ix == $last_ix) {
$reps++;
}
else {
$last_ix = $ix;
$reps = 1;
}
$out .= $set->[$ix];
}
$out;
}
| [reply] [d/l] |
|
| [reply] |
Re: Random data generation.
by johngg (Canon) on Jun 26, 2010 at 23:12 UTC
|
This seems to work, if I've understood correctly.
use strict;
use warnings;
use 5.010;
say genString( 12, ( q{a} .. q{f} ) );
say genString( 20, ( q{a} .. q{c} ) );
say genString( 20, ( q{a} .. q{b} ) );
sub genString
{
my( $len, @set ) = @_;
my $string = q{};
for ( 1 .. $len )
{
if ( $string =~ m{(.)\1$} )
{
my $let;
do { $let = $set[ int rand @set ] } until $let ne $1;
$string .= $let;
}
else
{
$string .= $set[ int rand @set ];
}
}
return $string;
}
The output.
bceffcedefda
bccbbcaabbccacabbcaa
abaabbaabaabababbabb
You would need to add a test to prevent an endless loop if given a set with only one member.
I hope this is of interest.
| [reply] [d/l] [select] |
|
A working solution, but like others that invoke the regex engine, it proves slower, especially for longer strings, than a flags & conditions solution. (Eg. salva's 846681).
[16:16:52.15] c:\test>junk -I=3 ABCDEF 12
Rate x0 x3 x8 x4 x6 x5 x7
x0 9284/s -- -7% -16% -33% -38% -38% -47%
x3 9946/s 7% -- -10% -29% -33% -34% -44% ikegami
x8 11091/s 19% 12% -- -20% -26% -26% -37% johngg
x4 13951/s 50% 40% 26% -- -7% -7% -21% ikegami variant
x6 14952/s 61% 50% 35% 7% -- -0% -15%
x5 14964/s 61% 50% 35% 7% 0% -- -15%
x7 17647/s 90% 77% 59% 26% 18% 18% -- salva
[16:17:13.48] c:\test>junk -I=3 ABCDEF 120
Rate x0 x3 x4 x8 x6 x5 x7
x0 279/s -- -8% -10% -78% -83% -83% -91%
x3 302/s 8% -- -3% -76% -81% -81% -90%
x4 310/s 11% 3% -- -75% -81% -81% -89%
x8 1250/s 348% 314% 303% -- -22% -22% -58%
x6 1595/s 472% 428% 414% 28% -- -0% -46%
x5 1597/s 472% 429% 415% 28% 0% -- -46%
x7 2951/s 958% 877% 851% 136% 85% 85% --
[16:17:30.86] c:\test>junk -I=3 ABCDEF 1200
Rate x0 x4 x3 x8 x6 x5 x7
x0 3.38/s -- -1% -3% -97% -98% -98% -99%
x4 3.42/s 1% -- -1% -97% -98% -98% -99%
x3 3.46/s 3% 1% -- -97% -98% -98% -99%
x8 121/s 3478% 3433% 3388% -- -24% -24% -63%
x6 159/s 4608% 4548% 4489% 32% -- -1% -51%
x5 160/s 4633% 4573% 4513% 32% 1% -- -51%
x7 324/s 9499% 9377% 9256% 168% 104% 103% --
As you can see. salva's solution wins easily on performance. | [reply] [d/l] |
|
I wonder if you've overlooked my solution, or if
there's anything wrong with it.
Not only is it simpler (both less
lines of code and conceptually easier to understand, IMHO), it's also consistently
faster than what you found to be the fastest:
$ ./846784.pl
Rate salva almut
salva 881/s -- -15%
almut 1037/s 18% --
(tested with v5.10.1, x86_64-linux-thread-multi)
| [reply] [d/l] [select] |
|
|
|
Re: Random data generation.
by Khen1950fx (Canon) on Jun 26, 2010 at 04:01 UTC
|
#!/usr/bin/perl
use strict;
use warnings;
use Test::LectroTest::Generator
qw(:common :combinators);
my $str_gen = String( charset => "a-f", length => [12,] );
for my $string_guidance (1) {
my $i = $str_gen->generate($string_guidance);
print "$i ";
}
print "\n";
Update: Trying to get a work-around for the constraint, I kept coming across
warnings that the length cannot exceed the number of chars in the set. If I do
length => [6,], then the constraint is automatic, and it works. | [reply] [d/l] [select] |
|
| [reply] |
Re: Random data generation.
by davies (Prior) on Jun 26, 2010 at 16:45 UTC
|
This is actually a classic algorithm used for things like dealing a pack of cards. If you are dealing bridge hands, for example, you want 52 different cards with no repeats. Some games use multiple packs and/or cut down packs (Bézique uses a pack of 2 decks, A down to 7). You therefore set up a pack with all the possible cards, deal one at random, move the last card to the position from which the card was taken & reduce the pack by one. So, for your problem: use strict;
use warnings;
my $nRepeats = 2;
my @sSet = qw (a b c d e f);
my $nLength = 12;
my @sAll;
for (1..$nRepeats) {
push (@sAll, @sSet);
}
if ($nLength - 1 > $#sAll) {die "Don't be silly"}
my $sString;
for (1..$nLength) {
my $i = int(rand(@sAll));
$sString .= $sAll[$i];
$sAll[$i] = $sAll[-1];
pop(@sAll);
}
print "$sString \n";
Now, I'm a beginner. So if anyone sees anything that could be improved (apart from Hungarian notation - it helps me conceptualise strong typing), I'd love to know.
Regards,
John Davies
Update: Re-reading your question, if AABAABAABAAB is valid, my code won't do what you want, but I'm not clear on whether that's what you mean. | [reply] [d/l] |
Re: Random data generation.
by Limbic~Region (Chancellor) on Jun 27, 2010 at 16:30 UTC
|
BrowserUk,
I haven't read any of the other solutions but this is the first idea that popped into my head. I haven't really tested it nor is it without at least 1 assumption.
#!/usr/bin/perl
use strict;
use warnings;
my @M = 'a' .. 'z';
my $N = 42; # Assumes N is > 2
my ($minus_1, $minus_2) = ($M[rand @M], $M[rand @M]);
my $str = $minus_1 . $minus_2;
for (1 .. $N - 2) {
my $idx = rand @M;
$idx = ($idx + (int(rand $#M) + 1)) % @M if $minus_1 eq $minus_2 &
+& $M[$idx] eq $minus_2;
$str .= $M[$idx];
($minus_1, $minus_2) = ($minus_2, $M[$idx]);
}
print "$str\n";
Update: Now after reading the solutions, I realize mine is almost identical to ikegami's. I chose to use scalars to hold the value of the last two chars instead of the regex, but very similar indeed.
Update 2: I cleaned up the code a bit though the algorithm stayed the same. I also fixed the bugs as shown below.
| [reply] [d/l] |
|
#!perl -slw
use strict;
use List::Util qw/ shuffle /;
sub x {
my( $N, @M ) = @_;
my ($minus_1, $minus_2, $str) = (('') x 3);
$minus_1 = $M[rand @M];
$minus_2 = $M[rand @M];
$str = $minus_1 . $minus_2;
for (1 .. $N - 2) {
my $next_char;
my $idx = int(rand @M);
if ($minus_1 eq $minus_2) {
$idx = ($idx + int(rand @M) + 1) % @M;
}
$str .= $next_char = $M[$idx];
($minus_1, $minus_2) = ($minus_2, $next_char);
}
return $str;
}
for( 1 .. 10 ) {
my $x = x( 12, qw[ A B C D E F ] );
print $x, $x =~ m[(.)\1\1] ? ' Fail' : ' Ok';
}
__END__
[17:39:11.61] c:\test>junk3
BAADDBBCADEF Ok
ADAEFADBCEEA Ok
CECBAABFCEFE Ok
CDDCFDAABEBF Ok
ADFBFCEADFFF Fail
DAABAAFDCCCE Fail
ABAADCBDFBBB Fail
FDDDBEEADBCC Fail
CEDCDDFCEEFA Ok
FCBECCCEAFEB Fail
| [reply] [d/l] |
|
BrowserUk,
Yes, there were at least 2 bugs. Try this instead.
#!perl -slw
use strict;
use List::Util qw/ shuffle /;
sub x {
my( $N, @M ) = @_;
my ($minus_1, $minus_2) = ($M[rand @M], $M[rand @M]);
my $str = $minus_1 . $minus_2;
for (1 .. $N - 2) {
my $idx = rand @M;
$idx = ($idx + (int(rand $#M) + 1)) % @M if $minus_1 eq $minus
+_2 && $M[$idx] eq $minus_2;
$str .= $M[$idx];
($minus_1, $minus_2) = ($minus_2, $M[$idx]);
}
return $str;
}
for( 1 .. 100 ) {
my $x = x( 12, qw[ A B C D E F ] );
print $x, $x =~ m[(.)\1\1] ? ' Fail' : ' Ok';
}
| [reply] [d/l] |
|
Re: Random data generation.
by Khen1950fx (Canon) on Jun 27, 2010 at 12:03 UTC
|
I tried a variation on a card shuffle.
#!/usr/bin/perl
use strict;
use warnings;
my @random = (qw/a b c d e f/) x 2;
foreach my $i (reverse 0..$#random) {
my $r = int rand ($i+1);
@random[$i, $r] = @random[$r, $i] unless ($i == $r);
}
print @random, "\n";
| [reply] [d/l] |
|
| [reply] [d/l] |
Re: Random data generation.
by Sewi (Friar) on Jun 28, 2010 at 08:10 UTC
|
my @chars = (A..F);
my @strings;
while ($#strings < 5) {
my $random = join('',map { $chars[int(rand(scalar(@chars)))]; } (1..1
+2));
$random =~ /(.)\1{2}/ and next;
push @strings, $random;
}
| [reply] [d/l] |
|
Generating the entire string before you check becomes highly inefficient for long strings, because the likelihood to have a violation approaches 1.0 as strings get longer, so you'll throw away almost every sample you generated...
| [reply] |
Re: Random data generation.
by davies (Prior) on Jun 29, 2010 at 12:18 UTC
|
Another go, seeing that my previous attempt didn't do what you wanted. I don't see anyone else trying this algorithm, but it may have been done by someone who can write better Perl than I can.
use strict;
use warnings;
my $nRepeats = 2;
my @sSet = qw (a b c d e f);
my $nLength = 12;
my %sSets;
for my $nKey (0..((@sSet) ** $nRepeats - 1)) {
my $sKey = "";
for (1..$nRepeats) {
$sKey .= $sSet[($nKey / (($#sSet + 1) ** ($_ - 1))) % ($#sSet
++ 1)];
}
$sSets{$sKey} = [@sSet];
}
for (0..$#sSet) {
splice(@{$sSets{$sSet[$_] x $nRepeats}}, $_, 1);
}
my $sString;
my @sChr;
for (1..$nRepeats) {
my $sRnd = $sSet[int(rand(@sSet))];
$sString .= $sRnd;
push(@sChr, $sRnd);
}
for ($nRepeats + 1 .. $nLength) {
my $sKey;
for (@sChr) {
$sKey .= $_;
}
my $sRnd = @{$sSets{$sKey}}[int(rand(@{$sSets{$sKey}}))];
$sString .= $sRnd;
push(@sChr, $sRnd);
shift(@sChr);
}
print "$sString \n";
The algorithm was developed on one of the hot nights when you couldn't sleep, either. The Perl has taken me 2 days to write with all the other distractions life affords. I learned some useful stuff writing it (thanks, Tye, for helping me in CB), so I'm putting it up even though you have your solution!
Regards,
John Davies | [reply] [d/l] |