Random data generation.

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Random data generation. by ikegami (Patriarch) on Jun 26, 2010 at 02:37 UTC
This can be solved simply by viewing the set as a circular list. Just pick one of the next N-1 characters. `# No two repeated characters my @set = qw( a b c d e f ); my $len = 12; my $s = ''; my $i = int(rand(@set)); for (1..$len) { $s .= $set[$i]; $i = ( $i + int(rand($#set)) + 1 ) % @set; } print("$s\n");` [download] `dbadfbabfbcd deabfadaeafb edbaceabacad` [download] Update: Fixed bugs, then I realized I misread the question. On the plus side, the solution can easily be adapted to solve the actual problem. `# No more than two repeated characters my @set = qw( a b c d e f ); my $len = 12; my $s = ''; my $i; for (1..$len) { if ($s =~ /(.)\1\z/) { $i = ( $i + int(rand($#set)) + 1 ) % @set; } else { $i = int(rand(@set)); } $s .= $set[$i]; } print("$s\n");` [download] `aadbadfafbfb dbfaefababea efeabbcebbee` [download]	[reply] [d/l] [select]
Re: Random data generation. by jwkrahn (Abbot) on Jun 26, 2010 at 12:41 UTC
`$ perl -le' use List::Util qw/ shuffle /; my @chars = qw/ A B C D E F /; my $length = 12; my $repeats = 2; my @set = shuffle( ( @chars ) x $repeats ); die "Error!\n" if @set < $length; print join "", @set[ 0 .. $length - 1 ]; ' FCBECFADABED` [download]	[reply] [d/l]
Re^2: Random data generation. by BrowserUk (Patriarch) on Jun 26, 2010 at 13:20 UTC
That's the one. For the life of me I don't see how it works yet, but it does. Many thanks.	[reply]
Re^3: Random data generation. by almut (Canon) on Jun 26, 2010 at 13:39 UTC
As there are only two of each character in the set which is shuffled (`(@chars) x 2`), there can of course also only occur two in a row at maximum. OTOH, this technique doesn't allow more than two of any character in the result (like `AACBAFFDCAEC`), so it might introduce an unwanted bias... (you didn't explicitly say whether this is desired or not, and it cannot be inferred from your valid samples).	[reply] [d/l] [select]
Re^4: Random data generation. by BrowserUk (Patriarch) on Jun 26, 2010 at 18:51 UTC
Re^5: Random data generation. by ikegami (Patriarch) on Jun 27, 2010 at 06:22 UTC
Some notes below your chosen depth have not been shown here
Re^3: Random data generation. by ikegami (Patriarch) on Jun 26, 2010 at 15:39 UTC
It's not random. Not even close. For example, you'll never have three of the same character anywhere in the string. I don't know if it matters, but it only works if `$len <= @set*2`	[reply] [d/l]
Re: Random data generation. by AnomalousMonk (Archbishop) on Jun 26, 2010 at 05:57 UTC
ikegami's is a general solution for all cases, but would it be sufficient in non-extreme cases (i.e., not trying to generate 100-character strings from just two different characters and with no more than two sequential repeats) to just generate bunches of strings and throw away all that are non-compliant? `>perl -wMstrict -le "use List::Util qw(shuffle); my @set = qw(a b c d e f); my $len = 12; my $needed = shift; my @out; while (@out < $needed) { my $str = join '', shuffle((@set) x $len); push @out, grep !m{ (.)\1\1 }xms, $str =~ m{ .{$len} }xmsg; } @out = @out[-$needed .. -1]; m{ (.)\1\1 }xms and die qq{3+ same: '$_'} for @out; printf qq{%3d: '$out[$_]' \n}, 1+$_ for 0 .. $#out; " 5 1: 'becaecdbbedc' 2: 'afccbecefdca' 3: 'fbffecacfeaa' 4: 'ddbacddfafca' 5: 'fbebdabadedc'` [download]	[reply] [d/l]
Re^2: Random data generation. by ikegami (Patriarch) on Jun 26, 2010 at 15:33 UTC
You have it backwards. The smaller the set and the longer the string, the higher the chance of producing something that needs to be thrown away. What you call the non-extreme case is where your algorithm has the most problems. The real problem is that your solution isn't random. The chance of picking a characters at a certain position is affected by the characters picked in previous positions.	[reply]
Re^3: Random data generation. by AnomalousMonk (Archbishop) on Jun 26, 2010 at 20:06 UTC
The smaller the set and the longer the string, the higher the chance of producing something that needs to be thrown away. I thought that that, in essence, was implied by what I wrote. What you call the non-extreme case is where your algorithm has the most problems. I take 'problems' to mean the generation of strings that don't meet the max-repeated-characters requirement and so must be thrown away. So, if my algorithm was trying to generate strings of length 3 from a character set of 100 characters (my idea of a very 'non-extreme' case), which in the totally random case (OK, OK, my algorithm is only approximately random – but see below) would result about 1 in a million times in a string of three identical characters that needed to be discarded, this would be more problematic than trying to generate strings of, say, length 100 from a set of three characters? I don't understand. The real problem is that your solution isn't random. I agree it isn't completely random, but, as others have noted, it wasn't clear from the OP and subsequent discussion that complete randomness was required. I was aiming for something quick and dirty that would satisfy the max-repeated-characters requirement while still being sort of random-ish.	[reply]
Re^4: Random data generation. by ikegami (Patriarch) on Jun 27, 2010 at 06:16 UTC
Re^2: Random data generation. by BrowserUk (Patriarch) on Jun 28, 2010 at 12:12 UTC
See salva's second approach Re^5: Random data generation. for a couple of innovations that make this approach viable.	[reply]
Re: Random data generation. by bluescreen (Friar) on Jun 26, 2010 at 02:59 UTC
Is this what you looking for? `#!/usr/bin/perl use strict; my @set = ('A', 'B', 'C', 'D', 'E', 'F'); my $length = 12; my $result; my $last_char; while(length($result) < $length) { my $char = $set[int(rand()*(scalar(@set)-1))]; next if ( $last_char eq $char); $result .= $char; $last_char = $char; } print $result;` [download]	[reply] [d/l]
Re^2: Random data generation. by almut (Canon) on Jun 26, 2010 at 07:10 UTC
Your approach could be made to work by simply checking the last two characters: `my @set = qw( A B C D E F ); my $len = 12; my $result = ''; while (length($result) < $len) { my $char = $set[ rand @set ]; next if substr($result, -2) eq $char x 2; $result .= $char; } print $result;` [download] Also, there's no need to use an extra variable (like `$last_char`), as `$result` already stores all the previous characters.	[reply] [d/l] [select]
Re^3: Random data generation. by bluescreen (Friar) on Jun 26, 2010 at 18:07 UTC
Absolutely it was a quick-&-dirty script, probably to much :D, I knew that instead of $last_char I could have used substr(), but premature optimization made me do that. Anyway this sort of social coding brought an interesting result	[reply]
Re^2: Random data generation. by ikegami (Patriarch) on Jun 26, 2010 at 03:35 UTC
You made the same error I did originally, you attempted to solve "no repeating characters" rather than "no more than two repeating characters". Furthermore, your code never uses the last character of the set.	[reply]
Re^2: Random data generation. by BrowserUk (Patriarch) on Jun 26, 2010 at 03:29 UTC
Not quite. I have to allow two consecutive chars, but not tripled or more...	[reply]
Re: Random data generation. by salva (Canon) on Jun 26, 2010 at 15:40 UTC
`my @set = split //, "ABCDEF"; my $len = 12; my $max_reps = 2; say gen(\@set, $len, $max_reps) for (1..20); sub gen { my ($set, $len, $max_reps) = @_; my $out = ''; my $last_ix = -1; my $reps = 0; for (1..$len) { my $ix; if ($reps >= $max_reps) { $ix = int rand(@$set - 1); $ix++ if $ix >= $last_ix; } else { $ix = int rand(@$set); } if ($ix == $last_ix) { $reps++; } else { $last_ix = $ix; $reps = 1; } $out .= $set->[$ix]; } $out; }` [download]	[reply] [d/l]
Re^2: Random data generation. by BrowserUk (Patriarch) on Jun 26, 2010 at 18:53 UTC
Yep. That's pretty much exactly what I came up with for my first attempt. Though I think I needed one more flag.	[reply]
Re: Random data generation. by johngg (Canon) on Jun 26, 2010 at 23:12 UTC
This seems to work, if I've understood correctly. `use strict; use warnings; use 5.010; say genString( 12, ( q{a} .. q{f} ) ); say genString( 20, ( q{a} .. q{c} ) ); say genString( 20, ( q{a} .. q{b} ) ); sub genString { my( $len, @set ) = @_; my $string = q{}; for ( 1 .. $len ) { if ( $string =~ m{(.)\1$} ) { my $let; do { $let = $set[ int rand @set ] } until $let ne $1; $string .= $let; } else { $string .= $set[ int rand @set ]; } } return $string; }` [download] The output. `bceffcedefda bccbbcaabbccacabbcaa abaabbaabaabababbabb` [download] You would need to add a test to prevent an endless loop if given a set with only one member. I hope this is of interest. Cheers, JohnGG	[reply] [d/l] [select]
Re^2: Random data generation. by BrowserUk (Patriarch) on Jun 27, 2010 at 15:21 UTC
A working solution, but like others that invoke the regex engine, it proves slower, especially for longer strings, than a flags & conditions solution. (Eg. salva's 846681). [16:16:52.15] c:\test>junk -I=3 ABCDEF 12 Rate x0 x3 x8 x4 x6 x5 x7 x0 9284/s -- -7% -16% -33% -38% -38% -47% x3 9946/s 7% -- -10% -29% -33% -34% -44% ikegami x8 11091/s 19% 12% -- -20% -26% -26% -37% johngg x4 13951/s 50% 40% 26% -- -7% -7% -21% ikegami variant x6 14952/s 61% 50% 35% 7% -- -0% -15% x5 14964/s 61% 50% 35% 7% 0% -- -15% x7 17647/s 90% 77% 59% 26% 18% 18% -- salva [16:17:13.48] c:\test>junk -I=3 ABCDEF 120 Rate x0 x3 x4 x8 x6 x5 x7 x0 279/s -- -8% -10% -78% -83% -83% -91% x3 302/s 8% -- -3% -76% -81% -81% -90% x4 310/s 11% 3% -- -75% -81% -81% -89% x8 1250/s 348% 314% 303% -- -22% -22% -58% x6 1595/s 472% 428% 414% 28% -- -0% -46% x5 1597/s 472% 429% 415% 28% 0% -- -46% x7 2951/s 958% 877% 851% 136% 85% 85% -- [16:17:30.86] c:\test>junk -I=3 ABCDEF 1200 Rate x0 x4 x3 x8 x6 x5 x7 x0 3.38/s -- -1% -3% -97% -98% -98% -99% x4 3.42/s 1% -- -1% -97% -98% -98% -99% x3 3.46/s 3% 1% -- -97% -98% -98% -99% x8 121/s 3478% 3433% 3388% -- -24% -24% -63% x6 159/s 4608% 4548% 4489% 32% -- -1% -51% x5 160/s 4633% 4573% 4513% 32% 1% -- -51% x7 324/s 9499% 9377% 9256% 168% 104% 103% -- [download] As you can see. salva's solution wins easily on performance.	[reply] [d/l]
Re^3: Random data generation. by almut (Canon) on Jun 27, 2010 at 18:27 UTC
I wonder if you've overlooked my solution, or if there's anything wrong with it. Not only is it simpler (both less lines of code and conceptually easier to understand, IMHO), it's also consistently faster than what you found to be the fastest: `$ ./846784.pl Rate salva almut salva 881/s -- -15% almut 1037/s 18% --` [download] (tested with v5.10.1, x86_64-linux-thread-multi) Read more... benchmark (1315 Bytes)	[reply] [d/l] [select]
Re^4: Random data generation. by BrowserUk (Patriarch) on Jun 27, 2010 at 18:59 UTC
Re^5: Random data generation. by salva (Canon) on Jun 28, 2010 at 10:16 UTC
Some notes below your chosen depth have not been shown here
Re: Random data generation. by Khen1950fx (Canon) on Jun 26, 2010 at 04:01 UTC
I tried it with Test::LectroTest::Generator. `#!/usr/bin/perl use strict; use warnings; use Test::LectroTest::Generator qw(:common :combinators); my $str_gen = String( charset => "a-f", length => [12,] ); for my $string_guidance (1) { my $i = $str_gen->generate($string_guidance); print "$i "; } print "\n";` [download] Update: Trying to get a work-around for the constraint, I kept coming across warnings that the length cannot exceed the number of chars in the set. If I do `length => [6,]`, then the constraint is automatic, and it works.	[reply] [d/l] [select]
Re^2: Random data generation. by BrowserUk (Patriarch) on Jun 26, 2010 at 04:19 UTC
Is that in any way constrained to having no more than two consecutive repeats?	[reply]
Re: Random data generation. by davies (Prior) on Jun 26, 2010 at 16:45 UTC
This is actually a classic algorithm used for things like dealing a pack of cards. If you are dealing bridge hands, for example, you want 52 different cards with no repeats. Some games use multiple packs and/or cut down packs (Bézique uses a pack of 2 decks, A down to 7). You therefore set up a pack with all the possible cards, deal one at random, move the last card to the position from which the card was taken & reduce the pack by one. So, for your problem: `use strict; use warnings; my $nRepeats = 2; my @sSet = qw (a b c d e f); my $nLength = 12; my @sAll; for (1..$nRepeats) { push (@sAll, @sSet); } if ($nLength - 1 > $#sAll) {die "Don't be silly"} my $sString; for (1..$nLength) { my $i = int(rand(@sAll)); $sString .= $sAll[$i]; $sAll[$i] = $sAll[-1]; pop(@sAll); } print "$sString \n";` [download] Now, I'm a beginner. So if anyone sees anything that could be improved (apart from Hungarian notation - it helps me conceptualise strong typing), I'd love to know. Regards, John Davies Update: Re-reading your question, if AABAABAABAAB is valid, my code won't do what you want, but I'm not clear on whether that's what you mean.	[reply] [d/l]
Re: Random data generation. by Limbic~Region (Chancellor) on Jun 27, 2010 at 16:30 UTC
BrowserUk, I haven't read any of the other solutions but this is the first idea that popped into my head. I haven't really tested it nor is it without at least 1 assumption. `#!/usr/bin/perl use strict; use warnings; my @M = 'a' .. 'z'; my $N = 42; # Assumes N is > 2 my ($minus_1, $minus_2) = ($M[rand @M], $M[rand @M]); my $str = $minus_1 . $minus_2; for (1 .. $N - 2) { my $idx = rand @M; $idx = ($idx + (int(rand $#M) + 1)) % @M if $minus_1 eq $minus_2 & +& $M[$idx] eq $minus_2; $str .= $M[$idx]; ($minus_1, $minus_2) = ($minus_2, $M[$idx]); } print "$str\n";` [download] Update: Now after reading the solutions, I realize mine is almost identical to ikegami's. I chose to use scalars to hold the value of the last two chars instead of the regex, but very similar indeed. Update 2: I cleaned up the code a bit ~~though the algorithm stayed the same~~. I also fixed the bugs as shown below. Cheers - L~R	[reply] [d/l]
Re^2: Random data generation. by BrowserUk (Patriarch) on Jun 27, 2010 at 16:40 UTC
Unless I've broken something by subroutinising it, there is a flaw in your algorithm: #!perl -slw use strict; use List::Util qw/ shuffle /; sub x { my( $N, @M ) = @_; my ($minus_1, $minus_2, $str) = (('') x 3); $minus_1 = $M[rand @M]; $minus_2 = $M[rand @M]; $str = $minus_1 . $minus_2; for (1 .. $N - 2) { my $next_char; my $idx = int(rand @M); if ($minus_1 eq $minus_2) { $idx = ($idx + int(rand @M) + 1) % @M; } $str .= $next_char = $M[$idx]; ($minus_1, $minus_2) = ($minus_2, $next_char); } return $str; } for( 1 .. 10 ) { my $x = x( 12, qw[ A B C D E F ] ); print $x, $x =~ m[(.)\1\1] ? ' Fail' : ' Ok'; } __END__ [17:39:11.61] c:\test>junk3 BAADDBBCADEF Ok ADAEFADBCEEA Ok CECBAABFCEFE Ok CDDCFDAABEBF Ok ADFBFCEADFFF Fail DAABAAFDCCCE Fail ABAADCBDFBBB Fail FDDDBEEADBCC Fail CEDCDDFCEEFA Ok FCBECCCEAFEB Fail [download]	[reply] [d/l]
Re^3: Random data generation. by Limbic~Region (Chancellor) on Jun 27, 2010 at 17:02 UTC
BrowserUk, Yes, there were at least 2 bugs. Try this instead. `#!perl -slw use strict; use List::Util qw/ shuffle /; sub x { my( $N, @M ) = @_; my ($minus_1, $minus_2) = ($M[rand @M], $M[rand @M]); my $str = $minus_1 . $minus_2; for (1 .. $N - 2) { my $idx = rand @M; $idx = ($idx + (int(rand $#M) + 1)) % @M if $minus_1 eq $minus +_2 && $M[$idx] eq $minus_2; $str .= $M[$idx]; ($minus_1, $minus_2) = ($minus_2, $M[$idx]); } return $str; } for( 1 .. 100 ) { my $x = x( 12, qw[ A B C D E F ] ); print $x, $x =~ m[(.)\1\1] ? ' Fail' : ' Ok'; }` [download] Cheers - L~R	[reply] [d/l]
Re^4: Random data generation. by BrowserUk (Patriarch) on Jun 27, 2010 at 17:13 UTC
Re: Random data generation. by Khen1950fx (Canon) on Jun 27, 2010 at 12:03 UTC
I tried a variation on a card shuffle. `#!/usr/bin/perl use strict; use warnings; my @random = (qw/a b c d e f/) x 2; foreach my $i (reverse 0..$#random) { my $r = int rand ($i+1); @random[$i, $r] = @random[$r, $i] unless ($i == $r); } print @random, "\n";` [download]	[reply] [d/l]
Re^2: Random data generation. by BrowserUk (Patriarch) on Jun 27, 2010 at 15:11 UTC
This suffers the same limitation as jwkrahn (that initially had me fooled), in that: it can only deal with strings up to twice the length of the input set; Whilst I gave 2 & 12 as an examples of M & N, I thought it was clear that other values were possible. it will only generate a small fraction of the possible legal values; Even at length 12, it won't generate `abbabbabbabb`. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. RIP an inspiration; A true Folk's Guy	[reply] [d/l]
Re: Random data generation. by Sewi (Friar) on Jun 28, 2010 at 08:10 UTC
What's about `my @chars = (A..F); my @strings; while ($#strings < 5) { my $random = join('',map { $chars[int(rand(scalar(@chars)))]; } (1..1 +2)); $random =~ /(.)\1{2}/ and next; push @strings, $random; }` [download] Try Padre - the free Perl IDE	[reply] [d/l]
Re^2: Random data generation. by almut (Canon) on Jun 28, 2010 at 09:44 UTC
Generating the entire string before you check becomes highly inefficient for long strings, because the likelihood to have a violation approaches 1.0 as strings get longer, so you'll throw away almost every sample you generated...	[reply]
Re: Random data generation. by davies (Prior) on Jun 29, 2010 at 12:18 UTC
Another go, seeing that my previous attempt didn't do what you wanted. I don't see anyone else trying this algorithm, but it may have been done by someone who can write better Perl than I can. use strict; use warnings; my $nRepeats = 2; my @sSet = qw (a b c d e f); my $nLength = 12; my %sSets; for my $nKey (0..((@sSet) $nRepeats - 1)) { my $sKey = ""; for (1..$nRepeats) { $sKey .= $sSet[($nKey / (($#sSet + 1) ($_ - 1))) % ($#sSet ++ 1)]; } $sSets{$sKey} = [@sSet]; } for (0..$#sSet) { splice(@{$sSets{$sSet[$_] x $nRepeats}}, $_, 1); } my $sString; my @sChr; for (1..$nRepeats) { my $sRnd = $sSet[int(rand(@sSet))]; $sString .= $sRnd; push(@sChr, $sRnd); } for ($nRepeats + 1 .. $nLength) { my $sKey; for (@sChr) { $sKey .= $_; } my $sRnd = @{$sSets{$sKey}}[int(rand(@{$sSets{$sKey}}))]; $sString .= $sRnd; push(@sChr, $sRnd); shift(@sChr); } print "$sString \n"; [download] The algorithm was developed on one of the hot nights when you couldn't sleep, either. The Perl has taken me 2 days to write with all the other distractions life affords. I learned some useful stuff writing it (thanks, Tye, for helping me in CB), so I'm putting it up even though you have your solution! Regards, John Davies	[reply] [d/l]


Pathologically Eclectic Rubbish Lister
	PerlMonks