P is for Practical PerlMonks

### Generic De Bruijn Sequence

by QM (Parson)
 on Apr 19, 2017 at 16:38 UTC ( #1188292=CUFP: print w/replies, xml ) Need Help??

I needed to write a sequence for a test, where there are N elements, and all T-tuples of elements are used, in the shortest sequence. A search turned up De Bruijn sequences.

The easy algorithm takes N elements and produces every N-tuple permutation. With a little tinkering, I have the B(N,T) version. Not fast, can't handle large sequences, and runs out of space quickly. But for small T, does what I need.

It quickly became apparent that large T values would not be useful in my test, though N could be about 50.

Here are some results:

```> debruijn.pl 3 3
AAACCCBCCACBBCBACABCAABBBABAA

> debruijn.pl 4 4
+CACBBBCBBACBABCBAACACABBCABACAABCAAABBBBABBAABABAAA

> debruijn.pl 4 2

> debruijn.pl 5 2

Here's the code.

```#!/usr/bin/env perl
#
# Generate a string with the longest non-repeating subsequences possib
+le.
# Include overlaps.
#
# Input N, size of the alphabet.
# Input T, tuples (pairs, triples, quadruples, etc.)
#
# Create a graph of all pairs A(B...Y)Z,
#   such that every pair whose left member ends with B...Y,
#   the right member starts with B...Y (for some length n-1)
# Find an Eulerian path through the permutations (visit every node onl
+y once)
# The sequence of starting node, plus each additional ending letter,
#   is the De Bruijn sequence for this alphabet.

use strict;
use warnings;

my \$n = (shift or 4);  # N for alphabet size
my \$t = (shift or \$n); # T for Tuples (pairs, triples, quadruples)
my \$n1 = \$n - 1;
my \$t1 = \$t - 1;

my @alphabet = ('A'..'Z','0'..'9','a'..'z');

if (@alphabet < \$n) {
die "Alphabet is smaller than \$n\n";
}

# glob character of length 1
my \$alphabet = '{' . join(',', @alphabet[0..\$n1]) . '}';

# Generate all strings of length \$t in the given alphabet
my \$glob_string = \$alphabet x \$t;
my @nodes = glob("\$glob_string");

# Generate the graph of all strings that overlap in t-1 characters.
my %graph;
for my \$node1 (@nodes) {
for my \$node2 (@nodes) {
next if \$node1 eq \$node2;
# If they overlap, add node2 to the array for node1
if (substr(\$node1,1,\$t1) eq substr(\$node2,0,\$t1)) {
push @{\$graph{\$node1}}, \$node2;
}
}
}

# String starts with first node's full string.
# Walk through the graph:
#   Delete the node behind
#   Add the last char of next node to string
# Print result

my \$node1 = \$nodes[0];
my \$q = 0;
# print "\$q : \$node1\n"; # debug
my \$string = \$node1;
while (scalar keys %graph > 1) {
my \$moved = 0;
# "reverse" here somehow "does the right thing", and enables an
# Eulerian circuit with no added logic.
for my \$node2 (reverse @{\$graph{\$node1}}) {
if (exists(\$graph{\$node2})) {
\$string .= substr(\$node2,\$t1,1); # Add last char to string
delete(\$graph{\$node1});
\$node1 = \$node2;
\$moved = 1;
#           print ++\$q, " : \$node2\n"; # debug
last;
}
}
# Avoid endless loops on pathological cases
unless (\$moved) {
warn "Didn't find next node (\$node1)\n";
last;
}
}

print "\$string\n";

exit;

I'm interested in tweaks to make it faster, smaller, better, etc. Or pointers to other solutions.

-QM
--
Quantum Mechanics: The dreams stuff is made of

Replies are listed 'Best First'.
Re: Generic De Bruijn Sequence
by BrowserUk (Pope) on Apr 19, 2017 at 17:59 UTC

Here's a version I wrote for (OT) A different kind of 'combinatorics'. It'll theoretically handle up to a 63 bit DeBruijn sequence, but that would require 8 Etabytes of ram, so is untested :) It has been tested on 31-bit sequence, which require 2GB of ram and completes in under 30 minutes:

```#! perl -slw
use strict;
use bytes;
use Data::Dump qw[ pp ]; \$Data::Dump::WIDTH = 1000;

### Prefer Ones:
### Write n zeros.
### Then, always write a one unless it would cause the repetition of a
+n n-length string;
### In which case, write a zero.

our \$N //= 4;

my \$t1 = "b\${ \(2**\$N+\$N-1) }";
my \$seen = '';
my \$mask1 = ( 1<<\$N )-1;
my \$mask2 = ( 1<<( 2**\$N+\$N-1 ) )-1;
my \$seq = pack 'Q*', (0) x 100;
my \$val = 0;

for( \$N .. 2**\$N+\$N-1 ) {    ## ## if N=5; 5 .. 36; if N=6, 6 .. 64+6-
+1 = 69;
\$val = ( \$val << 1 ) & \$mask1;
vec( \$seen, \$val | 1, 1 ) or do{ \$val |= 1; vec( \$seq, \$_, 1 ) = 1
+; };
vec( \$seen, \$val    , 1 ) = 1;
}
print unpack \$t1, \$seq;

__END__

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit
I found a similar post in my search here earlier, but not that one. Probably some issue with whitespace and searching.

-QM
--
Quantum Mechanics: The dreams stuff is made of

I'd forgotten I'd posted that CufP. (Are you interested in the C version if I can find it?)

Re: Generic De Bruijn Sequence
by tybalt89 (Priest) on Apr 19, 2017 at 21:43 UTC

Here's a solution.

I'll leave it to you to determine if it's faster, smaller, better, etc.

```#!/usr/bin/perl -l

# http://perlmonks.org/?node_id=1188292

use strict;
use warnings;

my \$n = shift // 3;  # alphabet size
my \$t = shift // \$n; # size of tuples

my @alphabet = ('A'..'Z', '0'..'9', 'a'..'z');
my %next;
@next{'', @alphabet} = @alphabet;

\$_ = \$alphabet[0] x \$t; # start of string
my \$over = \$alphabet[\$n]; # outside of alphabet
my \$wantedlength = \$n ** \$t + \$t - 1;

while( length \$_ >= \$t )
{
if( /^(?=(.{\$t})).+\1/ ) { s/^./\$next{\$&}/; }
elsif( s/\$over(.)/\$next{\$1}/ ) { }
elsif( \$wantedlength == length \$_ )
{
print;
# prove it is correct
my %all;
\$all{\$1}++ while /(?=(.{\$t}))/g;
my \$count = keys %all;
print "want @{[ \$n ** \$t ]} unique strings, have \$count";
die "duplicate string" if grep \$_ > 1, values %all;
exit;
}
else { \$_ = \$alphabet[0] . \$_; }
}

The loop seems awkward, there ought to a cleaner way...

Re: Generic De Bruijn Sequence
by LanX (Bishop) on Apr 19, 2017 at 22:49 UTC
Did you consult wikipedia or search for other algorithms?

Many sources recommend an algorithm from Frank Ruskey , easily found implemented in various languages, here one in Python De_Bruijn_sequence#Algorithm which is not too hard to be ported.

Besides: The minimal length k^n is proven to be always achievable.

Which consequently means at least your first result and hence your algorithm is wrong.

##### update
De Bruijn is supposed to be cyclic!

AAACCCBCCACBBCBACABCAABBBABAA

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

De Bruijn is supposed to be cyclic AAACCCBCCACBBCBACABCAABBBABAA

Yes. The last \$n-1 chars overlap with the beginning. The debug prints show each permutation lined up with the output string, so you can see how the graph is traversed and the output constructed. I could have just used the last char of each permutation, and had a true De Bruijn output, but that didn't suit my needs.

I found the Python code, but didn't like it because of the lack of explanation. To understand it, I have to trace code that does a lot of index math, instead of reading comments hinting at an algorithm that makes sense on a higher level. It's not far removed from pointer math and assembler code, so it doesn't seem very Pythonic, as they like to say.

If nothing else, my code at least tries to say what it's doing. So the next guy can try to understand it, and not blindly accept it, nor need to walk through on paper to believe that it's correct.

-QM
--
Quantum Mechanics: The dreams stuff is made of

> . The last \$n-1 chars overlap with the beginning
```> debruijn.pl 5 2
123456789012345678901234567890

I don't understand how \$n=5 (Alphabet size according to your code) results in length 26 instead of 25.

##### edit

>  walk through on paper to believe that it's correct.

Your user image shows Richard P. Feynman and you are questioning scientific reviews in mathematics? Ironic :)

##### UPDATE

") did you mean \$t-1 ?

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

> so it doesn't seem very Pythonic, as they like to say.

seems to be a port of the ruby version which is pretty straight forward ported for a Perl hacker

##### update

Basically it's a inductive solution building complex sequences from more primitive ones starting with (1,1).

I'm sure porting to Perl, using longer var names and dumping part solutions would explain the idea.

##### update

Though finding the original article would be better.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

De Bruijn is supposed to be cyclic! AAACCCBCCACBBCBACABCAABBBABAA

Maybe, but practicality means it is far simpler and more efficient to repeat the end sequence than to wrap around.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit
Maybe in this case, but check the length of his results, either way they are inconsistent and hence wrong.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

Re: Generic De Bruijn Sequence
by tybalt89 (Priest) on Apr 19, 2017 at 23:37 UTC

You wanted tweaks :) This is at least smaller.

```#!/usr/bin/perl -l

# http://perlmonks.org/?node_id=1188292

use strict;
use warnings;

my \$n = shift // 3;  # alphabet size
my \$t = shift // \$n; # size of tuples

my @alphabet = ('A'..'Z', '0'..'9', 'a'..'z');
my (%next, %all);
@next{'', @alphabet} = @alphabet;

\$_ = \$alphabet[0] x \$t; # start of string
my \$over = \$alphabet[\$n]; # outside of alphabet
my \$need = \$n ** \$t + \$t - 1;

1 while s/^(?=(.{\$t})) (?=.+\1) (.) | \$over(.)/\$next{\$+}/x # advance c
+har
or s/^ (?=.{\$t}) (?!.{\$need}) /\$alphabet[0]/x; # add new char

print;         # then prove it is correct
\$all{\$1}++ while /(?=(.{\$t}))/g;
print "want @{[ \$n ** \$t ]} unique tuples, have @{[ scalar keys %all ]
+}";
Re: Generic De Bruijn Sequence
by tybalt89 (Priest) on Apr 20, 2017 at 13:43 UTC

Other replies mentioned the "prefer ones" algorithm. If you think, instead, of it as a "prefer last" algorithm, it appears to work for your stated problem. Start with T of the first character. Then try to add the last character to the answer, moving from last character to second last character to third last character, etc., as long as a repetition exists.

```#!/usr/bin/perl -l

# http://perlmonks.org/?node_id=1188292

use strict;
use warnings;

my \$n = shift // 3;  # alphabet size
my \$t = shift // \$n; # size of tuples

my @alphabet = ('A'..'Z', '0'..'9', 'a'..'z');
my %previous;
@previous{@alphabet[1..\$#alphabet]} = @alphabet; # last to 2nd last, e
+tc.
my \$need = \$n ** \$t + \$t - 1; # length of the answer

\$_ = \$alphabet[0] x \$t; # start string with first chars

printf "%77s\n", \$_ while
s/^ (?=(.{\$t})) (?=.+\1) . /\$previous{\$&}/x   # prev char if repeat
or \$need > length && s/^/\$alphabet[\$n - 1]/;  # or add last char

my \$chars = join '', @alphabet[0..\$n-1];      # test if valid
/^[\$chars]+\$/ or die "invalid character";
my %all;
\$all{\$1}++ while /(?=(.{\$t}))/g;
print "\nwant @{[ \$n ** \$t ]} unique tuples, have @{[ scalar keys %all
+ ]}";
print "solution passes tests";

I haven't found a case yet where backtracking is required.

Is this a guess or do you have a formal proof?

Took me a while to understand why prefer "ones succeeds", and AFAIK only if the alphabet is a power of two.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

It's a guess.

Does "prefer ones" succeed regardless of the alphabet size?

My solution is, I think, a "prefer last" solution. Testing a few basic, low valued, parameters appears to give the correct answer.

Perhaps if someone has a good head for this, they could suggest some parameter values that break under "prefer last" (or indeed, any other method). I don't have the time to tinker, and besides, I'm not that good at poking around the edges.

-QM
--
Quantum Mechanics: The dreams stuff is made of

Create A New User
Node Status?
node history
Node Type: CUFP [id://1188292]
Approved by herveus
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (7)
As of 2018-08-21 12:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Asked to put a square peg in a round hole, I would:

Results (198 votes). Check out past polls.

Notices?