Hello.
A week ago I met an interesting problem about
regular expressions. It was from the
OpenCup contest held in 2017 november 19. Here I will post a statement, and I ask to discuss how to solve this problem, what strategies and tools to use. I recommend to try to solve this problem by yourselves, and if you would like, you can copy pdf statements with this
link, problem number
7 (You can download statement, while the link works, but you can't access and upload your solution for a testing system if you are not participating with your team from your high school). I used perlbrew switch v5.14.4 to match a version of testing system, and before I was failing with v5.18 because of some newer features which I used. At the bottom I will write my approaches and code.
Statement:
Regular Expressions
Time limit 10 seconds
Memory limit 256Mb
Input standard input or input.txt
Output standard output or output.txt
Pavel doesn’t have a college education. He’s a natural programmer. He
+loves brief, laconic bits of code more than anything else in the worl
+d. One-liners are his favorite! Striving for shorter code he often ma
+kes use of regular expressions. There is nothing more creative and en
+joyable than working on yet another regular expression in the middle
+of the night.
But bummer. Pavel’s colleague, Simon, is often unhappy with Pavel’s co
+de. Simon never uses regular expressions in his code. He’s just jealo
+us! If he knew a thing about regular expressions, he wouldn’t complai
+n.
Once Pavel was careless to remark that Simon is not creative enough to
+ build complex regular expressions. Simon replied, quite sacrilegious
+ly, that there was nothing creative about regular expressions. Their
+argument evolved into a competition: they decided to find out who wou
+ld be the fastest to write a regular expression matching a defined se
+t of strings. Simon wants to win by creating a program which would so
+lve this problem automatically, thus proving the lack of creative com
+ponent in the process. Their friend Peter, a bioinformatician, will b
+e the judge.
Simon glanced at the syntaxis of Perl regular expressions and was horr
+ified: is this what programmers have done to a simple and elegant con
+cept? Simon graduated from the mathematics department of the universi
+ty and has always thought that a regular expression is the following:
0 (zero character) — a regular expression that matches no string.
a, g, t, c (letter) — a regular expression matching precisely one
+string: a single-letter string with the same letter as that in the ex
+pression.
If R and P are regular expressions, then (R|P) is also a regular e
+xpression. It matches all strings which match at least one of the exp
+ressions R and P.
If R and P are regular expressions, then (RP) is also a regular ex
+pression. It matches all strings which can be split in two in such a
+way that the first part matches the expression R, and the second part
+ matches the expression P.
If R is a regular expression, then (R*) is also a regular expressi
+on. It matches all strings which can be split into k >= 0 parts so th
+at each part matches the expression R.
For instance, the regular expression (a*) matches any string containin
+g only letters a, including an empty string. The regular expression (
+0*) only matches an empty string (which can be split into zero parts
+matching the expression 0). The regular expression (a|(g(tc))) matche
+s two strings: a and gtc. Note that it is forbidden to omit or add ex
+tra brackets to regular expressions: it must contain strictly those p
+airs of brackets which appear during its construction according to th
+e rules described above.
Simon wants a flawless victory by building the shortest matching regul
+ar expression. Help Simon. Write a program which finds the shortest r
+egular expression for a set of strings, such that all these strings m
+atch the expression.
Input format
The first line contains an integer T — the number of tests. It is foll
+owed by test descriptions.
The first line of a test description contains a single positive intege
+r N — the number of strings in the set. Each of the following N lines
+ begins with a nonnegative integer L — the length of the string from
+the set, followed by the string itself, separated by a space. The str
+ing only contains lowercase latin letters a, g, t, c. Note that a str
+ing in the set can be empty. In this case the line in the file will o
+nly contain the digit 0 and a space symbol.
The total number of strings in all sets is not greater than 2000. The
+sum of lengths of all strings in all sets is not greater than 2000.
Output format
Print precisely T regular expressions, one per line. Each regular expr
+ession must be an answer to a corresponding test. If for any test the
+re is no regular expression matched by all strings from the set, prin
+t the word Impossible instead of the answer.
Sample
Input
3
2
1 a
3 gtc
1
0
3
1 g
2 gg
3 ggg
Output
(a|(g(tc)))
(0*)
(g*)
Notes
The sixth line of the input data in the example zero is followed by a
+space symbol.
Here I will write ideas about solving.
Firstly, I was thinking about two approaches: 1) to generate regex from an input data, e.g. to make a regex from the first input line, and later modify (expand) it when analysing next lines; 2) to generate all possible regexes, then sort them by length, then try to match all input lines against shortest regex, if fail then match against longer regexes.
The first approach seems more sophisticated and I haven't found any ideas how to solve by that way. Can you suggest something? Second approach seems easier, and I successfully used it.
Secondly, I realized that there are a regular expression which can match any possible line composed by only 4 distinct letters. So, I need to generate all regular expressions, which are shorter than all-matching-regex.
Thirdly, I was thinking how to generate possible regexes. There were one idea at first. Later I got another. First idea was to generate all possible permutations of letters, and later add all possible permutations of other symbols into permutations of letters. This approach take much time for me, and it wasn't successful. Of course I tried to find some logics and avoid generating regular expressions which are longer than their shorter equivalents. Second idea was to generate regular expressions by joining smallest ones - only letters (say "atoms") - to the bigger and bigger ones. That expanding was performed by binary joining or by enveloping regex with Kleene star.
Further are more ideas about solution, and the code.
It become obvious that regex [acgt]* is the shortest one, which matches any line, including empty. Later, one can find that number of parentheses of Simon's regex depends on the number of letters + stars. So shortest all-matching regex, e.g. ((((a|c)|g)|t)*), has length of 16, or ((letters+stars)-1)*3 +1 +sticks.
Later I found out that:
* any consecutive characters can be written with star reaching minimal length of 4: (aa) [4], ((aa)a) [7], ... -> (a*) [4].
* any star inside an OR construction, can be brought outside an OR: ((a*)|c) [8] -> ((a|c)*) [8], (g|((ac)*)) [11] -> ((g|(ac))*) [11].
* one can never use 0 symbol in regexes, because any other symbol (a letter) with star matches an empty line. And if there are an empty line in the input, then correct regex must contain a star.
* regex can have upto 5 characters from the set [*acgt] at most, because 6 characters make regex longer than the all-matching regex. That means also, that if an input line contains more than 5 letters, the star must be included into regex.
* the set of letters of a regex can not differ by more than 1 from the number of letters+stars. It is because one can generate a shorter all-set matching regex [<set>]*, e.g. (((ac)(ga))c) [13] is longer than simply (((a|c)|g)*) [12].
These and other ideas helped me to generate less regexes, which should cover all possible inputs of lines. I used to generate all possible regexes by length less than 16, excluding regexes with more than 1 star, and excluding regexes with stars which are inside an OR alternatives.
I used two similar approaches. To use generated regexes against an input data. First is to grep all matching regexes while examining first input line. So at the second input line it will be less regexes to try to match. Second approach was to take a shortest regex and try to match all input lines, and take next regex if process fails, and repeat.
I got Time-limit. It seams that my approach is not efficient.
Later I remembered that there are an operator 'qr', which compiles regex once, so it saved program running time much. But still Time-Limit.
Next I tried to minimize a set of regexes which I've generated. For example, regexes ((ac)g) and (a(cg)) have a same meaning, so I used to delete such "duplicates". I tried to find more regexes which are "duplicates" and then deleted them. It saved a bit of program running time.
The next idea was to use "abstract" regexes. I mean regexes which contains a "regex dot" (any character) instead of actual letters. So I used to generate all possible abstract regexes, and got only some hundreds of them. Then I used these regexes against input data, and many of regexes matched. But the number decreased by about 3-10 times. So next I matched compiled (qr) non-abstract regexes against an input data, and a set of non-abstract regexes was decreased by about 3-10 times, from few dozen thousand to some thousand. In my code I used a data structure of Hash_of_Arrays. Keys of hash are abstract regexes. Arrays contain compiled regexes corresponding to abstract regex. (Earlier I tried to use Hash_of_Hashes because I used non-abstract regexes as strings in keys of deeper hashes). My code got Accepted with 2.5 sec time, but test cases are not available, and not plenty of them (only 20), so I am not sure if my solution is correct for all possible data.
... I was thinking that the type of "abstract" regexes which have backreferences (<named> for a comfort) can be useful. But I am not sure. I tried unsuccessfully.
Maybe this problem can be solved with more advanced techniques: recursive regexes, eval-groups, and other?
Related topic:
Perl in programming contests and problem solving
#regexes #qr #golf #problem #efficiency
CODE:
#!/usr/bin/perl
use warnings;
use strict;
use re 'eval';
use Time::HiRes qw( gettimeofday tv_interval );
my $t0 = [ gettimeofday ];
$\ = $/;
my $debug = 0;
my $debug2 = 0;
my $time = 0;
my @acgt = qw( a c g t );
my @combinations = map { [ combinations( $_, $_ - 1, @acgt ) ] } 1
+ .. 5;
$debug and print ~~ @{ $_ } for @combinations;
my %regexes = ( '_' => 1 );
while( 1 ){
my @old_regexes = sort { length $a <=> length $b || $a cmp $b
+} keys %regexes;
$debug and print " " . @old_regexes;
for my $i ( 0 .. @old_regexes - 1 ){
my $copy_of_index_i = $i;
$i = $old_regexes[ $i ];
if( $i !~ /\*/ ){
$regexes{ "($i*)" } ++ if ( length $i ) + 3 < 16;
for my $j ( $copy_of_index_i + 0 .. @old_regexes - 1 )
+{
$j = $old_regexes[ $j ];
last if ( length $i ) + ( length $j ) + 3 > 15;
next if $j =~ /\*/;
$regexes{ "($i|$j)" } ++;
# $regexes{ "($j|$i)" } ++;
}
}
for my $j ( $copy_of_index_i + 0 .. @old_regexes - 1 ){
$j = $old_regexes[ $j ];
last if ( length $i ) + ( length $j ) + 2 > 15;
next if $i =~ /\*/ && $j =~ /\*/;
$regexes{ "($i$j)" } ++;
$regexes{ "($j$i)" } ++;
}
}
last if @old_regexes == keys %regexes;
}
$debug2 and print "regexes(30): ", join ' ', ( sort { length $a <=
+> length $b } keys %regexes )[ 0 .. 30 ];
$debug and print "number of abstract regexes: " . keys %regexes;
my %uniq;
map $uniq{ $_ } ++, keys %regexes;
%regexes = map { $_ => 1 } keys %uniq;
$debug and print ~~ keys %regexes;
for my $re ( sort keys %regexes ){
$re =~ / (\w) [(] (\w)(\w) [)] /x or next;
delete $regexes{ $re };
$re =~ s/ (\w) [(] (\w)(\w) [)] /($1$2)$3/x;
$regexes{ $re } ++;
}
$debug and print ~~ keys %regexes;
for my $re ( sort keys %regexes ){
$re =~ / (\w) [(] [(] (\w)(\w) [)] (\w) [)] /x or next;
delete $regexes{ $re };
$re =~ s/ (\w) [(] [(] (\w)(\w) [)] (\w) [)] /(($1$2)$3)$4/x;
$regexes{ $re } ++;
}
$debug and print ~~ keys %regexes;
for my $re ( sort keys %regexes ){
$re =~ / [(] (\w) (\w) [)] [(] (\w) (\w) [)] /x or next;
delete $regexes{ $re };
$re =~ s/ [(] (\w) (\w) [)] [(] (\w) (\w) [)] /(($1$2)$3)$4/x;
$regexes{ $re } ++;
}
$debug and print ~~ keys %regexes;
$regexes{ "((((a|c)|g)|t)*)" } ++;
$debug and print ~~ keys %regexes;
my @regexes = sort { length $a <=> length $b } keys %regexes;
y/_/./ for @regexes;
$debug2 and print "regexes(30): @regexes[ 0 .. 30 ]";
%regexes = map { $_ => 1 } @regexes;
my %HoR;
for my $re ( keys %regexes ){
my $cnt = () = $re =~ /\./g;
for my $comb ( @{ $combinations[ $cnt - 1 ] } ){
$_ = $re;
for my $letter ( split //, $comb ){
s/\./$letter/;
}
next if /\*/ and 4 == $cnt and /(\w).*\1/;
next if / [(] (\w) \| (\w) [)] /x and $1 ge $2;
next if / [(] [(] (\w) \| (\w) [)] \| (\w) [)] /x and $1 g
+e $2 || $1 ge $3 || $2 ge $3;
push @{ $HoR{ $re } }, qr/^$_$/;
}
}
if( $debug ){
my @c = ( 0 ) x 5;
my $c = 0;
for my $re ( keys %HoR ){
my $cnt = () = $re =~ /\./g;
$c[ $cnt - 1 ] += @{ $HoR{ $re } };
$c += @{ $HoR{ $re } };
}
print "all regexes: " . $c;
print " $_" for @c;
}
$time and print ' ', tv_interval( $t0 );
################ MAIN:
<>;
while( <> ){
@_ = map ~~<>, 1 .. $_;
chomp @_;
s/\S+ // for @_;
$debug2 and print "\@_:@_";
s/(.)\1{2,}/ $1 x 3 /ge for @_;
s/(.{2,4})\1{2,}/ $1 x 2 /ge for @_;
$time and print ' ', tv_interval( $t0 );
my %uniq;
map $uniq{ $_ } ++, @_;
@_ = sort keys %uniq;
my @regexes = sort { length $a <=> length $b } keys %HoR;
for my $str ( @_ ){
$debug and print " candidate abstract regexes: " . @regexes;
$debug2 and print " str:[$str]";
my @ok;
for my $re ( @regexes ){
$str =~ /^$re$/ and push @ok, $re;
}
@regexes = @ok;
}
$debug and print " abstract regexes which match: " . @regexes;
$debug2 and print " abstract regexes: @regexes";
my @real;
@regexes = map @{ $HoR{ $_ } }, @regexes;
for my $str ( @_ ){
$debug and print " candidate regexes: " . @regexes;
$debug2 and print " str:[$str]";
my @ok;
for my $re ( @regexes ){
$str =~ $re and push @ok, $re; # HOT SPOT
}
@regexes = @ok;
}
$debug and print " regexes which match: " . @regexes;
my $shortest = shift @regexes;
my $strip = join "(.*)", map quotemeta, qw[ (?^:^ $) ];
map { s/^ $strip $/$1/x } $shortest;
#^ map { s/^ \Q(?^:^\E //x, s/ \Q$)\E $//x } $shortest; # alternat
+ive: OK
print $shortest ? $shortest : "Impossible";
$time and print tv_interval( $t0 );
$debug and print '-' x 20;
}
########## END MAIN
sub combinations {
my( $length, $min_set, @letters ) = @_;
my $str = join '-', ( join '', @letters ) x $length;
$debug2 and print $str;
my $re = join join( '-', ( '.*?' ) x 2 ), ('(.)') x $length;
$debug2 and print $re;
my %combs;
$str =~ /$re (?{ $combs{ join '', grep defined, $1, $2, $3, $4, $5
+ } ++ }) (*FAIL)/x;
my @combs = keys %combs;
$debug2 and print "all combs: " . @combs;
# @combs = grep { !/(.)\1/ } @combs;
# $debug2 and print "all combs: " . @combs;
@combs = grep {
my %uniq;
map $uniq{ $_ } ++, split //;
$min_set <= keys %uniq;
} @combs;
$debug2 and print "unique combs: " . @combs;
$debug2 and print "@combs";
return @combs;
}
INPUT
*
2
1 a
3 gtc
1
0
3
1 g
2 gg
3 ggg
2
- tcac
- gcac
2
- tac
- gtc
2
- gag
- tcag
2
- ga
- ctca
2
- a
- acaca
2
- aaa
- cccc
3
- aaa
- ccc
- ttt
4
- a
- c
- g
- t
4
- aaaaa
- c
- g
- t
3
- acac
- caca
- g
1
- g
1
- gcgc
1
- gcgcg
1
- acacacacacacacacacac
1
- aaaaaaaaaaaaaaaaaaaaaaaaaac
1
- aaaaaaaaaacg
1
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac
1
- acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
1
- acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+g
1
- acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+acacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacaca
+cacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacacac
+gt
9
- a
- aaaaaaaaaaaaaaaaaaaaaaa
- aa
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacccc
- aac
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac
- ac
- aaaaaac
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac
32
- a
- c
- t
- g
- aa
- cc
- tt
- gg
- a
- c
- t
- g
- aa
- cc
- tt
- gg
- a
- c
- t
- g
- aa
- cc
- tt
- gg
- a
- c
- t
- g
- aa
- cc
- tt
- gg
2
- acgtgca
-
2
- acgt
-
1
- aaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaa
+acccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccc
+cctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaa
+aaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccc
+cccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaa
+aaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccc
+cccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaa
+aaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccc
+cccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaa
+aaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccc
+cccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaa
+aaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacc
+cccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccct
+aaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaa
+cccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaaccccccccccccc
+ctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaa
+aacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaaccccccccccc
+ccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaa
+aaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaaccccccccc
+ccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaa
+aaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaaccccccc
+ccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaa
+aaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaaccccc
+ccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaa
+aaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaaccc
+ccccccccccctaaaaaaaaaaaaacccccccccccccctaaaaaaaaaaaaacccccccccccccct
1
- acggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacga
+gcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggac
+ggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcga
+gcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcg
+agcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcgg
+acggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgac
+gacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacg
+agcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacg
+acgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacg
+gacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacg
+gcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcga
+cggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggac
+ggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcga
+cgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcg
+acggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacgg
+acggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacg
+agcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcg
+acgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacga
+cgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacg
+gacggacggagcggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcag
+cgagcgacgacgacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacgga
+gcgagcgacggacgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggag
+cggacggacggacgacgacgagcgacggagcgagcgacggacgagcgacgaacggcagcgagcgacgac
+gacgacgagcgacggacggacggacggagcggacggacggacgacgacgagcgacggagcgagcgacgg
+acgagcgacgaacggcagcgagcgacgacgacgacgagcgacggacggacggacggagcggacggacgg
+acgacgacgagcgacggagcgagcgacggacgagcgacgat
1
- acgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacg
1
- acgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgt
2
-
-
2
-
- c
3
-
- c
- t
3
-
- ccc
- t
2
- acgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacgt
- acgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgacgac
+gacgacgacgacga
2
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacgcgcgcgcgcgcgcgaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaa
- aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacgcgcgcgcgcgcgcgaaaaaaaaaaaa
+aaaaaaaaaaaaaaaaaat
8
- aaaacgcgaaaa
- aaaacgcgaaaat
- aaaacgcgaaaatttttt
- aaaacgcgaaaatttt
- aaaacgcgcgaaaatttttt
- aaaaaaacgcgaaaatttttt
- cgcgaaaatttttt
- aaaacgcgaaaattttttcgcg
1
- acg
OUTPUT:
(a|((gt)c))
(c*)
(g*)
((((g|t)c)a)c)
(((gt)|(ta))c)
((g|(tc))(ag))
((g|((ct)c))a)
((a|c)*)
((a|c)*)
((t|(a|c))*)
((a|t)|(g|c))
((((a|c)|g)|t)*)
((a|(c|g))*)
g
((gc)*)
((c|g)*)
((ac)*)
((a*)c)
((a*)(cg))
((a*)c)
((ac)*)
(((ac)*)g)
((((ac)*)g)t)
((a|c)*)
((((a|c)|g)|t)*)
((((a|c)|g)|t)*)
((((ac)g)t)*)
((t|(a|c))*)
(((c|(a|g))*)t)
(((ac)g)*)
((((ac)g)*)t)
(c*)
(c*)
((c|t)*)
((c|t)*)
(((cg)|(a|t))*)
(((cg)|(a|t))*)
(((cg)|(a|t))*)
((ac)g)
real 0m0.616s
user 0m0.584s
sys 0m0.028s
OUTPUT verbose (
$debug = 1;):
4
16
60
168
240
1
4
25
144
235
number of abstract regexes: 235
235
208
203
198
199
all regexes: 15264
8
76
960
3384
10836
candidate abstract regexes: 199
candidate abstract regexes: 60
abstract regexes which match: 45
candidate regexes: 2550
candidate regexes: 954
regexes which match: 292
(a|((gt)c))
--------------------
candidate abstract regexes: 199
abstract regexes which match: 27
candidate regexes: 746
regexes which match: 746
(c*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 60
candidate abstract regexes: 35
abstract regexes which match: 28
candidate regexes: 1194
candidate regexes: 528
candidate regexes: 318
regexes which match: 318
(g*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 114
abstract regexes which match: 114
candidate regexes: 6382
candidate regexes: 284
regexes which match: 249
((((g|t)c)a)c)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 136
abstract regexes which match: 136
candidate regexes: 9366
candidate regexes: 669
regexes which match: 249
(((gt)|(ta))c)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 136
abstract regexes which match: 76
candidate regexes: 3210
candidate regexes: 277
regexes which match: 245
((g|(tc))(ag))
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 114
abstract regexes which match: 67
candidate regexes: 2086
candidate regexes: 258
regexes which match: 246
((g|((ct)c))a)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 60
abstract regexes which match: 34
candidate regexes: 906
candidate regexes: 462
regexes which match: 261
((a|c)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 136
abstract regexes which match: 136
candidate regexes: 9366
candidate regexes: 318
regexes which match: 255
((a|c)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 136
candidate abstract regexes: 136
abstract regexes which match: 136
candidate regexes: 9366
candidate regexes: 318
candidate regexes: 255
regexes which match: 243
((t|(a|c))*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 60
candidate abstract regexes: 60
candidate abstract regexes: 60
abstract regexes which match: 60
candidate regexes: 3940
candidate regexes: 1444
candidate regexes: 466
candidate regexes: 306
regexes which match: 264
((a|t)|(g|c))
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 136
candidate abstract regexes: 45
candidate abstract regexes: 45
abstract regexes which match: 45
candidate regexes: 2550
candidate regexes: 318
candidate regexes: 257
candidate regexes: 243
regexes which match: 240
((((a|c)|g)|t)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 114
candidate abstract regexes: 114
abstract regexes which match: 29
candidate regexes: 1002
candidate regexes: 268
candidate regexes: 255
regexes which match: 243
((a|(c|g))*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 60
candidate regexes: 3940
regexes which match: 1444
g
--------------------
candidate abstract regexes: 199
abstract regexes which match: 114
candidate regexes: 6382
regexes which match: 293
((gc)*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 82
candidate regexes: 2922
regexes which match: 261
((c|g)*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 84
candidate regexes: 2062
regexes which match: 293
((ac)*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 114
candidate regexes: 6382
regexes which match: 283
((a*)c)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 82
candidate regexes: 2922
regexes which match: 264
((a*)(cg))
--------------------
candidate abstract regexes: 199
abstract regexes which match: 114
candidate regexes: 6382
regexes which match: 283
((a*)c)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 85
candidate regexes: 2086
regexes which match: 293
((ac)*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 78
candidate regexes: 1962
regexes which match: 260
(((ac)*)g)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 91
candidate regexes: 2194
regexes which match: 258
((((ac)*)g)t)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 60
candidate abstract regexes: 35
candidate abstract regexes: 28
candidate abstract regexes: 26
candidate abstract regexes: 26
candidate abstract regexes: 26
abstract regexes which match: 26
candidate regexes: 714
candidate regexes: 408
candidate regexes: 318
candidate regexes: 318
candidate regexes: 258
candidate regexes: 256
candidate regexes: 256
regexes which match: 256
((a|c)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 60
candidate abstract regexes: 35
candidate abstract regexes: 35
candidate abstract regexes: 35
candidate abstract regexes: 35
candidate abstract regexes: 35
candidate abstract regexes: 35
abstract regexes which match: 35
candidate regexes: 2118
candidate regexes: 813
candidate regexes: 318
candidate regexes: 257
candidate regexes: 255
candidate regexes: 243
candidate regexes: 243
candidate regexes: 240
regexes which match: 240
((((a|c)|g)|t)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 27
abstract regexes which match: 12
candidate regexes: 466
candidate regexes: 466
regexes which match: 240
((((a|c)|g)|t)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 27
abstract regexes which match: 20
candidate regexes: 614
candidate regexes: 614
regexes which match: 256
((((ac)g)t)*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 86
candidate regexes: 2110
regexes which match: 243
((t|(a|c))*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 80
candidate regexes: 2010
regexes which match: 243
(((c|(a|g))*)t)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 90
candidate regexes: 2170
regexes which match: 265
(((ac)g)*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 80
candidate regexes: 2010
regexes which match: 253
((((ac)g)*)t)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 27
candidate regexes: 746
regexes which match: 746
(c*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 27
abstract regexes which match: 10
candidate regexes: 418
candidate regexes: 418
regexes which match: 304
(c*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 27
candidate abstract regexes: 10
abstract regexes which match: 10
candidate regexes: 418
candidate regexes: 418
candidate regexes: 304
regexes which match: 255
((c|t)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 27
candidate abstract regexes: 19
abstract regexes which match: 10
candidate regexes: 418
candidate regexes: 418
candidate regexes: 304
regexes which match: 255
((c|t)*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 80
abstract regexes which match: 80
candidate regexes: 2010
candidate regexes: 249
regexes which match: 243
(((cg)|(a|t))*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 83
abstract regexes which match: 70
candidate regexes: 1770
candidate regexes: 249
regexes which match: 243
(((cg)|(a|t))*)
--------------------
candidate abstract regexes: 199
candidate abstract regexes: 85
candidate abstract regexes: 70
candidate abstract regexes: 70
candidate abstract regexes: 70
abstract regexes which match: 70
candidate regexes: 1770
candidate regexes: 249
candidate regexes: 243
candidate regexes: 243
candidate regexes: 243
regexes which match: 243
(((cg)|(a|t))*)
--------------------
candidate abstract regexes: 199
abstract regexes which match: 136
candidate regexes: 9366
regexes which match: 669
((ac)g)
--------------------
real 0m0.678s
user 0m0.668s
sys 0m0.004s