comment on

I'd seen Abigail do nasty things with regexes before, but never really understood them at all. However, in the recent N-Queens solver, it was a pure regex (backrefs only), there were no (?{}) constructs, and I was finally able to understand how Abigail does it..

Then, on a regex kick, I discovered Abigail's pure regex 3-SAT reduction. If you haven't already seen it, it's the coolest couple of lines on the planet. With all this regex excitement, I decided to try my hand at solving some NP-complete problem(s) with pure regexes.

I tried a handful of different NP-complete problems, but after a few emails with MJD, I understood why these attempts went wrong. I kept getting hung up on the string and regex size being exponential on the size of input (not meaning my solution was incorrect, but invalidating its use as an NP-completeness reduction proof). Finally, I think I may have gotten it right with Hamiltonian Circuits. Abigail has done this before using extended regexes, but claimed to be stuck on a pure regex solution that wasn't exponential.

So without further ado, here's my solution. Given a (simple, undirected) graph with E edges and V vertices, it finds a Hamiltonian Circuit (a cycle that visits every vertex exactly once, starting and ending in the same place). The size of the string it creates is bounded by O(V^4), and the size of the regex it creates is bounded by O(V^2).

my @E = ([1,3],[1,5],[2,3],[2,4],[2,5],[4,5]);
my $V = 5;
my $verbose = 1;

my @all_edges = map { my $x = $_; map { [$x, $_] } $x+1 .. $V } 1 .. $
+V-1;

my $string = (join(' ', 1 .. $V) . "\n") x $V
           . "\n"
           . (join(' ', map { join "-", @$_ } @all_edges ) . "\n")
               x @all_edges
           . "\n"
           . (join(' ', map { join "-", @$_ } @E ) . "\n") x $V;

my $regex = "^ "
          . ".* \\b (\\d+) \\b .* \\n\n" x $V
          . "\\n\n"
          . join("", map { my ($x, $y) = @$_;
                           ".* \\b (?: \\$x-\\$y | \\$y-\\$x ) \\b .* 
+\\n\n"
                         } @all_edges)
          . "\\n\n"
          . join("", map { my ($x, $y) = ($_, $_+1);
                           ".* \\b (?: \\$x-\\$y | \\$y-\\$x ) \\b .* 
+\\n\n"
                         } 1 .. ($V-1))
          . ".* \\b (?: \\$V-\\1 | \\1-\\$V ) \\b .* \\n \$\n";

print "'$string' =~ /\n$regex\n/x\n" if $verbose;

if (my @c = $string =~ /$regex/x) {
  local $" = " -> ";
  print "Hamiltonian circuit: [ @c -> $1 ]\n";
} else {
  print "No Hamiltonian circuit\n";
}
[download]

Now the dirty details.. Here's the value of $string and $regex for the example graph included:

$string = q[
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5

1-3 1-5 2-3 2-4 2-5 4-5
1-3 1-5 2-3 2-4 2-5 4-5
1-3 1-5 2-3 2-4 2-5 4-5
1-3 1-5 2-3 2-4 2-5 4-5
1-3 1-5 2-3 2-4 2-5 4-5
];

$regex = q[^
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
.* \b (\d+) \b .* \n
\n
.* \b (?: \1-\2 | \2-\1 ) \b .* \n
.* \b (?: \1-\3 | \3-\1 ) \b .* \n
.* \b (?: \1-\4 | \4-\1 ) \b .* \n
.* \b (?: \1-\5 | \5-\1 ) \b .* \n
.* \b (?: \2-\3 | \3-\2 ) \b .* \n
.* \b (?: \2-\4 | \4-\2 ) \b .* \n
.* \b (?: \2-\5 | \5-\2 ) \b .* \n
.* \b (?: \3-\4 | \4-\3 ) \b .* \n
.* \b (?: \3-\5 | \5-\3 ) \b .* \n
.* \b (?: \4-\5 | \5-\4 ) \b .* \n
\n
.* \b (?: \1-\2 | \2-\1 ) \b .* \n
.* \b (?: \2-\3 | \3-\2 ) \b .* \n
.* \b (?: \3-\4 | \4-\3 ) \b .* \n
.* \b (?: \4-\5 | \5-\4 ) \b .* \n
.* \b (?: \5-\1 | \1-\5 ) \b .* \n $
];

__OUTPUT__
Hamiltonian circuit: [ 5 -> 4 -> 2 -> 3 -> 1 -> 5 ]
[download]

Here's how it works:

In the first "paragraph", $1 through $5 each get a vertex assigned to them. This paragraph of $string is bounded by O(V^2), and $regex by O(V).
The second "paragraph" is the ugly bit. We have to make sure all the vertices chosen are different. This could have been done by originally choosing $1 through $5 to be any permutation from an exhaustive list of permutations, but such a list would have been exponential in size.

The way I checked is by picking any 5 vertices, then ensuring that they are all pairwise distinct. In $string, we have a repeated list of all "valid" (distinct) vertex pairs. In $regex, we make sure that every pair of two vertices in {$1,..,$5} shows up in that list.
This isn't exponential, because there are V(V-1)/2 "valid" (distinct) pairs, and V(V-1)/2 pairs to verify. So this paragraph of $string is bounded by O(V^4), and $regex by O(V^2).
The third "paragraph" is where we verify that our distinct vertex set has edges from $1 to $2 ... to $5 and then back to $1. This paragraph of $string is bounded by O(E*V) = O(V^3), and $regex by O(V).

If all these conditions matched, then the regex matches, and $1 through $5 must be the vertices of a Hamiltonian Circuit.

There you have it. Your feedback is welcome. I really hope I haven't made any mistakes in my analysis. This code isn't one-tenth as beautiful and elegant as Abigail's 3-SAT reduction, and perhaps it could be improved upon. But hey, we all have to start somewhere ;)

Update: modified code so that $string is a little clearer to read. Instead of comma-separating the edge and vertex lists, they are separated by spaces. $regex is modified accordingly, matching on \b where appropriate, instead of a comma.

blokhead

In reply to Pure regex Hamiltonian Circuit solution by blokhead

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Pathologically Eclectic Rubbish Lister
	PerlMonks