Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Finding Hamiltonian Paths using the Regexp Engine

by Abigail-II (Bishop)
on Aug 07, 2003 at 22:33 UTC ( #282049=perlmeditation: print w/ replies, xml ) Need Help??

A few days ago BrowserUk had a question involving finding a path through a set of strings where each step was between two strings that only differed by a single transposition.

It turned out that he wanted to do most of the work with a regexp. A large subproblem is, how to find a Hamiltonian path in a graph. (A Hamiltonian path in a graph is a path that visits each node once, and only once. Finding such a path (or just answering the question whether such a path exists) is a very hard problem for arbitrary graphs, unlike the problem of finding an Euler path (a path that visits each edge of a graph, once and only once)).

The question kept nagging, and I managed to find Hamiltonian paths in a graph using the regexp machine. It uses only (?{ }), and (?(?{ })|) constructs, so it isn't a 'pure' regexp. I think there's a pure regexp solution as well, but everything I tried so far requires an exponentially sized query string.

The code below expects a description of the graph in the __DATA__ area, in a simple format. One line for each node, consisting of the node name, a colon, and a comma separated list of neighbours. The information is stored in a global hash %graph, and a global array @nodes is used to store all nodes in. The function hamiltonian returns an appropriate regex. Running the regex against the empty string returns false if there's no Hamiltonian path, and true otherwise. In the latter case, the global array @path will be set, containing the vertices on the path, in the appropriate order.

#!/usr/bin/perl use strict; use warnings; use re 'eval'; my %graph; my @path; while (<DATA>) { chomp; my ($node, $edges) = split /\s*:\s*/; my @edges = split /\s*,\s*/ => $edges; foreach my $edge (@edges) { $graph {$node} {$edge} = 1 } } my @nodes = keys %graph; sub hamiltonian { my $regex = ""; foreach my $c (0 .. $#nodes) { $regex .= '(?:'; $regex .= join "|\n " => map {"(?{local \$q [$c] = \$nodes [$_]})"} 0 .. $# +nodes; $regex .= ")\n"; next unless $c; $regex .= "(?(?{"; foreach my $d (0 .. $c - 1) { $regex .= "\$q [$c] eq \$q [$d] ||\n "; } $regex .= "!\$graph {\$q [" . ($c - 1) . "]} {\$q [" . $c . "] +}"; $regex .= "})x|)\n"; } $regex .= "(?{ \@path = \@q })"; $regex; } my $regex = hamiltonian; if ("" =~ /$regex/x) { local $" = ", "; print "Found path [@path]\n"; } else { print "No path found.\n"; } __DATA__ v1: v2 v2: v1, v3, v4 v3: v2, v4 v4: v2, v3, v5 v5: v4, v6, v8 v6: v5, v7, v8 v7: v6, v8 v8: v5, v6, v7

Running this gives:

Found path [v7, v8, v6, v5, v4, v3, v2, v1]

It's straightforward to turn this into a regex that finds a hamiltonian cycle, or one that finds all hamiltonian paths/cycles.

Abigail

Comment on Finding Hamiltonian Paths using the Regexp Engine
Select or Download Code
Re: Finding Hamiltonian Paths using the Regexp Engine
by chunlou (Curate) on Aug 07, 2003 at 23:30 UTC
    Just feel like to visualize the graph.
    use strict; use warnings; use GraphViz; my $g = GraphViz->new(); for(<DATA>){ my @p = ($_ =~ /v\d/g); for(1..$#p){ $g->add_edge($p[0] => $p[$_], color => 'grey'); } } my @h = qw/v7 v8 v6 v5 v4 v3 v2 v1/; for(1..$#h){ $g->add_edge($h[$_-1] => $h[$_], color => 'red'); } $g->as_jpeg("graphm.jpg"); __DATA__ v1: v2 v2: v1, v3, v4 v3: v2, v4 v4: v2, v3, v5 v5: v4, v6, v8 v6: v5, v7, v8 v7: v6, v8 v8: v5, v6, v7
Re: Finding Hamiltonian Paths using the Regexp Engine
by BrowserUk (Pope) on Aug 08, 2003 at 08:45 UTC

    Bravo! Abigail-II++

    Now the question is: "How does it work?". This is my (possibly buggy) exploration of the technique.

    First I printed out the regex that is generated.

    Which is a doosey, but upon closer inspection reveals that there are a good deal of repeating elements. If you can understand one of these groups, then the overall picture falls into place.

    The eye first picks out this as the first repeating group

    (?:(?{local $q [0] = $nodes [0]})| (?{local $q [0] = $nodes [1]})| (?{local $q [0] = $nodes [2]})| (?{local $q [0] = $nodes [3]})| (?{local $q [0] = $nodes [4]})| (?{local $q [0] = $nodes [5]})| (?{local $q [0] = $nodes [6]})| (?{local $q [0] = $nodes [7]})) (?:(?{local $q [1] = $nodes [0]})| (?{local $q [1] = $nodes [1]})| (?{local $q [1] = $nodes [2]})| (?{local $q [1] = $nodes [3]})| (?{local $q [1] = $nodes [4]})| (?{local $q [1] = $nodes [5]})| (?{local $q [1] = $nodes [6]})| (?{local $q [1] = $nodes [7]})) (?(?{$q [1] eq $q [0] || !$graph {$q [0]} {$q [1]}})x|)

    and this as the second

    (?:(?{local $q [2] = $nodes [0]})| (?{local $q [2] = $nodes [1]})| (?{local $q [2] = $nodes [2]})| (?{local $q [2] = $nodes [3]})| (?{local $q [2] = $nodes [4]})| (?{local $q [2] = $nodes [5]})| (?{local $q [2] = $nodes [6]})| (?{local $q [2] = $nodes [7]})) (?(?{$q [2] eq $q [0] || $q [2] eq $q [1] || !$graph {$q [1]} {$q [2]}})x|)

    But on closer inspection, you'll notice that the first of these is actually 2 groups, it just happens that the second element of the first group is missing. This is clearly shown by laying out the first three groups side-by-side. (Though it isn't so clear if your terminal is restricted to 70 chars as will probably be the case when this is displayed in a post :()

    (?:(?{local $q [0] = $nodes [0]})| (?:(?{local $q [1] = $nodes [0]}) +| (?:(?{local $q [2] = $nodes [0]})| (?{local $q [0] = $nodes [1]})| (?{local $q [1] = $nodes [1]}) +| (?{local $q [2] = $nodes [1]})| (?{local $q [0] = $nodes [2]})| (?{local $q [1] = $nodes [2]}) +| (?{local $q [2] = $nodes [2]})| (?{local $q [0] = $nodes [3]})| (?{local $q [1] = $nodes [3]}) +| (?{local $q [2] = $nodes [3]})| (?{local $q [0] = $nodes [4]})| (?{local $q [1] = $nodes [4]}) +| (?{local $q [2] = $nodes [4]})| (?{local $q [0] = $nodes [5]})| (?{local $q [1] = $nodes [5]}) +| (?{local $q [2] = $nodes [5]})| (?{local $q [0] = $nodes [6]})| (?{local $q [1] = $nodes [6]}) +| (?{local $q [2] = $nodes [6]})| (?{local $q [0] = $nodes [7]})) (?{local $q [1] = $nodes [7]}) +) (?{local $q [2] = $nodes [7]})) (?(?{$q [1] eq $q [0] || + (?(?{$q [2] eq $q [0] || !$graph {$q [0]} {$q [1]}})x +|) $q [2] eq $q [1] || + !$graph {$q [1]} {$q [2]}})x|)

    There are 8 groups, one per node in the graph, and two elements per node, plus a final 1-line. Using group 3 as placeholder for the others and breaking out the 2 elements we have

    (?:(?{local $q [2] = $nodes [0]})| (?{local $q [2] = $nodes [1]})| (?{local $q [2] = $nodes [2]})| (?{local $q [2] = $nodes [3]})| (?{local $q [2] = $nodes [4]})| (?{local $q [2] = $nodes [5]})| (?{local $q [2] = $nodes [6]})| (?{local $q [2] = $nodes [7]})) (?(?{$q [2] eq $q [0] || $q [2] eq $q [1] || !$graph {$q [1]} {$q [2]}})x|)

    Looking at the first element we can see that it consists of a non-capturing group containing 8 expressions or'd together.

    (?: exp1 | exp2 | exp3 ... )

    And looking at each of the expressions, we find a code block (?{ ... }) and the contents of each code block follows a consistant pattern.  local $q[n] = $nodes[m] where n varies by group and m varies by expression. So each expression simply assigns the name of the next node from @nodes to a localise copy of the nth group element of @q.

    The code block construct (?{...}) is described in perlre as

    This zero-width assertion evaluate any embedded Perl code. It always succeeds, and its code is not interpolated.

    That worried me for a while. If each block always succeeded, the how does or'ing them together help allow us to iterate?

    The answer is, that that is what the regex engine does. If a regex fails somewhere to the right of an alternation grouping, it backtracks and picks the next (succeeding) alternate and then attempts to move forward again. As each of the expressions always succeeds, then each time the RE backtacks to this alternation, it will pick the sequentially next alternation for its next attempt.

    This can be viewed that within a regex, an alternation is a for loop, iterating over each of its alternations, with an implicit next for any that fail. In this case they will never fail, so they will all be chosen in turn until the overall regex succeeds.

    Moving on to the second element in each group we find (slightly reformatted as Abigail's consistant but eclectic formatting always throws me:)

    # The second element was missing from the first group. (? (?{ $q[1] eq $q[0] || !$graph{$ +q [0]}{$q [1]} } ) x | ) (? (?{ $q[2] eq $q[0] || $q[2] eq $q[1] || !$graph{$ +q [1]}{$q [2]} } ) x | ) (? (?{ $q[3] eq $q[0] || $q[3] eq $q[1] || $q[3] eq $q[2] || !$graph{$ +q [2]}{$q [3]} } ) x | )

    The pattern is clear, but what is going on. The thing to notice is that there are two (nested) expressions here.

    The outer expression is (? (inner) x | )

    Looking this up in perlre, it matches (?(condition)yes-pattern|no-pattern) which is described as

    Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or look-ahead/look-behind/evaluate zero-width assertion.

    which is about as clear as mud:).

    I never really understood this description until now...and the understanding came slowly even with the aid of a complete worked example.

    The "condition expression" is (not unsurprisingly) the RE equivalent of perls if statement.

    if(cond) re_1 else re_2

    If the first clause (condition) is true,

    then attempt to match re_1 at this point in the string

    else, attempt to match re_2, if supplied.

    The bit that eluded me was that the overall expression succeeds if

    1. The condition is true and re_1 matches at this point.
    2. The condition is false and re_2 matches at the point.

    I certainly never understood that from the description. I also never realised the significance of the "/evaluate zero-width assertion." part of the description. This is giving a name (which I don;t recall seeing used anywhere else for the (?{...}) construct. It also gives a better understanding of the syntax diagram for the conditional expression.

    Getting back to Abigail's regex, this element (? (inner) x | ) is saying execute the code block ("evaluate zero-width assertion") and (despite the description for that construct saying they always succeed) use the result of that evaluation to choose which of the following regular expressions to try and match at this point. In this case, Abigail uses 'x' as re_1 and '' as re_2. This is a little mysterious at first until you look back at the code where the regex is used.

    if ("" =~ /$regex/x) {

    The regex is being used on an empty string. re_1 ('x') will never succeed in this string regardless of where in the regex it is used. However, re_2 (''), will always match (in this string or any other) regardless of where it is used.

    The upshot of all of these conditions, expressions and re's is that if the code block evaluates to true, the conditional element will fail and cause backtracking in the overall re. If the code block evaluates false then the conditional expression succeeds and the overall re moves forward to the next element.

    Looking at the contents of the code blocks.

    # null. $q[1] eq $q[0] || !$graph{$q [0]}{ +$q [1]} $q[2] eq $q[0] || $q[2] eq $q[1] || !$graph{$q [1]}{ +$q [2]} $q[3] eq $q[0] || $q[3] eq $q[1] || $q[3] eq $q[2] || !$graph{$q [2]}{ +$q [3]}

    Each condition is saying

    True if the element of @q for this group is equal to that of the element of @q for any of the preceding groups OR if the node of the graph for this group doesn't share an edge with any of the preceeding groups.

    Remembering that the second element succeeds if the code block fails and vice versa, the meaning of the second element of each group boils down to

    Backtrack if we've already passed through this node, or if this node doesn;t share an edge with the preceeding node.

    Repeat the group once for each node and that is essentially that, except that the final line just copies the contents of the path found from @q to @path.

    Thanks Abigail. As always, I learned a crap load from your code.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

      I think this is a nice explaination. A bottom-up approach; would I have had time last night to write up an explaination, it would have been top-down, so it's interesting to see how someone reversed engineered the regexp.

      Here's a link for those interested in Hamiltonian paths and cycles: http://www.ing.unlp.edu.ar/cetad/mos/Hamilton.html.

      Abigail

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://282049]
Approved by Zaxo
Front-paged by MrCromeDome
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (7)
As of 2014-09-19 09:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (133 votes), past polls