Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Short Circuiting DFS Graph Traversal

by Limbic~Region (Chancellor)
on Oct 29, 2010 at 00:54 UTC ( [id://868192] : perlquestion . print w/replies, xml ) Need Help??

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
In this node, neversaint asks how to find all paths between two nodes in a graph. The example graph is not weighted and since we are only interested in paths and not costs, this doesn't come into play. While the example graph is directed, I don't believe it matters.

Graph appears to have a method but it is unclear to me if it only finds all the shortest paths or all paths (I haven't tested). The straight forward approach is a depth first search (DFS). My question is if it is possible to efficiently short-circuit the search.

I have provided a short circuiting solution though I am not positive it is correct (limited test set). I also don't know if it is very efficient (optimizing for run time). The basic idea is this: At each node, a handful of conditions can apply.

  • No nodes can be reached
  • All of the reachable nodes are already part of the current path (cycle)
  • We are at our destination (end point)
  • More nodes can be reached
My short circuiting applies to the first two cases. If later on, I reach any node in this path again I can stop descending if my current path has all the same nodes as the completed path (any order) at the point the completed path is entered. I will try and provide an example of what I mean but also refer to the code.
R -> X -> T -> Y -> M -> F # Considered complete R -> I -> T -> A -> X -> Y # Candidate for pruning
Since Y is in a path considered dead, I can short circuit if my path has an R, X and T in any order. I have only tested this hypothesis on two graphs. If this is not a safe assumption, please provide an example that disproves it and also offer any suggestions for short circuiting that you know to be viable.

Now assuming you have made it this far, the problem is that Y may be part of many completed paths. In order to determine if I can short circuit, I have to go through all of them until I find a match or reach the end. Depending on a number of factors (the graph, order of traversal, end points, etc), it can be more expensive keeping track of completed paths then just enumerating all of them. The thing that bothers me is that when more than one completed path share a single non-end point node in common then if one fails they all fail. Unfortunately, I haven't figured out how to group them together this way and still loop over all of them. Can you think of a way to do this more efficiently or do you know of an entirely different approach that is more efficient?

Cheers - L~R

Replies are listed 'Best First'.
Re: Short Circuiting DFS Graph Traversal
by LanX (Saint) on Oct 29, 2010 at 10:32 UTC
    I'm not sure if I'm getting you right because your example confuses me.

    I suppose what you mean is

    A---F /|\ / 0 | C H -1 \|/ \ / B G

    For any path from 0 to C that includes A it's useless to continue to F.¹

    This situation reminds me of the Max-flow min-cut theorem.

    IMHO finding all pathes is related to maximizing the flow and the set {C} is a min-cut.

    I.a.w. any path from 0 to 1 will be a combination of paths from 0-C and C-1. ²

    I'm sure this kind of problem is already extensively investigated in literature.

    BTW my initial idea to improve my solution is to start from both ends but I'm reluctant to improve solutions for loosely expressed problems, (normally that results in launching rockets where throwing a stone is efficient and optimal).

    Cheers Rolf

    UPDATES:

    1) i.e. after 0-A-C, 0-B-A-C, 0-A-B-C don't continue to F

    2) partitioning into separated subgraphs allows a divide and conquer approach.

      LanX,
      Here is an actual example - one that I tested with:
      L F B / | \ / \ / | \ R--J--X----D--Q P M \ | / \ | | Z U--S

      Let's say I am trying to find all paths between D and M. I have intentionally made a left hand side and a right hand side so all paths going to the left of D will be invalid because they will require coming back through D to get to M. Now, let's say I first start descending from D -> X and beyond and unwind back to D in failure. Now I go to D -> F -> X. I already determined that paths to X coming from D are bad so I can stop here.

      Regarding your initial idea. That is called a bi-directional search and was my first inclination too assuming only the shortest path(s) were desired. Unfortunately, if there is a 3 node solution and a 7 node solution, neversaint wants both.

      Cheers - L~R

        > Regarding your initial idea. That is called a bi-directional search and was my first inclination too assuming only the shortest path(s) were desired. Unfortunately, if there is a 3 node solution and a 7 node solution, neversaint wants both.

        yes but IMHO it's still applicable for "all paths", you just have to test combinations of "half paths" that meet in a node.

        IMHO that should already reduce the complexity C(n) of a uni-directional approach to 2*C(n/2) of bi-directional, which means a dramatical speed up!

        > Here is an actual example - one that I tested with:

        > L F B > / | \ / \ / | \ > R--J--X----D--Q P M > \ | / \ | | > Z U--S
        The point here is that the (one element) set of edges {(D,Q)} is already a min-cut of the graph which separates start and stop.

        Sorry it's been 15 years now since I heard (the basics) of graph theory at uni but IMHO calculating the shortest path should be a good start for identifying cuts.

        Cheers Rolf

        Hmm what about a three phase approach?

        Finding "all paths" by backtracking is so expensive because of the power set of combinations of partial "subpaths"!

        That's why nobody is interested to test all combinations in a dead track.

        What about identifying the nodes/subpaths which can only be part of a solution in a first phase and later on trying to find all combinations.

        Lets say in the first phase we mark each node in your example by the minimal distance from the start. (quite a trivial iterative algorithm of low complexity (IMHO O(n)), I already realized it)

        2 1 B2 / | \ / \ / | \ 3--2--1---D0--Q1 P3 M* \ | / \ | | 2 U2-S3

        In the second phase we have to find all "short" paths by "stepping" down from M simply by always choosing a neighbour of lower degree.

        I.e. M-B-Q-D or M-S-U-Q-D.

        That means this left branch you wanted to avoid won't be visited again and will not cause any notable complexity.

        Now finding "all" paths in a third phase could be done by combining "detours" to the shortest path. e.g. M-B-(P-U)-Q-D.

        The latter is rather tricky, alternatively one could also simply try a brute force backtracker like already demonstrated, but much faster because it's restricted to step only thru nodes marked in phase 2!

        Cheers Rolf

        UPDATE: After some thoughts I have to admit that the process to identify "unreachable" parts of the graph as result of phase 2 is more complicated ... too complicated for a simple post, coz I really can't think of any reasonable application of calculating all(!) paths instead of just only the shortest.

        Limbic~Region

        > Now, let's say I first start descending from D -> X and beyond and unwind back to D in failure. Now I go to D -> F -> X. I already determined that paths to X coming from D are bad so I can stop here.

        L F B / | \ / \ / | \ R--J--X----D--Q P M \ | / \ | | Z U--S

        I haven't investigated the "short circuit" heuristics you've implemented so far, but it's certainly not trivial.

        Let the following show a search for paths from 0 to * and the numbers denote the first steps taken in a DFS.

        3 / \ 0 2 \ / \ 1---4 | *
        Now avoiding edges or nodes "failuring" after passing 2 would hinder you finding 0->3->2->4->1->* after unwinding to 0.

        IMHO you need a mechanism to detect loops ("cycles") to "earlier" nodes in your actual search path before cutting of a branch of the graph which isn't trivial for DFS, I'd rather prefer BFS.

        But this involves already more mathematical proofs than most of our fellow monks are likely to tolerate.

        Which brings me back to the suspicion that we are trying to launch a rocket, while the OP most likely needs a tutorial about how to throw stones!

        As long as he isn't capable/willing to correctly express his needs it doesn't make much sense continuing! :(

        HTH!

        Cheers Rolf

Re: Short Circuiting DFS Graph Traversal
by lidden (Curate) on Oct 29, 2010 at 08:23 UTC
    I am not sure I correctly understod the problem, but it seems Dijkstra's algorithm with all edges set to one should work.
      lidden,
      Unfortunately it won't - at least, not as I understand it. In the problem posed by neversaint, all paths between two nodes is desired. It is never said that cycle avoidance is required but I have assumed it for my purposes. In other words, each node visited along the path may only be visited once. Any algorithm designed to find the shortest or cheapest path or paths will not produce the correct result - even if every edge has the same cost. Assume the two nodes are directly connected. Imagine taking a Hamiltonian path around the graph with the start and end points being the two nodes in question - this can certainly be proven not to be among the shortest/cheapest paths.

      Cheers - L~R

        Can the edge cost be set to zero? :) Then, all the possible paths would be the cheapest.
Re: Short Circuiting DFS Graph Traversal
by LanX (Saint) on Oct 29, 2010 at 16:13 UTC
    well you could start a hackers competition providing some graphs or even a graph generator and benchmark the different solutions.

    (Saying this I really fear that I could spend time into participating... ;-)

    Cheers Rolf

Re: Short Circuiting DFS Graph Traversal
by LanX (Saint) on Oct 31, 2010 at 12:05 UTC
    A little demonstration why trying to calculate _all_ path is a dangerous thing.

    let Nx be the graph of a x dimensional cube where the edges are bidirectionally connections of 2^x nodes representing the corners.

    101--111 /: /| 001--011| |100·|110 N3 |· |/ 000--010

    When calculating paths between extremal corners you will see:

    N2 (the square) has 4 corners and 2 such paths

    N3 (the normal cube) has 8 corners and 18 paths (e.g. 000->001->011->010->110->100->101->111 )

    N4 has 16 corners and already 6432 paths

    N5 has 32 corners and the program to calculate all paths is still running and will likely never finish!

    I really doubt that such a general question is a clever approach. (for comparison the number of _shortest_ paths is "only" x! i.e. 5!=120 for N5)

    IMHO not only a clear case of an XY question but already XYZ, any underlying question is either far more trivially answered¹ or is simply not solvable.

    This approach to precalculate all path will potentially damage the hardware.

    Cheers Rolf

    ¹) e.g. can a node K be part of a path from B to E?