Perl: the Markov chain saw PerlMonks

### Re^4: Finding All Paths From a Graph From a Given Source and End Node

by neversaint (Deacon)
on Nov 01, 2010 at 08:54 UTC (#868703)

Dear BrowserUK,
What's the time complexity of findPaths()?
Thanks so much for improving the subroutine.

---
neversaint and everlastingly indebted.......
Re^5: Finding All Paths From a Graph From a Given Source and End Node
by BrowserUk (Pope) on Nov 01, 2010 at 09:37 UTC
What's the time complexity of findPaths()?

Um... O(lots) :). Honestly, I haven't got a clue how you go about assessing that.

It will be entirely dependent upon the complexity of the graph: not just the number of nodes, but the number of connections at each node. And I don't have the math to make that kind of assessment.

I'd say that if your graphs are big enough for you to worry about it, then you'd probably be better off looking at an iterative solution rather than a recursive one. Though often, iterative solutions that just emulate the recursion through manual stack handling are no more efficient, and often much less so.

I think the main cost of my routine is the memory allocations for the result sets. If your application only needs to process one result set at a time, rather than having them all available, then I'd be looking for an iterator solution.
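
For illustration only, here is a minimal sketch of what such an iterator might look like, using an explicit stack of partial paths instead of recursion. The name `make_path_iterator` and the tiny test graph are hypothetical, not from this thread, and no claim is made that this is faster than the recursive routine.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a closure that yields
# one simple path per call, or an empty list once the search is exhausted.
sub make_path_iterator {
    my ( $graph, $start, $end ) = @_;

    # Each stack entry is a partial path (array ref) still to be extended.
    my @stack = ( [ $start ] );

    return sub {
        while ( my $path = pop @stack ) {
            my $node = $path->[-1];
            return @$path if $node eq $end;

            # Nodes already on this path must not be revisited.
            my %on_path = map { $_ => 1 } @$path;

            # Push in reverse so neighbours are explored in listed order.
            push @stack, map { [ @$path, $_ ] }
                reverse grep { !$on_path{ $_ } } @{ $graph->{ $node } || [] };
        }
        return;    # empty list: no more paths
    };
}

# Assumed toy graph, just to exercise the iterator.
my %graph = ( A => [ 'B', 'C' ], B => [ 'C', 'D' ], C => [ 'D' ], D => [] );

my $next = make_path_iterator( \%graph, 'A', 'D' );
while ( my @path = $next->() ) {
    print "@path\n";
}
```

The caller pulls one path at a time, so only the stack of unfinished partial paths is held in memory rather than every completed result.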

FWIW (which is not much IMO), the literature says that breadth-first and depth-first are both O(b^d) worst case, where b is the branching factor and d is the depth.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^5: Finding All Paths From a Graph From a Given Source and End Node
by BrowserUk (Pope) on Nov 02, 2010 at 02:52 UTC

If you can do what you need to do with the paths by getting them one at a time, rather than all at once, then this version of findPaths() runs much faster and uses far less memory. (It runs faster because it uses much less memory.)

For example, on a randomly generated graph that has 10,000 paths, it takes 1.09 seconds instead of 38 seconds for my previous version. In the process, this uses less than 7MB where the previous version used 240MB.

And on a graph that has just over 5 million paths, this still uses just 7MB and completes in 14 1/2 minutes.

I project that the previous version would take 5 1/2 hours to complete, except that it would require 12GB to do so, and I only have 4GB.

It's still the same order of complexity (it's essentially the same algorithm), but the simple expedient of avoiding allocating and reallocating zillions of small chunks of RAM means that in the real world it is much faster. Which confirms once again my feelings about big O.

Anyway, if your algorithm can be adapted to operate this way, you might find it useful.

```
sub _findPaths2 {
    my( $code, $graph, $start, $end, $path, $seen ) = @_;
    # Reached the target: hand the completed path to the callback.
    return $code->( @$path, $end ) if $start eq $end;
    $seen->{ $start } = 1;
    # Recurse into each unvisited neighbour with its own copy of the path and seen set.
    for ( grep !$seen->{ $_ }, @{ $graph->{ $start } } ) {
        _findPaths2( $code, $graph, $_, $end, [ @$path, $start ], { %$seen } );
    }
}
sub findPaths2(&@) { _findPaths2( @_, [], {} ); }

findPaths2{
    print join ' ', @_;
} \%graph, $start, $end;
```

Dear BrowserUK,
Thanks so much. This update is truly invaluable. May you be repaid for your generosity in helping poor guys like me. I truly owe you much (as always).

---
neversaint and everlastingly indebted.......
Did you also benchmark other code? Like in this post?

(I can't see the point in copying around the %seen hash :) ¹

Furthermore this code can be easily linearized to avoid the function-call overhead...

Anyway the benchmarks highly depend on the nature of those "randomly generated graphs".

In general there are still plenty of possible optimizations left to speed up such a search.

Cheers Rolf

UPDATE:

1) or even the current path of a DFS.
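
One possible reading of the suggestion above, sketched under assumptions: keep a single shared `%seen` hash and a single shared `@path` array, and undo each change on the way back out of the recursion instead of copying them at every level. The name `find_paths_backtrack` and the toy graph are hypothetical; this is not code from either poster and has not been benchmarked against `findPaths2`.

```perl
use strict;
use warnings;

# Hypothetical variant (illustrative names, not from the thread):
# one shared %seen and one shared @path, restored while backtracking.
sub find_paths_backtrack {
    my ( $code, $graph, $start, $end ) = @_;
    my ( %seen, @path );

    my $visit;
    $visit = sub {
        my ( $node ) = @_;
        push @path, $node;
        if ( $node eq $end ) {
            $code->( @path );    # report the completed path
        }
        else {
            $seen{ $node } = 1;
            for my $next ( @{ $graph->{ $node } || [] } ) {
                $visit->( $next ) unless $seen{ $next };
            }
            delete $seen{ $node };    # backtrack: node is free again
        }
        pop @path;
        return;
    };

    $visit->( $start );
    undef $visit;    # break the closure's reference to itself
    return;
}

# Assumed toy graph, just to exercise the routine.
my %graph = ( A => [ 'B', 'C' ], B => [ 'C', 'D' ], C => [ 'D' ], D => [] );
find_paths_backtrack( sub { print join( ' ', @_ ), "\n" }, \%graph, 'A', 'D' );
```

Because the callback receives the shared path, it should copy the elements if it needs to keep them after it returns.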

Did you also benchmark other code? Like in this post?

No. Here's the 5 million path tree, the command line and timing, but from a cursory glance at the code you reference, you're going to require a lot (10GB or more) of memory.

```
c:\test>868031 -S=7529 A Z
{
A => ["N", "W", "J", "L", "C", "E", "X", "T", "O", "H"],
C => ["Z", "J", "Q", "U", "S", "T", "N", "P", "D", "O"],
E => ["P", "G", "U", "F", "X", "A", "Y", "K"],
H => ["W", "O", "J"],
J => ["Z", "U", "B", "Q", "N", "I", "V", "F", "C", "P"],
K => ["X", "H", "J", "C", "P", "W", "E", "S", "Q"],
L => ["C", "B", "V", "A", "S", "J", "O", "H"],
N => ["R", "G", "K", "N", "Q", "W", "C", "U", "E", "V"],
O => ["K", "G", "X", "A", "Z", "W"],
P => ["O", "G", "F", "T", "E", "U", "L", "H", "B", "R"],
Q => ["V", "N", "X", "U", "D", "M", "S", "C", "R", "G"],
R => ["D", "S", "K", "X", "O", "U"],
S => ["Q", "E", "T", "P", "G", "Z"],
U => ["Y", "V", "U", "X", "R", "W", "M", "G", "K", "N", "A"],
W => ["P", "E", "G", "Y"],
X => ["Z", "H", "R", "L", "J", "W", "A", "E", "X", "T", "D"],
Y => ["J", "X", "G"],
Z => ["K", "A", "Z"],
}

5014604 FP2 took 981.75 secs for 5014604 paths
```

If you prefer the 10,000 path graph as a test:

```
c:\test>868031 -S=7367 A Z
{
A => ["F", "B", "U", "Z", "J", "C", "Q", "H"],
D => ["W", "X", "F", "M", "K", "Y"],
G => ["E", "P", "U"],
H => ["K", "Q", "S", "T", "X", "G", "D", "B"],
I => ["U", "Q", "K", "D"],
K => ["O", "R", "A", "L", "X", "N", "C", "M"],
L => ["G", "I", "A", "O", "N", "J", "D", "S", "R", "V", "M"],
N => ["K", "L", "V", "Z", "U"],
O => ["W", "K", "D", "I", "A", "J", "M", "T", "Y", "Z", "P"],
P => ["B", "G"],
R => ["V", "B", "G", "P"],
T => ["M", "I", "N", "K", "D", "U", "A", "V", "W"],
U => ["V", "Z", "J", "A", "E"],
V => ["M", "G", "O", "F", "W", "Y", "P", "S"],
X => ["K", "S", "B", "P", "N", "T", "W", "Z", "H", "R", "F"],
Y => ["Q", "K", "J", "U", "G", "M", "P"],
Z => ["A", "M", "Z", "Q", "W", "N", "G", "J", "L", "H"],
}

10062   FP2 took 2.10 secs for 10062 paths
```
(I can't see the point in copying around the %seen hash, or even the current path of a DFS. :) Furthermore this code can be easily linearized to avoid the function-call overhead... In general there are still plenty of possible optimizations left to speed up such a search.

Improvements or alternatives welcome :) . I'm not claiming to be the fastest on the block, just better than my own previous efforts.

neversaint expressed his interest in the time complexity of my previous version, so I set out to improve it. Given the combinatorial explosion that can result from the all-paths traversal of apparently quite simple graphs, moving to an iterator rather than an accumulating generator seemed the logical route.
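
To get a rough feel for that combinatorial explosion, here is a small sketch that counts (without storing) the simple paths between two nodes of a complete directed graph as the node count grows. The helpers `complete_graph` and `count_paths` are hypothetical and exist only for this illustration; complete graphs are an assumed worst case, not the random graphs used in the timings above.

```perl
use strict;
use warnings;

# Build a complete directed graph on nodes 1..$n (illustrative only).
sub complete_graph {
    my ( $n ) = @_;
    my %g;
    for my $u ( 1 .. $n ) {
        $g{ $u } = [ grep { $_ != $u } 1 .. $n ];
    }
    return \%g;
}

# Count simple paths from $node to $end without keeping any of them.
sub count_paths {
    my ( $graph, $node, $end, $seen ) = @_;
    return 1 if $node eq $end;
    $seen->{ $node } = 1;
    my $count = 0;
    for my $next ( grep { !$seen->{ $_ } } @{ $graph->{ $node } } ) {
        $count += count_paths( $graph, $next, $end, $seen );
    }
    delete $seen->{ $node };    # backtrack
    return $count;
}

for my $n ( 4 .. 9 ) {
    printf "%2d nodes: %6d paths\n",
        $n, count_paths( complete_graph( $n ), 1, $n, {} );
}
```

With 4 nodes there are only 5 paths, but each extra node multiplies the count by roughly the number of nodes, which is why collecting every path in memory becomes impractical so quickly.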

