http://www.perlmonks.org?node_id=692364

jryan has asked for the wisdom of the Perl Monks concerning the following question:

So, I've got this recursive subroutine, follow_paths, that walks a tree:
sub follow_paths { my ($x_s,$y_s,$s_s,$paths,$regs) = @_; my $branches = $paths->[$x_s][$y_s][$s_s]; unless (defined $branches) { # pad endpoint (base case)} return [[[$x_s,$y_s,$s_s,'pad']]]; } if (exists $regs->{$x_s}{$y_s}{$s_s}) { # register endpoint (base case) return [[[$x_s,$y_s,$s_s,'reg']]]; } my @current; foreach my $branch (@$branches) { my $branch_paths = follow_paths(@$branch,$paths,$regs); push @current, map { unshift @$_, [$x_s,$y_s,$s_s]; $_ } @$branch_paths; } return \@current; }
$paths is the tree structure we're walking; key in the coordinates for a node ($x,$y,$s) and you'll get a list of other nodes that the current node links to. $paths is pulled out of a file generated by another program. The whole point of follow_paths is to produced a list of linearized paths from the source (the initial $x,$y,$s) to the sinks (either a pad or a register). This linearized interface is needed by another part of this program that does a bunch of analysis on the paths. For instance, if a relevant subset of our paths and regs looked like this:
regs = 1,0,0 paths = 0,0,0 => 3,0,0 0,1,0 => 0,2,0 1,0,0 => 0,3,0 3,0,0 => 0,1,0 1,0,0
then the result of follow_paths(0,0,0,$paths,$regs) should produce:
[[3,0,0] [0,1,0] [0,2,0,'pad'] ], [[3,0,0] [1,0,0,'reg'] ]
And you know what? It does do that. It does it perfectly. Except when the paths are so deep that it hits deep recursion and then runs out of memory. :( My first thought to fix this was to apply MJD's general recursion-unrolling technique from his book Higher Order Perl. I came up with this:
sub follow_paths_unrolled { my ($x_s,$y_s,$s_s,$paths,$regs) = @_; my $cur_x = $x_s; my $cur_y = $y_s; my $cur_s = $s_s; my $index = 0; my $current = []; my @STACK; my $RETURN; my $BRANCH = 0; NEWCALL: while(1) { my $branches = $paths->[$cur_x][$cur_y][$cur_s]; if (not defined $branches) { # pad endpoint (base case)} $RETURN = [[[$x_s,$y_s,$s_s,'pad']]]; } elsif (exists $regs->{$x_s}{$y_s}{$s_s}) { # register endpoint (base case) $RETURN = [[[$x_s,$y_s,$s_s,'reg']]]; } else { for (my $i=$index; $i < @$branches; $i++) { if ($BRANCH == 0) { my $branch = $branches->[$i]; # push @STACK $BRANCH = 1; push @STACK, [$cur_x,$cur_y,$cur_s,$i,$current,$BR +ANCH]; # set new values ($cur_x,$cur_y,$cur_s) = @$branch; $index = 0; $current = []; $BRANCH = 0; next NEWCALL; } else { my $branch_paths = $RETURN; push @$current, map { unshift @$_, [$cur_x,$cur_y, +$cur_s]; $_ } @$branch_paths; $BRANCH = 0; } } $RETURN = $current; } return $RETURN unless @STACK; ($cur_x,$cur_y,$cur_s,$index,$current,$BRANCH) = @{pop @STACK} +; } }
That does indeed eliminate the warnings for deep recursion, but obviously (in hindsight) does nothing to fix the out-of-memory error. So, what I think I need now is some sort of way to apply another concept from his book, infinite streams, to my tree-walking problem. I suppose that it should have two parts: A stream that would return an iterator to each path. I have not, unfortunately, a fucking clue on how to go about doing this. Have any of y'all tried this sort of thing before? Edit: Added readmore code.

Replies are listed 'Best First'.
Re: Eliminating Recursive Tree Walking by using MJD-style Infinite Streams?
by dragonchild (Archbishop) on Jun 16, 2008 at 21:08 UTC
    Without reading too much, I'll point out that I have working iterator code for Tree-walking in PRE_ORDER, POST_ORDER, and LEVEL_ORDER in Tree::Fast. Feel free to crib as much as you want.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      I'm sure your code is much faster than mine, but speed really isn't an issue here, its the memory used by generating all these paths...
        My code is also a stateful iterator that is very conscious of RAM usage. This is why I pointed it to you. The name Tree::Fast is more that it's the leanest implementation of a tree in Perl that I could find.

        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Eliminating Recursive Tree Walking by using MJD-style Infinite Streams?
by ikegami (Patriarch) on Jun 16, 2008 at 21:28 UTC

    Depending on your data, you might save a lot of memory by changing from

    $paths->[0][0][0] = [ [ 3, 0, 0 ], ]; $paths->[0][1][0] = [ [ 0, 2, 0 ], ]; ...
    to
    $paths->{0,0,0} = [ join($;, 3,0,0), ]; $paths->{0,1,0} = [ join($;, 0,2,0), ]; ...

    For starters, you have way fewer values, and hashes are better for sparse data. You could even pack the data down if you know the range of the numbers.

    sub to_key { my ($x, $y, $s) = @_; return pack 'L', ($x-MIN_X) * (MAX_Y-MIN_Y) * (MAX_S-MIN_S) + ($y-MIN_Y) * (MAX_S-MIN_S) + ($s-MIN_S) } sub fr_key { my $key = unpack 'L', $_[0]; return ( int( $key / (MAX_Y-MIN_Y) / (MAX_S-MIN_S) ) + MI +N_X, int( $key / (MAX_S-MIN_S) ) % (MAX_Y-MIN_Y) + MI +N_Y, int( $key ) % (MAX_S-MIN_S) + MI +N_S, ); } $paths->{to_key(0,0,0)} = join '', to_key(3,0,0), ); $paths->{to_key(0,1,0)} = join '', to_key(0,2,0), ); ...

    fr_key( substr($branches, $i*4, 4) ) would be used instead of @{ $branches->[$i] }.


    Aside from that, moving to an iterator (as you suggested) would also help. Your current function generates *all* paths before returning. Instead, it could return a path as soon as it finds one.


    But before you do anything, are you sure the deep recursion is simply deep (not a problem, except you might run out of memory) and not infinite?

      Its certainly not infinite recursion. Its just deep. Which is fine, I don't care about warnings. On some cases it hits deep recursion and ends up resolving just fine. Its just that on other inputs it first hits deep recursion and THEN hits "Out of memory!".
Re: Eliminating Recursive Tree Walking by using MJD-style Infinite Streams?
by moritz (Cardinal) on Jun 16, 2008 at 21:01 UTC
    Turning that into an iterator sounds like an interesting task, but it's hard to do without having code that actually runs. So your best bet for getting a complete solution is to provide a complete script that constructs such a tree structure, runs the sub against it, and compares the output to the desired result, preferably using Test::More or something along these lines.

    I will be offline for a few days, if nobody else answers in the meantime I'll do it ;-)

    If you just want to return some results from a sub and then continue running that sub, you can take a look at Coro, which allows you to do just that (or, to advertise my own modules, Perl6::GatherTake, which is just a nice wrapper around some Coro functions).

      Hmmm, thats fair. Give me a minute and I'll attach some code and some test files. Is there any easy way to attach a zip file here?

        Here is some code I wrote back in 2001 that takes advantage of the fact that we can push onto an array we are currently iterating over. So it's called an infinite stream eh, cute name. Anyway this happily recurses a directory (or any other) tree.

        my $root = 'c:/somedir/'; my @dirs = ($root); my @files; for my $path (@dirs){ opendir ( DIR, $path ) or next; # skip dirs we can't read while (my $file = readdir DIR) { next if $file eq '.' or $file eq '..';# skip dot files next if -l $path.$file; # skip sym links if ( -d $path.$file ) { push @dirs, $path.$file.'/'; # add dir to list } else { push @files, $path.$file; # add file to list } } closedir DIR; }
        Is there any easy way to attach a zip file here?

        No, you have to upload it somewhere else and link it here then.

        But first check if ikegami's solution works for you - if yes, no need to duplicate the effort.

Re: Eliminating Recursive Tree Walking by using MJD-style Infinite Streams?
by Limbic~Region (Chancellor) on Jun 16, 2008 at 21:21 UTC
    jryan,
    I am at YAPC::NA 2008 and admittedly have only skim read your code. I have a couple of depth first search (DFS) iterators at:

    It doesn't matter if you are letting perl manage the call stack (recursive function) or you are managing your own queue/stack (@work) - you will run into memory problems if you have to put more items on the stack then you take off. If this is the root of your problem (and I haven't read far enough to know for sure), then you will need to take a different approach.

    Cheers - L~R

      Thats definitely the root of my problem. I think your approach to an iterator might help though, I think I just need to figure-out how to integrate it with my code. One question before I try it out: what is more efficient memory-wise about your second approach than your first approach?
        jryan,
        Again, I am at the conference so I don't have the time to give you a proper response. The second one, which is more memory efficient, only requires as many nodes on the stack as the max depth of the tree. It does this by creating an iterator (one item) for the stack instead of a list of nodes (many) to visit. You then work through that iterator before moving on. This is a poor explanation - I know, but I won't be available to explain in detail until Thursday.

        Cheers - L~R

Re: Eliminating Recursive Tree Walking by using MJD-style Infinite Streams?
by ikegami (Patriarch) on Jun 16, 2008 at 23:51 UTC

    I made the iterator version.

    I changed the names of just about everything so I could keep things straight in my head. Feel free to change them back.

    I changed the param order too. I tend to put the structure on which I am operating on the left, like $self is. It was also more convenient internally. Feel free to change that back too.

    use strict; use warnings; sub build_and_pop { my ($stack) = @_; my $path = [ map $_->[0], @$stack ]; pop @$stack; return $path; } sub find_paths { my ($tree, $regs) = splice(@_, 0, 2); my @stack = ( [ [ @_ ], 0 ] ); return sub { for (;;) { return if !@stack; our ($node, $idx); local (*node, *idx) = map \$_, @{ $stack[-1] }; # $node can be a reference into $tree, # so don't modify @$node. my ($x_s, $y_s, $s_s) = @$node; my $children = $tree->[$x_s][$y_s][$s_s]; if (!$idx) { if (!defined $children) { # pad endpoint (base case)} $node = [ $x_s, $y_s, $s_s, 'pad' ]; return build_and_pop(\@stack); } if (exists $regs->{$x_s}{$y_s}{$s_s}) { # register endpoint (base case) $node = [ $x_s, $y_s, $s_s, 'reg' ]; return build_and_pop(\@stack); } } if ($idx > $#$children) { pop @stack; } else { my $child = $children->[$idx++]; push @stack, [ $child, 0 ]; } } }; }
    my $iter = find_paths($tree, $regs, 3,0,0); while (my ($route) = $iter->()) { for my $node (@$route) { print(join(',', @$node), "\n"); } print("\n"); }
Re: Eliminating Recursive Tree Walking by using MJD-style Infinite Streams?
by jethro (Monsignor) on Jun 16, 2008 at 22:06 UTC
    Are you sure that your recursion gets so deep that you run out of stack space. How many paths are in your database? Since you seem to do depth first search and if your tree isn't degenerated into one long path then you would need gazillions of paths to exceed the stack space

    But since you collect all found paths in memory (in @current), that array will get big pretty fast. So instead you should drop @current and print or save to file your found paths on success (inside your unless (defined $branches) ... and if (exists $regs... control structures. For this you need another parameter to your follow_paths subroutine that gives the momentary path to this point to the subroutine

    So keep your recursive routine, it is much more readable, just save the results, if you can't keep them in memory.

    But there is another possibility: Are you sure your data hasn't any loops, i.e. 0,1,0 points to 3,0,0 which points to 0,1,0 and both points are not in regs ? If there are loops you might put the points of your momentary path into a hash and call foul when you are at a point that already is in that hash.