Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

An iterator for (not "iterating") a recursive data structure.

by BrowserUk (Pope)
on Feb 12, 2015 at 13:20 UTC ( #1116495=perlquestion: print w/replies, xml ) Need Help??

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

If you have a recursive data structure -- say an AoA[oA[oA]...]] -- and you want to iterate over the leaves; then the typical Perl solution is a recursive callback iterator something like this:

#! perl -slw use strict; use Data::Dump qw[ pp ]; sub genNested { my $n = shift; map{ rand() < 0.5 ? [ rand(1000), genNested( $n-1 ) ] : rand( 1000 + ) } 1 .. $n; } sub cbIterator (&$) { my( $code, $tree ) = @_; for my $i ( 0 .. $#$tree ) { if( ref( $tree->[ $i ] ) eq 'ARRAY' ) { &cbIterator( $code, $tree->[ $i ] ); } else { local $_ = $tree->[ $i ]; $code->(); } } } my @nested = genNested( 3 ); pp \@nested; cbIterator{ print; } \@nested; __END__ C:\test\nhash>nestedIterator.pl [ [ "711.822509765625", "334.869384765625", ["942.19970703125", ["600.4638671875"]], ], "978.3935546875", "965.240478515625", ] 711.822509765625 334.869384765625 942.19970703125 600.4638671875 978.3935546875 965.240478515625

Which is fine unless you need to be able to quit the iteration at an arbitrary point. (ie. short circuit rather completing the full iteration).

What I need is an iterative iterator for a recursive data structure. What I've got so far is this:

#! perl -slw use strict; use Data::Dump qw[ pp ]; sub genNested { my $n = shift; map{ rand() < 0.5 ? [ rand(1000), genNested( $n-1 ) ] : rand( 1000 + ) } 1 .. $n; } sub genIterator { my $tree = shift; my @stack = [ $tree, 0 ]; return sub { ## Fill in the blank (please :). } } my @nested = genNested( 3 ); pp \@nested; my $iter = genIterator( \@nested ); while( my $next = $iter->() ) { last if $next == $something; ## do stuff with $next. }

But my mind's gone blank. Any offers for how to fill the blank?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Replies are listed 'Best First'.
Re: An iterator for (not "iterating") a recursive data structure.
by MidLifeXis (Monsignor) on Feb 12, 2015 at 14:00 UTC

    What about something like (untested, assumption that false, and not undef is a flag for end of data, and that the tree is well-formed (no hashref, for example)):

    sub genIterator { my $tree = shift; my @stack = ( [ 0, $tree ], 0 ); return sub { my $work = shift( @stack ); while ( ref( $work ) ) { my $index = $work->[0]; my $worktree = $work->[1]; if ( $index >= scalar( @{$worktree} ) ) { # Update 3 $work = shift( @stack ); next; } elsif ( ! ref( $worktree->[$index] ) ) { unshift( @stack, [ $index+1, $worktree ] ); return $worktree->[$index]; } else { unshift( @stack, [ $index+1, $worktree ] ); $work = [ 0, $worktree->[$index] ]; next; } } return; # empty stack } }

    Test Code:

    my @data = [ [ 0 ], 1, 2, [ 3, 4, 5 ], 6, [ 7, 8, [ 9, 10 ], [ 11, [ 1 +2 ], 13 ], ], 14 ]; my $it = genIterator( \@data ); use DDP; p @data; while (defined( my $x = $it->() )) { say $x; } __END__ [ [0] [ [0] [ [0] 0 ], [1] 1, [2] 2, [3] [ [0] 3, [1] 4, [2] 5 ], [4] 6, [5] [ [0] 7, [1] 8, [2] [ [0] 9, [1] 10 ], [3] [ [0] 11, [1] [ [0] 12 ], [2] 13 ] ], [6] 14 ] ] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

    Update: No longer need to assume that '0' is an invalid element, but the use of the iterator needs to be adjusted.

    Update 2: Compared to Eily's solution, mine is iterative (non recursive), while Eily's is recursive.

    UPdate 3: changed '>' to '>=' after testing, added test code.

    --MidLifeXis

      Note that you can drop the ", 0", the "ref" and both of the "next;"s.

      Since you just need pairs, you can even drop the "["s and "]"s, reducing the code to:

      sub genIterator { my( $root ) = @_; my @stack = ( $root, 0 ); return sub { while( @stack ) { my $index = pop @stack; my $tree = pop @stack; if( ref( $tree->[$index] ) ) { push( @stack, $tree, $index+1, $tree->[$index], 0 ); } elsif( $index < @$tree ) { push( @stack, $tree, $index+1 ); return $tree->[$index]; } } return; # empty stack } }

      (Not some fundamental change, just a minor reworking.)

      - tye        

        We have a winner! Thanks to you and MidLifeXis.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
        It's identical to what you'd use for a breadth-first search (as an iterator or otherwise) except then you'd use a queue instead of a stack for a breadth-first search. In both cases, you need to save the search state in a local var rather than using recursion to save it on the system stack.

      Update 2: Compared to Eily's solution, mine is iterative (non recursive), while Eily's is recursive.
      Yup, and my solution generates a lots of closures on the fly. This would probably make it slower on structures that are deeper than they are wide.

      Your solution doesn't seem to work though, it goes down the levels alright, but never climbs back up. I'm afraid trying to work on this in an iterative way (I kind of see the idea, and I know it's possible) just confuses the perl out of me, so I can't point out what's wrong.

        See update three - had a bad comparitor that was generating undef in the middle of the data. I was going one beyond the end of the array of data.

        --MidLifeXis

      This is the one I'm using, though I've gone for tye's terser version. Many thanks.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      Some remarks/meditations:

      Solving the initial problem

      Should be noted that the initial problem of the OP (short-circuiting a call-back) is easily solved with goto

      cbIterator { goto STOP if $_ >700; print; } \@nested; STOP:

      Terminology: What's an "Iterator"?

      Than there is some terminology confusion, in my books ( IIRC including Higher Order Perl ) this "pass a call-back" approach isn't an iterator but something I'd maybe call a diver or walker ¹, which does the iteration inside out by processing a call-back.

      I.o.W for me in Perl speak:

      iterator := iterator function

      I checked Iterator on WP and of course there is still confusion, but at least someone proposed Iteratee

      Iteratee, in which, instead of the developer calling the iterator repeatedly to get new values, the iteratee is called repeatedly to process new chunks of data - an example of inversion of control

      Where your solution excels

      The real problem with iterators in Perl is how to continue them after the first stop. (The OP didn't ask this)

      Thats why Perl6 introduces gather/take and Python has "generators" with yield constructs.

      Since you're flattening the recursion into a loop with its own call @stack , this can easily be used for such a "coroutine".

      The trick is to store persisting data like @stack in the surrounding closure of the generator!

      Finally

      Hope this was of interest. :)

      Cheers Rolf

      PS: Je suis Charlie!

      ¹) I'd love to be pointed to some sources clarifying the terminology.

        Should be noted that the initial problem of the OP (short-circuiting a call-back) is easily solved with goto

        Not if you wish to have the option of resuming the iteration; which I do. (Admittedly this was not mentioned in the OP.)

        In addition, this suffers from all the same problems as throwing an exception to effect the early exit, which is (ab)using the exception mechanism for flow control.

        Another reason for wanting an iterator is the ability to have multiple, concurrent, independent iterators; which is key to my application.

        Last but not least is the avoidance of the nightmare that is Inversion of Control.

        Think about how you'd have to restructure so many of your programs -- and how hard, if not impossible that would be -- if Perl's hashes only provided callback iterators. What a mess!


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

        If I were to have implemented this in code, I would have probably wrapped this into something like Iterator::Simple, which wraps the 'iteratee' into an object, and adds syntactic sugar that sets it up to be accessed with ->next(), <$it>, and other sugar and coercions that help to gloss over some of the details. I believe that the object created by the module would fit the definition of the iterator. I do agree that the function I provided should have probably been referred to in my test code as $cb or some such.

        --MidLifeXis

Re: An iterator for (not "iterating") a recursive data structure.
by Eily (Monsignor) on Feb 12, 2015 at 13:43 UTC

    This is pretty rough, and only works if undef is not a possible value, and empty arrays are not possible. But you should have no problem to adapt it:

    #! perl use v5.14;; use Data::Dump qw[ pp ]; sub genNested { my $n = shift; map{ rand() < 0.5 ? [ rand(1000), genNested( $n-1 ) ] : rand( 1000 + ) } 1 .. $n; } my @nested = genNested 3; pp \@nested; my @flat = 1..10; sub genIterator(+) { my $array = shift; my $index = 0; # Current index in the list my $child; # Iterator for the level below if the current element is + an array return sub { my $element; if (defined $child) { $element = $child->(); } if (!defined $element) { $child = undef; $element = $index < @$array ? $array->[$index++] : undef; if (ref $element eq 'ARRAY') { $child = genIterator($element); $element = $child->(); } } return $element; } } my $next = genIterator @nested; say while $_ = $next->();

      > and only works if undef is not a possible value

      If you are suffering from the undef is a value problem¹, than call iterators in list context. Re: list context (iterators).

      I have to admit that I didn't try to understand your code , though! :)

      Cheers Rolf

      PS: Je suis Charlie!

      ¹) aka "0 but true" nonsense

        Though it kind of bothered me at first, now I really like the way python handles the semipredicate problem, which is to throw an exception at the end of the iteration rather than try to return a specific value (which may end up as part of an iterable). Besides, I think that python's yield would have been one of the easiest ways to solve this problem. For those who don't know about yield it stops the execution of a function (it can be in the middle of an infinite loop) which will be resumed when the next value is fetched. Perl 6's lazy lists can be used in a similar way according to its faq

        As for understanding my code, the concept is actually fairly simple (well, to me it is, but I'm used to working with closures). I started with a flat list iterator: a closure that holds the array and the current index. In the nested list version, each flat list iterator has a children which is an iterator for the current element if the element itself is an array. You could see it as equivalent to this:

        def next: if array[index].isAnArray() and array[index].hasNext() ret = array[index].next(); else ret = array[index] endif index++ end
        . This means that the calling stack for a deeply nested list will be something like:
        parent.next() child.next() grandChild.next() ...
        And so each time a value is fetched, which is why an iterative solution like MidLifeXis's can be far more effective.

      This [...] only works if undef is not a possible value, and empty arrays are not possible.

      It doesn't even handle false values (without a minor change).

      The following handles false values (including undef) as well as empty arrays:

      sub genIterator { my $array = shift; my $index = 0; # Current index into $array my $child_iter; # Iterator for array at $array[$index] return sub { while (1) { # If we're iterating through descendants of a child, if (defined $child_iter) { # If the there is a descendant we haven't returned, if ( my ($element) = $child->() ) { return $element; } # No more descendants for this child. $child_iter = undef; } # Signal end of iteration if done. return if $index >= @$array; # Return next child if it's not an array ref. my $element = $array->[$index++]; return $element if ref $element ne 'ARRAY'; # Prepare to iterate through descendants of child. $child_iter = genIterator($element); } } } my $iter = genIterator \@nested; say while ($_) = $iter->();

        ++. It looks like I forgot to test my code for false values (it took me a while to see where they wouldn't work, it was the very last line where I iterate while the value is true).

        Thanks for the correction, for some reason I'm still not sure I would have thought of it even nowadays. I suppose I don't really think about using the number of returned values when I'm expecting a scalar, rather than a list

      This is akin to the route I was going, but I just couldn't get it right. Thanks.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
Re: An iterator for (not "iterating") a recursive data structure.
by salva (Abbot) on Feb 12, 2015 at 14:14 UTC
    A stack of indexes?

    In order to avoid browsing the data structure from the root every time, you may want to keep a stack of pairs (reference, index) instead.

Re: An iterator for (not "iterating") a recursive data structure.
by karlgoethebier (Monsignor) on Feb 12, 2015 at 15:27 UTC
    "...AoA[oA[oA]...]]...iterate over the leaves...quit the iteration at an arbitrary point"

    Perhaps Data::Rmap does the job?

    use strict; use warnings; use Data::Rmap qw(rmap); my @data = [ [0], 1, 2, [ 3, 4, 5 ], 6, [ 7, 8, [ 9, 10 ], [ 11, [12] +, 13 ], ], 14 ]; my $something = 5; eval { rmap { $_ <= $something ? print : die; } @data; }; __END__ monks>rmap.pl 0 1 2 3 4 5

    Edit: Forgot to copy&paste something ;-) ...and fixed some entities

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: An iterator for (not "iterating") a recursive data structure.
by Anonymous Monk on Feb 12, 2015 at 14:05 UTC
    Which is fine unless you need to be able to quit the iteration at an arbitrary point. (ie. short circuit rather completing the full iteration).
    Well in functional languages a simple way to do it would be to throw an exception when want to exit recursion (maybe not in weird languages like Haskell). So maybe consider just dying inside the callback when you're done with it, and catch with eval.
Re: An iterator for (not "iterating") a recursive data structure.
by LanX (Archbishop) on Jul 13, 2019 at 18:49 UTC
    I found all solutions so far overly complicated, what am I missing?

    please note that the maximal length of the queue is only the sum of children at the current descend path, not all tree elements. (e.g. a binary tree with 10 levels has 2**10 = 2014 nodes but the queue will never exceed 20 = 10*2 elements)

    Changing from unshift to push will perform a breadth first search.

    use strict; use warnings; use Data::Dump qw/pp dd/; print "\n---------- Input\n"; sub genNested { my $n = shift; map{ rand() < 0.5 ? [ int rand(20), genNested( $n-1 ) ] : int rand +( 20 ) } 1 ..$n; } my @nested = genNested( 3 ); print pp \@nested; print "\n\n---------- my solution\n"; sub genIterator { my $tree = shift; my @queue = @{$tree}; return sub { while ( my ($node) = splice @queue,0,1 ) { # shift is buggy if ( ref $node eq 'ARRAY' ) { unshift @queue, @{$node}; next; } else { return $node } } return; }; } my $iter = genIterator( \@nested ); while ( my ($next) = $iter->() ) { print "$next,\t"; } print "\n\n---------- BUKs goal\n"; sub cbIterator (&$) { my( $code, $tree ) = @_; for my $i ( 0 .. $#$tree ) { if( ref( $tree->[ $i ] ) eq 'ARRAY' ) { &cbIterator( $code, $tree->[ $i ] ); } else { local $_ = $tree->[ $i ]; $code->(); } } } cbIterator { print "$_,\t"; } \@nested;

    ---------- Input [ [934, [993, 105], [167, [213]]], [71, [120, [973]], 135], [404, [828, [571]], 945], ] ---------- my solution 934, 993, 105, 167, 213, 71, 120, 973, 135, + 404, 828, 571, 945, ---------- BUKs goal 934, 993, 105, 167, 213, 71, 120, 973, 135, + 404, 828, 571, 945,

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

    update

    fixed issue with strange shift behaviour.

    update

    changed from shift to splice to handle bug with false nodes

Re: An iterator for (not "iterating") a recursive data structure.
by Anonymous Monk on Feb 14, 2015 at 22:51 UTC

    There is the simple "flatten as you go" approach:

    sub genIterator { my @stack = @_; return sub { splice @stack, 0, 1, @{$stack[0]} while 'ARRAY' eq ref $stack[0]; shift @stack; } }

      S'fine for small AoAoA..s, but the top level array in my application will contain upto 200 million elements -- circa. 7GB -- duplicating that on a stack would blow my memory.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
        > duplicating that on a stack would blow my memory.

        It's not duplicating the memory, just caching the current path from root to leave, very similar to my solution here

        Worst case scenario with a balanced two level "tree" would mean approx @stack == 2*sqrt(@nodes) .

        For 200 million elements that's about 28000 elements.

        If that's still to many store a lazy iterator in the @stack returning the children of one node (like a function ref or an object or a tied array) and process it in the iterator.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1116495]
Approved by marto
Front-paged by MidLifeXis
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (6)
As of 2020-05-31 10:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    If programming languages were movie genres, Perl would be:















    Results (173 votes). Check out past polls.

    Notices?