Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much

Closures as Itterators

by blakem (Monsignor)
on Jun 30, 2001 at 11:14 UTC ( #92884=perlquestion: print w/replies, xml ) Need Help??
blakem has asked for the wisdom of the Perl Monks concerning the following question:

This is my first attempt in using a closure as an itterator. I noticed that I had several scripts with cut-n-pasted code for deciding which lines in a datafile to process. I'm trying to consolidate that logic by using an itterator/closure instead. It seems to work just fine, but since I'm breaking new ground (for me, at least) I thought I'd ask for a little peer review.
package My::Package; my $datafile = '/path/to/data.txt'; # each line in file has a numbered key ($key) followed by # some data. file is sorted based on incrementing keys sub itter_maker { # return a function that passes back # data only from lines with keys between # $min and $max my $min = shift; my $max = shift; my $done; my $fh = new IO::File "$datafile"; return( sub { return if $done; my (@data,$row,$key); while (1) { chomp($row = <$fh>) or last; #### # [SNIPPED] munge $row, set $key and @data #### next if $key < $min; $done = 1 && return if ($key > $max); last; } return ($key,\@data); }); }
And its used as such:
#!/usr/bin/perl use My::Package; # only loop through lines with "30 <= $key <= 70" my $nextrow = itter_maker(30,70); while (my ($num,$dataref) = $nextrow->()) { ### do stuff }

Replies are listed 'Best First'.
(jeffa) Re: Closures as Itterators
by jeffa (Bishop) on Jun 30, 2001 at 19:50 UTC
    Two suggestions:
    1. Get in the habit of using the 'other' notation for instantiating an object:
      my $fh = IO::File->($datafile) or die; #instead of - and you don't need the quotes either my $fh = new IO::File "$datafile";
      The reason is because is purely habit. One day you will write your own classes, and they may use inheritance, which is where the other style will bit you. Tis a shame i think, because it just _looks_ so much nicer. ;)

    2. In the while loop inside the closure, why not just use this:
      while (<$fh>) { chomp; # use $_ instead of $row # etc . . .
      Now you don't need the last, and while(1) just looks evil. >:)

    Otherwise, nice i like it. Damian Conway discusses using closures to implement iterators in his awesome OO book. You can find a snippet at (jeffa) Re: Defining Arrays.


      I've read elsewhere about subtleties with the instantiation syntax I used (I think it was on a python propaganda page) Easy enough to change.

      The loop construct, however, is a bit strained. I don't really want to clobber $_, but I do like:

      while(chomp($row = <$fh>)) {
      Much better than:
      while(1) { chomp($row = <$fh>) or last;
      I have Damians book, but only made it about half way through before losing the thread. I don't do much OO programming, so I didn't have many concrete examples to toy with while reading the book. It was very well written though, got much further than any other OO book I've tried.


        To avoid clobbering $_, it's quite easy to just do something like:

        sub thingy { # blah blah blah... local $_; while (<$fh>) { chomp; # do stuff here... } # more stuff here... }

        In fact, I do that with every package function I write, even if I don't intend on using $_ explitly, because I don't want to depend on a built-in function not clobbering $_ for me in an attempt to be helpful.

        Furthermore, the while(chomp($row=<$fh>)) syntax is very dangerous. It will miss the last line of a file, if that line doesn't end with a newline. The while() loop is dependent on the return of something from chomp, not the definedness of $row. I would definitely use something like:

        while (defined(my $row = <$fh>)) { chomp $row; # stuff here... }
Re: Closures as Itterators
by bikeNomad (Priest) on Jun 30, 2001 at 20:19 UTC
    Actually, I like the fact that you don't clobber $_ . No reason to quote "$datafile". You could stand some diagnostics on the file open. Rather than having a
    while (1) { ... next if () ... last }
    structure, you could have used redo:

    LINE: { chomp($row = <$fh>) or last LINE; redo LINE if $key < $min; $done = 1 && return if ($key > $max); }
    And you could generalize it a bit further, by making a general-purpose iterator maker that itself takes a CODE ref to call in the middle:

    sub itter_maker { # return a function that passes back data allowed b +y the given predicate subroutine my $predicate = shift; #CODE ref my $done = 0; my $fh = IO::File->new($datafile) or die "can't open $datafile: $!\n +"; return( sub { return if $done; my (@data,$row,$key); my $found = 0; while( ! $found && ! $done && defined( $row= <$fh> ) ) { chomp($row); ($found, $done) = $predicate->($row, \$key, \@data); } return ($key,\@data) if $found; return; }); }

    update: changed to redo, fixed predicate call syntax.
    update2: incorporated tye's suggestion, thanks!


      LINE: { # ... redo LINE if # ... # ... }
      you might as well drop the block and s/redo/goto/. I'm not a fanatic about "no gotos", but I do hate to see people making very silly "loops" in order write goto-ish code without having to mention the much-maligned operator.

      But the original code has a pretty silly loop as well. If you are going to use a loop, then have the body of the loop be the code that actually gets looped over. In two of three code examples, the bottom of the loop is never looped over and so it should just be taken out of the loop! (Using the conditional at the top of a "loop" to skip over the loop and a bunch of never-looped end code doesn't make the code easier to understand.)

      In bikeNomad's last example, there is a "next if" at the bottom of the loop so that should just be moved into the loop conditional (and doing so is trivial). One of the points of loops in structured programming is so that the reasons that you will stay in (or leave) the loop are put in one location so that you don't have to reanalyze the entire code of the loop in order to figure it out.

      So the heart of bikeNomad's last example can be rewritten as:

      my $done = 0; # ... sub { my( @data, $row, $key ); my $found= 0; while( ! $found && ! $done && defined( $row= <$fh> ) ) { chomp($row); ( $found, $done )= predicate( $row, \$key, \@data ); } return( $key, \@data ) if $allow; return;
      Note that you can now have predicate() return (1,1) to say that we are done after you return the current set.

      I think that one reason that people often don't do this is that they feel restricted on how much code that should put between the parens of "while()". This example didn't demonstrate the problems of a huge conditional very well, but there are a couple of tricks that can aleviate those types of problems.

      One is just to write your code like this:

      while( conditional ) { code }
      then you can have more than one line of coditionals and still have a clean look.

      Another is to use either:

      for( init; condition; cont; ) { code; }
      init; while( contition ) { code; } continue { cont; }
      and I prefer the former for simple cases (all the information about the loop is in one central place) and the latter for complex cases (the code flows in the direction that it is typed).

      Note that the "continue" block is only useful if you use next inside of "code". If properly and very sparingly used, next, redo, and last can be better than goto. But they can easily become worse than goto because at least goto make it pretty obvious where you are going to. With the other three, you have to parse the surrounding code looking for enclosing blocks, find the top (and sometimes even the bottom) of each block to determine if it is a "loop" block or not, etc. until you find the enclosing loop and then (except in the case of redo) find the end of that loop (to check for a continue, in the case of next). I know because I've seen horrible spaghetti code that never mentioned goto.

      For a bigger challenge, let's redo the original code:

      my $done= 0; # ... my( @data, $row, $key ); return if $done || ! defined( $row= <$fh> ); do { chomp( $row ); # [SNIPPED] munge $row, set $key and @data } while( $key < $min && defined( $row= <$fh> ) ); if( $max < $key ) { $done = 1; return; } return( $key, \@data ); }
      and you see I've duplicated defined( $row= <$fh> ). In general, duplicating code is something to avoid. But such a small bit of code is worth duplicating for the much greater clarity of the loop. If the code was more complicated than that, it probably deserves to be rolled up into an interator of its own: $fh->readline(\$row).

      So we end up choosing between the following evils:

      • Duplicating code
      • Duplicating tests
      • Putting code inside tests
      • Putting conditional branching in the middle of loops
      and if you can't eliminate most of these, it usually means you need to move some code into a separate subroutine.

              - tye (but my friends call me "Tye")

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://92884]
Approved by root
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (8)
As of 2017-03-25 18:13 GMT
Find Nodes?
    Voting Booth?
    Should Pluto Get Its Planethood Back?

    Results (312 votes). Check out past polls.