Beefy Boxes and Bandwidth Generously Provided by pair Networks Bob
more useful options

RFC: DBIx::Iterator

by runrig (Abbot)
on May 11, 2007 at 21:22 UTC ( #615015=perlmeditation: print w/ replies, xml ) Need Help??

After making a couple of posts on, I'm posting my code here, maybe for eventual posting to CPAN, and asking for comments (on code, module name, docs, etc.), questions, and whatever else may come. As the example (toward the bottom of the docs) indicates, this is for building a big ugly nested foreach loop for SQL statements, hopefully without quite so much bigness and ugliness. Iterator-DBI was too simple for my needs, and yet required more kerfluffery in the Iterator department (with is_exhaused methods, etc.). It would be simple to wrap an iterator created with this module into something that Iterator could use. Anyway, here's the code:

Update: One thing I will change is to default to prepare_cached(), and provide an option to fallback to prepare()
Another update: This uses fetchall_arrayref(), which suits my needs just fine. But if you were selecting millions of rows from one of these iterators (that is, one of the sub-iterators, not the master iterator created by mk_iterator), that may not be a good thing. So a possible option (which would require some work) would be to provide an option to fetch one row at a time instead of fetchall.

package DBIx::Iterator; use strict; use warnings; use base 'Exporter'; our @EXPORT_OK = qw(mk_iterator list_iterator sql_iterator); sub sql_iterator { my %cfg = @_; my ( $sql, $args, $select, $binds, $transform, $dbh, $master, $out +er, $cache) = @cfg{qw(SQL ARGS SELECT BIND TRANSFORM DBH MASTER OUT +ER CACHE)}; $args ||= []; my @arg = ( undef, @$args ); my @bind; if ( defined $binds ) { @bind = ref($binds) ? ( undef, @$binds ) : ( $binds ) x @arg; } my $key; my $master_cnt; my $nxt_master; my $rows = []; sub { until (@$rows) { if ( $master ) { $nxt_master = $master->() or return; } else { return if $master_cnt++; $nxt_master = {}; } my $sth = $dbh->prepare($sql); my $cache_hit; if ( $cache ) { no warnings 'uninitialized'; $key = join $;, @$nxt_master{@$args}; if ( exists( $cache->{$key} ) ) { $rows = [ @{$cache->{$key}} ]; $cache_hit = 1; } } unless ( $cache_hit ) { if ( defined $binds ) { $sth->bind_param($_, $nxt_master->{$arg[$_]}, $bind[$_]) for + 1..$#arg; } else { $sth->bind_param($_, $nxt_master->{$arg[$_]}) for 1..$#arg; } $sth->execute(); $rows = $sth->fetchall_arrayref(); $rows = $transform->($rows,$nxt_master,\%cfg) if $transform; $cache->{$key} = [ @$rows ] if $cache; } $rows = [ [] ] if $outer and !@$rows; } my $nxt_row = shift @$rows; @$nxt_master{@$select} = @$nxt_row; return $nxt_master; } } sub mk_iterator { my %cfg; if ( @_ and UNIVERSAL::isa($_[0], 'HASH') ) { %cfg = %{shift(@_)}; } my $f; while ( my $nxt_f = shift(@_) ) { if ( UNIVERSAL::isa($nxt_f, 'ARRAY' ) ) { my @join = $nxt_f; push @join, shift(@_) while @_ and UNIVERSAL::isa($_[0], 'ARRAY' + ); my $new_cfg = { %cfg }; $new_cfg->{MASTER} = $f; $f = join_iterators($new_cfg, @join); } else { my %override = ( @_ and UNIVERSAL::isa($_[0], 'HASH')) ? %{shift +(@_)} : (); my %new_cfg = %cfg; $new_cfg{MASTER} = $f if $f; $f = $nxt_f->(%new_cfg, %override); } } return $f; } sub join_iterators { my %cfg; if ( @_ and UNIVERSAL::isa($_[0], 'HASH') ) { %cfg = %{shift(@_)}; } my ($master, $dbh) = @cfg{qw(MASTER DBH)}; my @mk_iter = @_; my $nxt_master; my $master_cnt; my @master_list; # Transform master iterator into one which returns # a 'more' indicator, followed by the value and an # end flag for each iterator in the list. my $new_master = sub { unless ( @master_list ) { if ( $master ) { my $nxt = $master->() or return; @master_list = ( $nxt, ($nxt, undef ) x @mk_iter ); } else { return if $master_cnt++; @master_list = ( {}, ( {}, undef ) x @mk_iter ); } } my $nxt_master = shift @master_list; return unless $nxt_master; return { %$nxt_master }; }; my $new_cfg = { %cfg, MASTER => $new_master }; my @iter = map { mk_iterator( $new_cfg, @$_ ) } @mk_iter; my $idx = @mk_iter; # Exhaust every iterator before seeing if master # iterator has more values. sub { { if ( $idx >= @mk_iter ) { my $tmp = $new_master->() or return; $idx = 0; } my $nxt = $iter[$idx]->(); return $nxt if $nxt; $idx++; redo; } } } sub list_iterator { my %cfg = @_; my ( $list,$select, $transform, $master, $outer) = @cfg{qw(LIST SELECT TRANSFORM MASTER OUTER)}; my $master_cnt; my $nxt_master; my $rows = []; sub { until (@$rows) { if ( $master ) { $nxt_master = $master->() or return; } else { return if $master_cnt++; $nxt_master = {}; } $rows = [ @$list ]; $rows = $transform->($rows,$nxt_master,\%cfg) if $transform; $rows = [ [] ] if $outer and !@$rows; } my $nxt_row = shift @$rows; @$nxt_master{@$select} = @$nxt_row; return $nxt_master; } } 1; __END__ =head1 NAME DBIx::Iterator - An iterator package for SQL statements =head1 SYNOPSIS use DBIx::Iterator qw(list_iterator sql_iterator mk_iterator); sub iter_generator1 { sql_iterator( SQL => $sql_statement, %other_options, @_, ) }; sub list_iter_generator { list_iterator( LIST => [ @list_of_lists ], \%more_options, ) } my $iterator = mk_iterator( \%main_options, \&iter_generator1, \%more_options, \&iter_generator2, \&iter_generator3, etc. ); or: my $iterator = mk_iterator( \%main_options, \&iter_generator1, \%iter_generator1_arguments, [ \&iter_generator2, \&iter_generator3, etc. ], [ \&iter_generator4, \&iter_generator5, etc. ], etc. ); while ( my $row = $iterator->() ) { # $row is a hashref } =head1 DESCRIPTION This package allows you to have a nested SQL iterator without having t +o build nested for loops. Nothing is exported by default, but functions sql_iterator and mk_iterator are exported if explicitly requested. =over 4 =item mk_iterator Returns an iterator function which will return a hash reference on eve +ry call. The first argument is a hash reference, usually containing the databas +e handle, and all following arguments are individual iterator generators that optionally accept a configuration +hash. Each iterator generated is passed to the next generator in the 'MASTER' slot in the configurat +ion hash, and the database handle is passed to every iterator in the 'DBH' slot. The argument following each generator may be a hash reference, which w +ill be passed to the generator which can be used, e.g., to override the generator's def +ault configuration. =item sql_iterator Returns an iterator. Usually used inside an iterator generator functio +n. Accepts a hash, whose keys may be: SQL - The sql statement. ARGS - Array ref of hash keys corresponding to '?' placeholder argumen +ts in the sql. BIND - Array ref of DBI data types used to bind the ARGS to the sql st +atement. Or a scalar containing a single type to bind to all arguments. SELECT - Keys to add to the hashref returned from the iterator. Usuall +y corresponds to the select clause in the sql statement, but names may be different. TRANSFORM - A subroutine that takes all rows returned by the sql in an + arrayref, and returns a new array ref of rows transformed. May be used to 'flatt +en' the results of a sql statement. OUTER - Like an outer join. If no rows are returned by the sql stateme +nt, return the master row once before fetching a new master row. DBH - The database handle. Usually not required, as it is passed in by + default via mk_iterator(). MASTER - The parent iterator. Usually not required, as it is passed in + by default via mk_iterator(). CACHE - A hashref used to cache the selected data (key is the data cor +responding to the ARGS option joined by $SUBSEP). =back =head1 EXAMPLES sub get_customers { mk_sql_iterator( SQL => 'select cust_id, cust_name from customers', SELECT => [qw(CUST_ID CUSTOMER_NAME)], @_, ); } sub get_orders { mk_sql_iterator( SQL => 'select order_no, order_amt from orders where cust_id = ? +' SELECT => [qw(ORDER_NO TOTAL_AMT)], ARGS => ['CUST_ID'], @_, ); } my $f = mk_iterator( { DBH => $dbh }, \&get_customers, \&get_orders, ); while (my $order = $f->()) { # $order is hash ref with keys CUST_ID, CUSTOMER_NAME, # ORDER_NO, and TOTAL_AMT } =head1 AUTHOR runrig =head1 COPYRIGHT This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. =head1 SEE ALSO L<Iterator>. =cut =cut

Comment on RFC: DBIx::Iterator
Download Code
Re: RFC: DBIx::Iterator
by runrig (Abbot) on May 12, 2007 at 00:48 UTC

    A better example may help. Here is some actual pseudo code (that is, some actual code stripped down and with names changed to protect the innocent). Notice how different iterators can share the same lookup cache per database handle, and other iterators can be conditionally injected into the stream.

    for my $dbh ( @db_handles) { my ( $crud_cache, $crud_stuff_cache, $stuff_name_cache) = ({},{},{}) +; my @filter_foo_name; if ($do_some_filtering) { @filter_foo_name = ( \&filter_foo_name, { FILTER => $this } ); } my $f = mk_iterator( { DBH => $dbh, BIND => SQL_INTEGER }, \&get_foo, { WHERE_CLAUSE => $where{$dbname} }, \&get_wibble, \&get_stuff, \&get_crud, [ \&get_more_important_stuff, \&get_foo_wibble_id, \&get_bar_wibble_id, \&get_foo_name, @filter_foo_name, \&get_bar_name, \&get_foo_wibble_thingy_id, \&get_bar_wibble_thingy_id, \&get_foo_crud_id, \&get_bar_crud_id, \&get_foo_crud_name, { CACHE => $crud_cache }, \&get_bar_crud_name, { CACHE => $crud_cache }, \&get_foo_crud_stuff, { CACHE => $crud_stuff_cache }, \&get_bar_crud_stuff, { CACHE => $crud_stuff_cache }, ], [ \&get_stuff_name, \&get_stuff_type, \&get_stuff_this_and_that, \&get_stuff_this_name, { OUTER => 1, BIND => SQL_VARCHAR, CACHE => $stuff_name_cache +}, \&get_stuff_that_name, { OUTER => 1, BIND => SQL_VARCHAR, CACHE => $stuff_name_cache +}, \&get_foo_crud_stuff, { OUTER => 1, CACHE => $crud_stuff_cache } +, \&get_bar_crud_stuff, { OUTER => 1, CACHE => $crud_stuff_cache } +, ], ); while ( my $row = $f->() ) { # do stuff w/$row which is a hashref # w/all the keys from the SELECT array refs (see # sql_iterator() example below). # # Update: Actually, since there are array refs # in the iterator constructor, the actual href here # will contain all of the keys from the first # four iterators, and for each of those values, # it will iterate through values containing # keys from the first array ref, then through # values containing keys from the second array ref print join(",", $@row{qw(FOO_WIBBLE_ID FOO_NAME BAR_WIBBLE_ID BAR_ +NAME ETC ETC)}), "\n"; } }

    And many of the iterator generator functions share the same code, with only the SELECT and ARG list differing, e.g.:

    sub get_foo_wibble_id { sql_iterator( SQL => "select WIBBLE_ID\n". "from WIBBLE_TABLE\n". "where STUFF_ID = ?\n". "and IMPORTANT_ID = ?\n", SELECT => ['FOO_WIBBLE_ID'], ARGS => [qw(STUFF_ID FOO_THINGY_ID)], @_, ) } sub get_bar_wibble_id { get_foo_wibble_id( SELECT => ['BAR_WIBBLE_ID'], ARGS => [qw(STUFF_ID BAR_THINGY_ID)], @_, ) }
    Updated: per below. probably not overly clearly, but it's something for now :-)

    More update: Since the iterator is basically the same as a DBI fetchrow_hashref(), the "# do stuff" is whatever you would do after fetching a row from a database. But instead of doing one big SQL statement that joins many tables, I have many separate iterators each selecting from a (usually) single table with the iterators joined together.

    Why do it this way? In my case, it was because I was working with a completely undocumented and confusing database schema. I was constantly commenting out some of the inner loops/joins and seeing if the outer loops were returning what I wanted. In one big SQL statement with joins, this would require commenting out columns in the SELECT clause, the tables in the FROM clause, and expressions in the WHERE clause. Using the above scheme, I could comment out just ONE LINE to eliminate an inner join...usually commenting out one line at a time from the bottom up in the mk_iterator call. Using the TRANSFORM option, I could do things that SQL alone wouldn't do, or would be hard to do. Though using this, you are limited in some ways, most notably in how useful any ORDER BY clause is. Hope that explains things :-)

      I think this example would be clearer if you gave some details about what values $row transitions through here. That is, what might the code that replaces "# do stuff" look like?

      - tye        

      Just for fun, here is a completely runnable abstract example that doesn't use SQL at all, we are instead iterating over various lists that sort of simulate selecting a single column from several joined tables (No, you probably wouldn't really want to do this sort of hard coded list iterator this way, but it would be a good way to splice in hardcoded lists as sort of a Cartesian join on some SQL when combined with the SQL iterators):
      use strict; use warnings; use DBIx::Iterator qw(mk_iterator list_iterator); my $iter = mk_iterator( \&list_123, [ \&list_abc, \&list_def, ], [ \&list_456, \&list_789, ], ); use Data::Dumper qw(Dumper); while ( my $r = $iter->() ) { print Dumper($r); } sub list_ab { my $x = shift; my $y = shift; list_iterator( LIST => [ map {[$_]} $x..$y ], SELECT => [ "$x$y" ], @_, ) } sub list_123 { list_ab(1,3,@_) } sub list_456 { list_ab(4,6,@_) } sub list_789 { list_ab(7,9,@_) } sub list_abc { list_ab("a","c",@_) } sub list_def { list_ab("d","f",@_) }
Re: RFC: DBIx::Iterator
by chromatic (Archbishop) on May 12, 2007 at 04:30 UTC

    Why UNIVERSAL::isa? Do you expect people to pass in blessed references? Do you want to forbid tied variables?

      Why UNIVERSAL::isa?
      Because nobody has suggested anything better yet or convinced me otherwise...I've seen some of the debates on UNIVERSAL::isa vs. ref() vs. Scalar::Util:reftype(), but not recently, and I don't recall what all the pros and cons are, and I'm not sure if there was any absolute best solution. Got any suggestions?

        My best suggestion is to accept only one type of argument so you don't have to perform the check at all.

        If that's not feasable, I suggest to dereference the reference appropriately inside an eval block. Otherwise you end up with a huge chain of tests just so that you don't miss all of the possibilities (non-reference, reference of the wrong type, reference of the right type, blessed reference, tied object, blessed reference of the wrong type with overloading, blessed reference of the right type with overloading).

        Philosophically speaking, using UNIVERSAL::isa() to check for structural typing in Perl is perilous because Perl uses a nominal typing system (or behavioral, but no one ever talks about that). Sometimes the only way to know if you can do a thing is to do it and see if anything breaks.

      Why UNIVERSAL::isa? Do you expect people to pass in blessed references? Do you want to forbid tied variables?

      You confuse me. Some of the advantages of isa are that it doesn't forbid blessed references and it doesn't forbid tied variables.

      One of the problems with using isa() as a function instead of as a method is that items that pretend to be some type of data structure via overloading are not detected by direct use of UNIVERSAL::isa() alone. One way to help isa() is for such classes to implement their own isa() method. Unfortunately, that work-around is often quite inconvenient for code like above to make use of (because $ref->isa("HASH") is most likely to just die, even though it works better for this one case).

      A much better (IMHO) work around would be for such an overloading class to do push @ISA, "HASH"; so that UNIVERSAL::isa( $ref, "HASH" ) would be true.

      Doing a tiny bit of testing, I see that UNIVERSAL::isa() is somewhat pickier than I assumed and to make this work one must also do something to create the "HASH" package, for example, { my $no_op= @HASH::ISA; }.

      So I consider UNIVERSAL::isa() to be a "best practice" because it gets (as near as I can tell) all but the one rarest case right, it is very simple and easy to understand (doing *isa= \&UNIVERSAL::isa; makes it even nicer), and that one rarest case is nearly trivial to make work as well.

      A patch to do this push @ISA, "HASH" and create the HASH package (etc.) when appropriate would be well worthwhile, IMHO. I apologize that I won't be attempting such a patch any time soon for a number of reasons.

      I hope I got that all correct. :)

      Update: And let me address one potential complaint that I've seen before. One other aspect of UNIVERSAL::isa() is that you can use the technique I gave above to lie, intentionally making UNIVERSAL::isa() incorrectly return a true value for your objects. This aspect does not bother me because "Doctor, it hurts when I do this", that is, if you want to go out of your way to intentionally break my simple code, I just don't care. More importantly, such an ability can prove extremely useful for some of those edge cases you run into when writing unit tests and trying to get full code coverage. So I consider this aspect to be a feature. Note that you can also (probably) cheat in the other direction as simply as doing *UNIVERSAL::isa= sub { ... };.

      I can certainly see the point in trying to prevent people from accidentally creating a class that breaks my simple code. I think it folly to try to prevent people from intentionally trying to break such code.

      So it'd be really cool if were "fixed" to co-operate with this technique. If I saw some other candidate for "best practice" here that was as close to "right" and as available yet didn't require this somewhat hackish trick, then I'd be pushing it instead. All of the alternatives have much worse draw-backs, I believe.

      - tye        

        So I consider UNIVERSAL::isa() to be a "best practice" because it gets (as near as I can tell) all but the one rarest case right...

        Unfortunately, the function use of isa() and can() has been deprecated in bleadperl for at least a year. It doesn't answer the question appropriately. It's a hack. It relies on an accidental implementation detail of how method storage works in Perl.

        In my mind, the right solution is to patch overload to answer DOES() appropriately, which has the benefits of being the right question, giving the right answer, and not breaking methods. (Life would be even more pleasant with autobox in the core at that point; then reftype() would be unnecessary before calling DOES().)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://615015]
Approved by shigetsu
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (7)
As of 2014-04-19 18:59 GMT
Find Nodes?
    Voting Booth?

    April first is:

    Results (483 votes), past polls