http://www.perlmonks.org?node_id=479807

Dear Perl's-internals-hacking monks,

I've implemented a Hash_Iterator class, as replacement for CORE::each. The reasons for this are 1) to enable multiple independent lexically-scoped iterators on the same hash; 2) to be able to apply to such iterators various functions that take iterators as arguments (one aspect of this being to extend the hash iterator's interface to cover the four iterator functions (start, exhausted, next, and value) described by Dominus in HOP); 3) as a beginner's exercise with Inline::C; and, last but not least, 4) to increase my Perl Crackpot Index.

One could have implemented this class by simply using keys to generate an array of all the hashkeys, and have the iterator object keep track of a position in this array. Such implementation, however, duplicates the storage that already exists for the hash. One of the features that makes iterators in general attractive is precisely that they can take up significantly less memory than the corresponding array. Iterators can be composed into more complex iterative structures, but such schemes can quickly become prohibitive unless the iterators are sufficiently lightweight. Therefore an array-based implementation of hash iterators would not be very powerful.

Be that as it may, it's done and I would very much appreciate your comments (I give the code below).

Given that I have not found this sort of iterator already, the main concern I have is that, despite appearances (i.e. the seemingly functioning module presented below) the problem may be much harder than I realize. Therefore, my top question is: is what I'm trying to do feasible in Perl (without going to heroic lengths)? Or are there fundamental reasons for why anything like this would fail in Perl5?

The basic idea of my implementation is that the iterator object is also a hash, except that its internal bucket array pointer points to the bucket array of the hash that's being iterated over. The underlying C struct has its own independent iteration-oriented fields (e.g. riter and eiter) that are unaffected by those of other iterator instances on the same hash, or even by using CORE::each on this hash.

For my implementation I cannibalized sizeable chunks of hv.c and the pp_each section of pp.c. Unfortunately my understanding of the code that I hacked is less than complete. Therefore, even though my implementation passes the toughest test suite I could come up with, I still feel rather uncertain about the soundness of the C code, and I would particularly welcome your comments and criticisms of it.

I realize that this is a fair bit of code to read. I hope that some of you may be willing to take a look, or perhaps advise me on how to get knowledgeable feedback on a longish piece of code like this.

Many thanks in advance.

SYNOPSIS use Hash_Iterator; my $it = Hash_Iterator->new( \%some_hash ); while ( defined( my $k = $it->each ) ) { # do something with $k } # $it->each restarts after hitting the end while ( my ( $k, $v ) = $it->each ) { # one more iteration } # alternatively, $it->start resets the iterator for ( $it->start; !$it->exhausted; $it->next ) { my $k = $it->value; # do something with $k } # here's another way to traverse the hash $it->start; until ( $it->exhausted ) { my ( $k, $v ) = $it->next; # do something with ( $k, $v ) } DESCRIPTION The API is as follows: new( $hashref ) The constructor, obviously. The argument is mandatory, and must be a hashref. $it->each Should behave exactly like CORE::each. $it->start Resets the iterator. (It also returns the first value. This is as described in HOP.) $it->value Returns the current value, according to the same rules as CORE::each (i.e. both key and value are returned in list context); does not advance the iterator. $it->next Advances the iterator. (It also returns the results of $it->value *before* advancing the iterator, as described in HOP.) $it->nextval An alias for $it->next(). $it->exhausted Returns true iff the iterator is exhausted. NB: The only difference between the next() (or nextval()) and each() methods is that the latter (but not the former) automatically resets the iterator after hitting the end once.

use strict; use warnings; package Hash_Iterator; use Carp (); BEGIN { *_croak = \&Carp::croak; } my %Hashes; *nextval = \&next; !! 'keep require happy'; sub new { my $class = shift; _croak "Bad arguments to constructor" unless @_ == 1 and ref $_[ 0 ] eq 'HASH'; my $hash = shift; my $self = bless +{}, $class; $Hashes{ $self } = $hash; $self->start; return $self; } sub start { my $self = shift; _start_iter( $self, $Hashes{ $self } ); return $self->value; } sub next { my $self = shift; if ( wantarray ) { my @ret = $self->value; $self->_advance; return @ret; } else { my $ret = $self->value; $self->_advance; return $ret; } } sub each { my $self = shift; if ( wantarray ) { my @ret = $self->value; $self->exhausted ? $self->start : $self->_advance; return @ret; } else { my $ret = $self->value; $self->exhausted ? $self->start : $self->_advance; return $ret; } } sub DESTROY { my $self = shift; delete $Hashes{ $self }; _DESTROY_iter( $self ); return; } sub _advance { my $self = shift; _synch_and_advance( $self, $Hashes{ $self } ) unless $self->exhausted; return; } use Inline C => <<EOC; void _start_iter( HV *hv, HV *ohv ) { register XPVHV* xhv; if ( SvMAGICAL( ( SV * ) ohv ) ) Perl_croak( "Not implemented for tied/magical hashes." ); xhv = ( XPVHV* ) SvANY ( hv ); xhv->xhv_riter = -1; xhv->xhv_eiter = NULL; _synch_and_advance( hv, ohv ); return; } I32 exhausted( HV *hv ) { register XPVHV* xhv; if ( !hv ) Perl_croak( "Bad hash" ); xhv = ( XPVHV* ) SvANY ( hv ); return xhv->xhv_riter < 0; } void _synch_and_advance( HV *hv, HV *ohv ) { register XPVHV* xhv; register XPVHV* oxhv; if ( !hv || !ohv ) Perl_croak( "Bad hash" ); xhv = ( XPVHV* ) SvANY ( hv ); oxhv = ( XPVHV* ) SvANY ( ohv ); __synch( xhv, oxhv ); __advance( xhv ); return; } void __advance( register XPVHV* xhv ) { register HE *entry; if ( !xhv->xhv_array ) { xhv->xhv_riter = -1; xhv->xhv_eiter = NULL; return; } entry = xhv->xhv_eiter; /* At start of hash, entry is NULL. */ if ( entry ) { entry = HeNEXT( entry ); while ( entry && HeVAL( entry ) == &PL_sv_placeholder ) entry = HeNEXT( entry ); } while ( !entry ) { xhv->xhv_riter++; if ( xhv->xhv_riter > ( I32 ) xhv->xhv_max ) { xhv->xhv_riter = -1; break; } entry = ( ( HE** ) xhv->xhv_array )[ xhv->xhv_riter ]; while ( entry && HeVAL( entry ) == &PL_sv_placeholder ) entry = HeNEXT( entry ); } xhv->xhv_eiter = entry; return; } void value( HV *hv ) { Inline_Stack_Vars; register XPVHV* xhv; HE *entry; I32 gimme = GIMME_V; I32 n_items = 1; xhv = ( XPVHV* ) SvANY( hv ); entry = xhv->xhv_eiter; if ( gimme == G_VOID || !entry && gimme == G_ARRAY ) Inline_Stack_Void; /* returns (nothing) */ Inline_Stack_Reset; if ( entry ) { Inline_Stack_Push( hv_iterkeysv( entry ) ); if(gimme == G_ARRAY) { Inline_Stack_Push( hv_iterval( hv, entry ) ); ++n_items; } } else { /* scalar context, no entry */ Inline_Stack_Push( sv_2mortal( &PL_sv_undef ) ); } Inline_Stack_Done; Inline_Stack_Return(n_items); } void _DESTROY_iter( HV *hv ) { register XPVHV* xhv; if ( !hv ) return; xhv = ( XPVHV* ) SvANY ( hv ); if ( xhv->xhv_name ) { if ( PL_stashcache ) hv_delete( PL_stashcache, xhv->xhv_name, strlen( xhv->xhv_name ) +, G_DISCARD ); Safefree( xhv->xhv_name ); xhv->xhv_name = 0; } xhv->xhv_max = 7; /* (it's a normal hash) */ xhv->xhv_array = 0; xhv->xhv_placeholders = 0; } void __synch( register XPVHV* xhv, register XPVHV* oxhv ) { xhv->xhv_max = oxhv->xhv_max; xhv->xhv_fill = oxhv->xhv_fill; xhv->xhv_keys = oxhv->xhv_keys; xhv->xhv_array = ( HE ** ) oxhv->xhv_array; } EOC __END__

Update: Minor, mostly cosmetic, edits to the code.

the lowliest monk