Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I saw the implementation of Set:Light on CPAN:

Set::Light

I think this is a neat idea to save memory. I was wondering if I could do the same thing by simply using exists/undef. To test membership:
if (exists $Set{'Barney'}) { ... }
To set membership:
$Set{'Barney'} = undef;

To remove an item, the delete command must be used.

I imagine that all undef values are the same, so no extra memory is wasted. Could someone shed some light on this analysis?

Replies are listed 'Best First'.
Re: Fast Sets of Scalars in Perl
by kyle (Abbot) on Jan 07, 2008 at 15:50 UTC

    No, not all undef elements are the same:

    my %set = ( a => undef, b => undef ); $set{c} = undef; foreach my $element ( keys %set ) { printf "element '$element' at %s\n", \$set{$element}; } __END__ element 'c' at SCALAR(0x504ee0) element 'a' at SCALAR(0x504290) element 'b' at SCALAR(0x504410)
      Aren't you printing three separate references to the same value, rather than three separate values? That's what your backslash is introducing. The hex number is (related to) the address of the reference created by the backslash notation, not the address of the actual thing at the end of the reference.

      I was under the same impression that Perl has one instance of undef, and all SVs that refer to undef refer to it. It is singleton. There's no way to form an undef SV other than sv_undef, that I know of. I think there is also a shareable instance of 1 and 0 (sv_yes and sv_no), but the use of these is not as cut and dried. Check perlguts for more information.

      --
      [ e d @ h a l l e y . c c ]

        I stand corrected.

        my %set; my $how_many = 10_000_000; for( my $i = 0; $i < $how_many; $i++ ) { $set{$i} = undef; } print 'mem usage: ', my_mem(), "\n"; sub my_mem { my ($proc_info) = grep { $_->[2] == $$ } map { [ split ] } `ps l | tail -n +2` +; return $proc_info->[6]; } __END__ mem usage: 1323312

        Changing $set{$i}=undef to $set{$i}=1:

        mem usage: 1402268

        Using undef instead of 1, you save about 78M on ten million items—about a 6% difference. Or you could look at it as 8 bytes per item, unless I did my math wrong (which becomes more and more probable as time t approaches lunch).

        When $set{$i}=10 (no chance of using sv_yes):

        mem usage: 1402268

        When $set{$i}='' (empty string):

        mem usage: 1872484

        And finally, Set::Light:

        use Set::Light; my $set = Set::Light->new(); my $how_many = 10_000_000; for( my $i = 0; $i < $how_many; $i++ ) { $set->insert( $i ); } __END__ mem usage: 1127960

        It beats them all! It beats the undef case by 195M. Note, however, that most of my tests ran in 20–25 seconds. The Set::Light test took much much longer—almost four minutes. I'm pretty sure all that time is spent in destruction, because the test reports its results fairly quickly and then takes a long time to exit.