Here's a question for the Perl-guts hackers in the audience.
I'm in the process of optimizing some Perl code, and for this purpose I availed myself of Inline::C. The resulting code is indeed faster, but it uses significantly more memory than the original version. Upon further investigation I narrowed the problem down to a marked difference in size between arrays of arrays (AoAs) generated by Perl and those generated by the C code. The following script illustrates the problem:
################################################################
# test_aoa.pl
################################################################
use warnings FATAL => 'all';
no warnings 'once';
use strict;
use Inline 'C' => Config => OPTIMIZE => '-O2';
use Inline 'C';
use Time::HiRes 'gettimeofday';
my $START = my $ELAPSED = microseconds();
my $MAKE_AOA = !!( shift @ARGV ) ? \&make_aoa_c : \&make_aoa;
my $BASELINE = mem_size();
for ( 1..5 ) {
    start();
    my $table = $MAKE_AOA->( 1000, 1000 );
    $ELAPSED = elapsed();
    printf "%d: %d (%d us)\n", $_, mem_size() - $BASELINE, $ELAPSED;
}

sub mem_size {
    chomp( my $size = `ps -o rss= -p $$` );
    return $size + 0;
}

sub make_aoa {
    my ( $n_rows, $n_cols ) = @_;
    return [ map [ ( 'foo' ) x $n_cols ], 1..$n_rows ];
}

sub microseconds {
    my ( $sec, $microsec ) = gettimeofday();
    return 1E6 * $sec + $microsec;
}

sub start {
    $START = microseconds();
}

sub elapsed {
    return $START ? microseconds() - $START : 0;
}
__END__
__C__
/* get_mortalspace comes from "Extending and Embedding Perl"
by Jenness and Cozens, p. 242 */
static void * get_mortalspace ( size_t nbytes ) {
    SV * mortal;
    mortal = sv_2mortal( NEWSV(0, nbytes ) );
    return (void *) SvPVX( mortal );
}

SV *make_aoa_c( int n_rows, int n_cols ) {
    int i;
    int n_items = n_rows * n_cols;
    char *foo = "foo";
    SV **table;
    SV **row_ptr;

    table = ( SV ** ) get_mortalspace( n_rows * sizeof *table );
    row_ptr = ( SV ** ) get_mortalspace( n_items * sizeof *row_ptr );

    for ( i = 0; i < n_rows; i++ ) {
        int j;
        SV **row = row_ptr;
        for ( j = 0; j < n_cols; ++j ) {
            row[ j ] = sv_2mortal( newSVpv( foo, 0 ) );
            ++row_ptr;
        }
        {
            AV *av = ( AV * ) sv_2mortal( av_make( ( I32 ) n_cols, row ) );
            table[ i ] = sv_2mortal( newRV( ( SV * ) av ) );
        }
    }
    return newRV( sv_2mortal( ( SV * ) av_make( ( I32 ) n_rows, table ) ) );
}
When given a "false" argument, the script uses the pure Perl function make_aoa to generate an AoA; otherwise it uses the C function make_aoa_c.
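To make the selection rule concrete, here is a minimal sketch of the same `!!( shift ... )` test in isolation. The two stub subs and the name pick_impl are hypothetical stand-ins that exist only for this illustration; the real make_aoa and make_aoa_c are the ones in the script above. Note that Perl treats '0', the empty string, and a missing argument all as false.

```perl
use strict;
use warnings;

# Hypothetical stubs standing in for the two implementations in test_aoa.pl.
sub make_aoa   { return 'perl' }
sub make_aoa_c { return 'c' }

# Same selection logic as the script: a true first argument picks the C version.
# (pick_impl is a name made up for this sketch.)
sub pick_impl {
    my @args = @_;
    return !!( shift @args ) ? \&make_aoa_c : \&make_aoa;
}

# undef mimics running the script with no argument at all.
for my $arg ( undef, '0', '', '1', 'yes' ) {
    my $label = defined $arg ? "'$arg'" : 'undef';
    printf "%-6s -> %s\n", $label, pick_impl( $arg )->();
}
```

So `perl test_aoa.pl 0` and `perl test_aoa.pl` with no argument both exercise the pure Perl version, while any other non-empty argument exercises the C one.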
When I execute this script, this is what I get (iteration number, growth in resident set size as reported by ps, and elapsed time):
% perl test_aoa.pl 0
1: 78696 (192169 us)
2: 78700 (211009 us)
3: 78700 (159965 us)
4: 78700 (160709 us)
5: 78700 (167226 us)
% perl test_aoa.pl 1
1: 125752 (356172 us)
2: 133572 (310640 us)
3: 133572 (263302 us)
4: 133572 (265807 us)
5: 133572 (267763 us)
As you can see, the AoA generated by make_aoa_c is almost twice the size of the one generated by make_aoa (1.7x the size, to be more precise).
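For reference, the arithmetic behind that figure can be checked with a few lines of Perl. The two memory figures are the steady-state values from the runs above. The per-element number additionally assumes that ps reports rss in units of 1024 bytes, which I have not verified for every platform, so treat it as a rough estimate. The sub names are made up for this sketch.

```perl
use strict;
use warnings;

# Steady-state figures copied from the two runs above.
my $rss_perl = 78700;     # perl test_aoa.pl 0
my $rss_c    = 133572;    # perl test_aoa.pl 1
my $n_elems  = 1000 * 1000;

# How many times larger the C-built table is.
sub size_ratio {
    my ( $larger, $smaller ) = @_;
    return $larger / $smaller;
}

# Extra bytes per element, ASSUMING ps reports rss in units of 1024 bytes.
sub extra_bytes_per_element {
    my ( $larger, $smaller, $count ) = @_;
    return ( $larger - $smaller ) * 1024 / $count;
}

printf "ratio: %.2f\n", size_ratio( $rss_c, $rss_perl );    # prints "ratio: 1.70"
printf "extra bytes per element: %.1f\n",
    extra_bytes_per_element( $rss_c, $rss_perl, $n_elems ); # prints 56.2 under the assumption above
```

Under that assumption, the C version costs roughly 56 extra bytes for each of the million elements.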
To add insult to injury, my snazzy Inline::C code is much slower too. Fortunately, this is the case only in this little test script. In the real application I'm working on, the move to Inline::C did make a big difference. Still, I would also love to know why my C code is so much slower...
Anyway, after all these years and much reading on the subject, I remain as mystified as ever by the Perl internals, so I'm sure my code in make_aoa_c is doing something pretty clueless. Any words of wisdom would be appreciated.