
A few days ago, Ovid asked on the perl5-porters mailing list, under the subject "inline keyword?":

I know people have tried to inline Perl subs before but with little success. Instead of going down that road, have we ever considered an inline keyword where the developer says what can be inlined?

There were only a few answers, most of them beside the point (mine included).

Until now I hadn't considered inlining subroutines as do blocks at all, so I whipped up some benchmark code to test Ovid's point.

```
use Benchmark qw(cmpthese);

my $end = 500_000;
cmpthese( -1,
    {
        plain => sub {
            my $total;
            for (1..$end) {
                $total += 1 / $_;
            }
            $total;
        },
        sub => sub {
            my $total;
            for (1..$end) {
                $total += reciprocal($_);
            }
            $total;
        },
        do => sub {
            my $total;
            for (1..$end) {
                $total += do { 1 / $_ };
            }
            $total;
        },
        do_var => sub {
            my $total;
            for (1..$end) {
                $total += do { my $int = $_; 1 / $int };
            }
            $total;
        },
    }
);

sub reciprocal {
    my ($int) = @_;
    1 / $int;
}
__END__
         Rate    sub do_var  plain     do
sub    6.60/s     --   -49%   -72%   -72%
do_var 12.9/s    95%     --   -46%   -46%
plain  23.9/s   261%    85%     --     0%
do     23.9/s   261%    85%     0%     --
```

That's pretty impressive. But the inlined subroutine consists of just one division.

Experimenting further, I found that adding seven extra reciprocals (written as 1 / $_;) to the plain variant put it on par with the sub variant. In other words, the overhead of calling a subroutine, compared with plain inlined code, is roughly that of seven divisions. Inlining pays off most for small, heavily used subs; the relative gain shrinks as the subs get more complex, but inlining can still be a significant performance boost.
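The padding experiment can be reproduced with something like the sketch below. The names sum_with_sub and sum_padded are made up for this illustration, and the count of seven is simply what was measured for the post; the break-even point will differ between machines and perl versions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $end = 500_000;

sub reciprocal {
    my ($int) = @_;
    1 / $int;
}

# Baseline: one real subroutine call per iteration.
sub sum_with_sub {
    my $total = 0;
    for (1 .. $end) {
        $total += reciprocal($_);
    }
    return $total;
}

# Plain division plus seven throwaway reciprocals as padding.
sub sum_padded {
    no warnings 'void';    # the padding statements are deliberately useless
    my $total = 0;
    for (1 .. $end) {
        $total += 1 / $_;
        1 / $_; 1 / $_; 1 / $_; 1 / $_; 1 / $_; 1 / $_; 1 / $_;
    }
    return $total;
}

cmpthese( -1, { sub => \&sum_with_sub, padded => \&sum_padded } );
```

If the two rates come out close to each other, the call overhead on that machine is about seven divisions; otherwise add or remove padding statements until they meet.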

What do you think? Should we have an inline keyword in Perl?

Or should we delegate that to a module?

Making inline a proper Perl keyword would mean giving it an opcode, cloning optrees, and injecting them at the places where an inlined subroutine is called. Leaving it to a module would mean source filtering.

Thinking about it, inlining subroutines isn't anything special to the language. It is just duplicating code, for some reason, all over the place, something you don't want to do by hand for the sake of DRY (Don't Repeat Yourself) and to avoid a maintenance nightmare. So you want to leave that to some mechanism at compile time, let the computer do it, and not have the results in the source code.

Since implementing inline as a proper keyword is much more difficult than whipping up a source filter, and since I am lazy, I decided to do the latter.

Before you go "O Noes, another source filter module o_O - that's brittle, evil eval etc.", consider this: the whole perl source code relies on a source filter named the C preprocessor. All of perl's C source is shoved through the preprocessor prior to compilation. If preprocessing fails, there is no compilation; cpp doesn't check whether it produces valid C code, it only fails by its own rules. Validating the produced code is the compiler's job.

The same applies to Perl source filters. That the source filter is invoked in the compile phase doesn't change this; it is just how perl works: it switches between parsing, compiling and running during the compile phase (think BEGIN blocks and use), so there is nothing special about it.
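To make the mechanics concrete, here is a minimal sketch of a line-by-line source filter built on Filter::Util::Call, the same module Inline::Blocks relies on. The package name Shout and the shout pseudo-keyword are invented for this illustration, and the %INC assignment is only a trick to keep the demo in a single file.

```perl
use strict;
use warnings;

BEGIN {
    package Shout;                      # hypothetical demo package
    use Filter::Util::Call;             # exports filter_add and filter_read

    sub import {
        filter_add(sub {
            my $status = filter_read(); # appends the next source line to $_
            s/\bshout\b/print uc/g if $status > 0;
            return $status;             # > 0: got a line, 0: EOF, < 0: error
        });
    }

    $INC{'Shout.pm'} = __FILE__;        # pretend Shout.pm was already loaded
}

use Shout;    # runs Shout->import at compile time; filtering begins on the next line

shout "hello from a filtered line\n";   # compiled as: print uc "hello from a filtered line\n";
```

The filter only rewrites text. Whether the rewritten line is valid Perl is decided afterwards by the parser, just as the C compiler judges what cpp hands it.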

Source filters have their merits. For instance, IO::All is a wonderful tool in my box, and I use it where appropriate.

After that long preamble providing the rationale and defense for the perpetration, here's the module.

update: edited according to tobyink's remarks below. The match variables are no longer package globals and can be overridden via import parameters.

```
package Inline::Blocks;

use strict;
use warnings;
require Filter::Util::Call;

our $VERSION = '0.01';
our $debug = 0;

# our $callmatch  ||= qr{inline\s+(\w+)\s*\(([^\n]*)\)};      # find inline call
# our $plainmatch ||= 'qr{\b$sub\s*\(([^\n]*)\)}';            # find plain invocation
# our $bodymatch  ||= 'qr{^sub $sub\s*(\{\n.+?\n\})$}ms';     # find sub body
# our $declmatch  ||= qr{^inline\s+sub\s+(\w+)\s*(?:;|\{)}ms; # find sub declaration

my $callmatch  = qr{inline\s+(\w+)\s*\(([^\n]*)\)};      # find inline call
my $plainmatch = 'qr{\b$sub\s*\(([^\n]*)\)}';            # find plain invocation
my $bodymatch  = 'qr{^sub $sub\s*(\{\n.+?\n\})$}ms';     # find sub body
my $declmatch  = qr{^inline\s+sub\s+(\w+)\s*(?:;|\{)}ms; # find sub declaration

sub import {
    shift if $_[0] eq __PACKAGE__;
    @_ % 2 and die "odd number of arguments passed to ".__PACKAGE__.
        '->import, aborted';
    my %args = @_;
    my $callmatch  = delete $args{callmatch}  || $callmatch;
    my $plainmatch = delete $args{plainmatch} || $plainmatch;
    my $bodymatch  = delete $args{bodymatch}  || $bodymatch;
    my $declmatch  = delete $args{declmatch}  || $declmatch;
    my $debug      = delete $args{debug}      || $debug;
    %args and die "unknown import parameters found (",join(", ",keys %args),
        ") - aborted";

    my $done;
    Filter::Util::Call::filter_add(
        sub {
            return 0 if $done;
            my $status;
            my $data;
            # slurp the whole source up to __END__ or __DATA__
            while (($status = Filter::Util::Call::filter_read()) > 0) {
                /^__(?:END|DATA)__\r?$/ and last;
                $data .= $_; $_ = '';
            }
            $_ = $data;
            # pass 1: subs declared as "inline sub NAME" - inline every call
            while (/$declmatch/g) {
                my $match = $&;
                my $sub   = $1;
                s/inline\s+sub/sub/ms;
                my $re = eval $bodymatch;
                my ($text) = /$re/;
                $text or die "Couldn't find subroutine body for sub $sub\n";
                print STDERR "sub body: '$text'\n" if $debug;
                $text =~ /\breturn\b/
                    and die "return statement found in sub '$sub'! Read the documentation.\n";
                my $plain = eval $plainmatch;
                while(/$plain/) {
                    my $match = $&;
                    my $args = $1;
                    (my $repl = $text) =~ s/=\s*\@_/= ($args)/;
                    s/\Q$match\E/do $repl/;
                }
                (my $repl  = $match) =~ s/\w+\s+//;
                s/$match/$repl/;
            }
            # pass 2: calls explicitly marked as "inline NAME(ARGS)"
            while (/$callmatch/g) {
                my $match = $&;
                my $sub   = $1;
                my $args  = $2;
                print STDERR "matched subcall: '$match' sub '$sub' args '$args'\n" if $debug;
                my $re = eval $bodymatch;
                my ($text) = /$re/;
                $text or die "Couldn't find subroutine body for sub $sub\n";
                $text =~ /\breturn\b/
                    and die "return statement found in sub '$sub'! Read the documentation.\n";
                print STDERR "sub body: '$text'\n" if $debug;
                $text =~ s/=\s*\@_/= ($args)/;
                s/\Q$match\E/do $text/;
            }
            print STDERR "=== BEGIN ===\n$_\n=== END ===\n" if $debug;
            $done = 1;
        }
    );
}
1;
__END__

=head1 NAME

Inline::Blocks - inline subroutine bodies as do { } blocks

=head1 SYNOPSIS

    # inline sub at marked locations

    use Inline::Blocks;
    sub sum_reciprocals_to {
        my ($end) = @_;
        my $total = 0;
        for my $int ( 1 .. $end ) {
            $total += inline reciprocal($int);
        }
        return $total;
    }
    sub reciprocal {
        1 / $int;
    }

    # inline sub at every sub call

    use Inline::Blocks;
    sub sum_reciprocals_to {
        my ($end) = @_;
        my $total = 0;
        for my $int ( 1 .. $end ) {
            $total += reciprocal($int);
        }
        return $total;
    }
    inline sub reciprocal {
        1 / $int;
    }

    # both deparse with -MO=Deparse as

    use Inline::Blocks;
    sub sum_reciprocals_to {
        my($end) = @_;
        my $total = 0;
        foreach my $int (1 .. $end) {
            $total += do {
                1 / $int
            };
        }
        return $total;
    }
    sub reciprocal {
        1 / $int;
    }

    # roll your own declmatch, turn on debug

    use Inline::Blocks (
        declmatch => qr{^metastasize\s+sub\s+(\w+)\s*(?:;|\{)}ms,
        debug => 1,
    );
    metastasize sub capitalize_next;

=head1 DESCRIPTION

This is a module for inlining subroutines as C<do> blocks for performance
reasons, implemented as a source filter. It is not a fully fledged macro
expansion module.

This module provides a new keyword, C<inline> by default, which is used to
prefix either subroutine calls or subroutine declarations/definitions.

If a subroutine declaration or definition is marked as C<inline>, all calls
to that subroutine are replaced with a C<do> block containing the
subroutine's body.

If a subroutine isn't declared as inlined, only the calls to that sub marked
as C<inline> are transformed into C<do> blocks; other calls are left as is.

Currently, only plain named subroutines can be inlined (but see "Overriding
Conventions" below). This means that subroutines with prototypes or
attributes are not suitable for inlining.

Inlineable subroutines MUST NOT use C<return>, since in a C<do> block this
would cause a return from the enclosing subroutine, i.e. from the sub which
uses the inlined code.

The return value is the value of the last statement of the subroutine. For
subs with multiple return points, assign the value to a variable and arrange
your code so that it always reaches the last statement of the subroutine,
which contains that variable.

A subroutine block is used textually, as is, so identifiers not private to
the subroutine will be those of the scope into which that block is inlined.
Subroutines which are closures are not suitable for inlining, e.g. this

    {
        my $bottom = 7;
        sub height {
            my ($rise) = @_;
            $bottom + $rise;
        }
    }

will not use the value 7 as C<$bottom>, and compilation will fail under
C<use strict> if there's no C<$bottom> present in the scope of the inlined
call.

If parameters are passed into the subroutine, those need to be assigned to
variables in LIST context:

    my ($foo, $bar) = @_;

Inlining will substitute C<@_> with the subroutine call parameters:

    # before inlining
    $result = subcall($foo, $bar);

    # after inlining
    $result = do {
        my ($x,$y) = ($foo, $bar);
        ... # subroutine body here
    };

As is, the regular expression to handle inlined sub calls only detects a
single list as parameters, i.e.

    \(([^\n]*)\)

which means that a parameter list must begin and end on the same line.
Multiline parameter lists are not supported (but see "Overriding
Conventions" below).

Any EXPR used as subroutine call parameter must be resolvable in the context
where inlining takes place.

=head2 Overriding Conventions

You may want to provide your own, more sophisticated filtering regexps
according to your coding conventions. To that end, you may pass the
following named parameters along with their values to import, which will
override the builtins:

=over 4

=item callmatch

Compiled regular expression (via C<qr> - see perlop) used to match inlined
subroutine calls. Default:

    qr{inline\s+(\w+)\s*\(([^\n]*)\)};

This matches the inlined sub's name and its parameter list in $1 and $2.

=item plainmatch

String containing a single C<qr> call which will be eval'ed by the filter.
It must contain the string '$sub' which - after being eval'ed - will hold
the current subroutine's name during the filtering process. Default:

    'qr{\b$sub\s*\(([^\n]*)\)}'

This matches plain (without 'inline' prefix) calls to subroutines declared
as inline subs.

=item declmatch

Compiled regular expression (via C<qr> - see perlop) used to match
subroutine declarations or definitions prefixed with the C<inline> keyword.
Default:

    qr{^inline\s+sub\s+(\w+)\s*(?:;|\{)}ms

=item bodymatch

String containing a single C<qr> call which will be eval'ed by the filter.
It must contain the string '$sub' which - after being eval'ed - will hold
the current subroutine's name during the filtering process. Default:

    'qr{^sub $sub\s*(\{\n.+?\n\})$}ms'

=back

Additionally, you can pass in the keyword 'debug' with a true value, which
will print diagnostics to STDERR.

Standard caveats and frowning towards source filters apply.

Keywords meaningful inside subroutines may not do what you expect - namely
C<return>, C<caller> and C<wantarray>.

Since the overhead of calling a named subroutine over fully inlined code
(without even a C<do> block around) is roughly that of calculating seven
integer reciprocals, most performance benefits are obtained with simple and
heavily used subroutines.

For the example in the SYNOPSIS, inlining as C<do> blocks without assigning
shows a performance boost of roughly 100% against the same code with
subroutine calls, while C<do> blocks with assignment of arguments to private
variables measure as a 50% increase.

=head1 SEE ALSO

Filter::Util::Call

=head1 AUTHOR

shmem, E<lt>shmem@cpan.orgE<gt>

=head1 COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.20.2 or,
at your option, any later version of Perl 5 you may have available.

=cut
```
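The C<return> restriction in the documentation above can be seen in plain Perl, without any source filter. The sub names in this sketch are invented for illustration; it only shows what happens when a body containing return is pasted in as a do block, and how the last-value form avoids the problem.

```perl
use strict;
use warnings;

sub helper { return 'inner' }

# Ordinary call: 'return' only leaves helper().
sub with_call {
    my $x = helper();
    return "outer saw $x";
}

# Same body pasted in as a do block: 'return' now leaves with_pasted_return().
sub with_pasted_return {
    my $x = do { return 'inner' };
    return "outer saw $x";
}

# Inline-friendly form: the block's last statement is its value.
sub with_last_value {
    my $x = do { 'inner' };
    return "outer saw $x";
}

print with_call(), "\n";           # outer saw inner
print with_pasted_return(), "\n";  # inner
print with_last_value(), "\n";     # outer saw inner
```

This is why the filter dies as soon as it finds a return statement in a sub body it is asked to inline.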

Using this module, the following

```
use Inline::Blocks;

inline sub capitalize_next;

print uppercaseIncrementAsString('a'..'f'), "\n";

sub uppercaseIncrementAsString {
    my @l = @_;
    my $ret;
    $ret .= capitalize_next($_) for @l;
    $ret;
}

sub capitalize_next {
    my ($thing) = @_;
    uc inline increase($thing);
}
sub increase {
    my ($foo) = @_;
    ++$foo;
}
```

results (via B::Deparse) in

```
use Inline::Blocks;
print uppercaseIncrementAsString(('a', 'b', 'c', 'd', 'e', 'f')), "\n";
sub uppercaseIncrementAsString {
    my(@l) = @_;
    my $ret;
    $ret .= do {
        my($thing) = $_;
        uc do {
            my($foo) = $thing;
            ++$foo
        }
    } foreach (@l);
    $ret;
}
sub capitalize_next {
    my($thing) = @_;
    uc do {
        my($foo) = $thing;
        ++$foo
    };
}
sub increase {
    my($foo) = @_;
    ++$foo;
}
```
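The rewriting step behind that result can be tried in isolation on strings. The sketch below is not the module's code: as_do_block and inline_calls are hypothetical helpers that reuse the same two ideas, binding the call arguments in place of @_ and wrapping the body in do. It assumes the sub body has already been extracted as text.

```perl
use strict;
use warnings;

# Hypothetical helper: bind the call arguments in place of @_,
# then turn the sub body into a do block.
sub as_do_block {
    my ($body, $args) = @_;
    $body =~ s/=\s*\@_/= ($args)/;
    return "do $body";
}

# Hypothetical helper: replace every "inline NAME(ARGS)" in $source.
sub inline_calls {
    my ($source, $name, $body) = @_;
    $source =~ s/\binline\s+\Q$name\E\s*\(([^\n]*)\)/as_do_block($body, $1)/ge;
    return $source;
}

my $body = <<'END_BODY';
{
    my ($foo) = @_;
    ++$foo;
}
END_BODY
chomp $body;

my $code = inline_calls('uc inline increase($thing);', 'increase', $body);
print "$code\n";           # the rewritten statement, now a do block

my $thing = 'a';
print eval($code), "\n";   # B
```

Like the module, this handles only single-line argument lists and knows nothing about nesting or quoting; it is meant to show the substitution, not to be robust.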

What do you think? Does that suffice, or should we have an inline keyword? Apart from answers to that question, criticism is welcome, as are improvements, e.g. to the regexps in the regexp variables, their names etc.

perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

In reply to RFC: Inline::Blocks or inline as a keyword? by shmem
