Re: Believably slow..
by hardburn (Abbot) on Jun 22, 2003 at 13:53 UTC
|
I don't know how much this will help your specific situation, but you'll generally see an improvement if you write Perl like Perl, not as a dynamically-typed C.
The thing that immediately stands out to me is the use of the three-arg form of the for loop in Perl. This is basically there for the benifit of old C programmers. Personally, I see it used so seldom Perl that I often forget it exists.
The Perl-ism is something like this:
# Change this:
for($num=0;$num<16777216;$num++)
# To this:
for my $num (0 .. 16777216)
# And this:
for($i=1;$i<=24;$i++)
# Becomes this:
for my $i (1 .. 24)
Using more Perl-ish constructs often provides hints to the optimizer, as well as generally reducing code size without hurting maintainability.
A few other ideas that may or may not help:
- Lexical scoping. See Lexical scoping like a fox.
- Put a use int; at the top. By default, Perl will use floats in arithmetic, which will slow you down. use int; forces it to use integers.
---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer
Note: All code is untested, unless otherwise stated
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Switching to an iterator-style for from the C-style for, using lexicals instead of globals reduces my running time from ten minutes thirty seconds to seven minutes thirty seconds. If you weren't aware the C-style for is equivalent to the pseudo-code initializer; while( condition ) { ... ; post_loop_action } which involves more steps than a .. iterator. In my example I moved the my() parts out of the tight loop because they have runtime implications. I used lexicals over globals because its one opcode lighter (a very, very cheap opcode though). Switching to 'use integer' further reduced the runtime to nine minutes forty five seconds (did I mention I'm doing this on an otherwise unburdoned machine?) and six minutes respectively. The net effect is this particular form of "optimization" didn't buy me all that much but just shows that perl doesn't compete with simple things like that C++ which fits nicely within the processor's cache and is just a few instructions. The equivalent perl code is significantly more complex and of course takes longer. I wouldn't be surprised if most of the example's compiled C++ code was mostly register operations.
# The optimized perl code
use strict;
use warnings FATAL => 'all';
use integer;
my $ans = 0;
my $total;
my $shift;
for my $num ( 0 .. 16777216 ) {
$total = 0;
$shift = 1;
for my $i ( 1 .. 24 ) {
$total += $num & $shift
? $i
: -$i;
$shift <<= 1;
}
$ans++ if $total == 0;
}
print "$ans\n";
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
I guess you can look at it this way, your task is simular to making a round object that will roll down a hill, C lets you make just a simple wheel with no other baggage, Perl lets you make a wheel but gives you all the baggage that comes along with making the rest of a car. There is overhead to all of the memory managment, hash tables, etc that comes with even a one liner perl script -- stright C code can avoid most of this if it is not needed. On the other hand your C app as it grows into the "car" where it becomes more complecated will start to perform much more like the perl script overall.
-Waswas
| [reply] [Watch: Dir/Any] |
|
|
I've been trying your optimized code on my machine (an old Duron 700MHz, under medium load):
$ time ./bench.pl
187692
real 17m12.507s
user 10m3.640s
sys 0m1.300s
so it's still very slow compared to what it does.
Then, I've tried to compile it:
$ perlcc -O -o bench bench.pl
$ time ./bench
187692
real 12m35.665s
user 6m1.480s
sys 0m0.720s
It still is definitely slow, but you get an evident boost in
performance just by compiling it into C code... something to remember for situations in which you really need Perl for doing lengthy tasks. | [reply] [Watch: Dir/Any] |
|
|
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Not that it matters much ...
use Benchmark 'cmpthese';
#16777216
cmpthese(
9000,
#-3,
{
'c-style' => sub {
for my $num (0 .. 1000) {}
return();
},
'rangefor' => sub {
for( my $num = 0; $num < 1000; $num++) {}
return();
},
});
__END__
Benchmark: running c-style, rangefor, each for at least 3 CPU seconds.
+..
c-style: 4 wallclock secs ( 3.20 usr + 0.00 sys = 3.20 CPU) @ 64
+94.54/s (n=20802)
rangefor: 3 wallclock secs ( 3.27 usr + 0.00 sys = 3.27 CPU) @ 37
+93.02/s (n=12388)
Rate rangefor c-style
rangefor 3793/s -- -42%
c-style 6495/s 71% --
Benchmark: timing 9000 iterations of c-style, rangefor...
c-style: 1 wallclock secs ( 1.44 usr + 0.00 sys = 1.44 CPU) @ 62
+58.69/s (n=9000)
rangefor: 3 wallclock secs ( 2.39 usr + 0.00 sys = 2.39 CPU) @ 37
+65.69/s (n=9000)
Rate rangefor c-style
rangefor 3766/s -- -40%
c-style 6259/s 66% --
update: lol
MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!" | I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README). | ** The third rule of perl club is a statement of fact: pod is sexy. |
| [reply] [Watch: Dir/Any] [d/l] |
|
This benchmark would be far better if you had named the subroutines correctly.
:-) Youll notice that you have the names for the two subs reversed.
Also, I personally wouldn't do a benchmark with an empty block. Put something in it just in case Perl gets smart enough to optimize both of those subroutines down to sub{} and then maybe even just tosses any calls to them becuase they dont really do anything....
use Benchmark 'cmpthese';
#16777216
my @num=0..1000;
my ($x,$y,$z);
my $subhash={
'rangefor' => sub {
for my $num (0 .. 1000) { $x++ }
return();
},
'c-style' => sub {
for( my $num = 0; $num < 1000; $num++) { $y++ }
return();
},
'foreach' => sub {
foreach my $num (@num) { $z++ }
return();
},
};
cmpthese(-3,$subhash) for 1..3;
__END__
Benchmark: running c-style, foreach, rangefor, each for at least 3 CPU
+ secs...
c-style: 3 wall secs ( 3.25 usr + 0 sys = 3.25 CPU) @3039.64/s (n
+=9891)
foreach: 3 wall secs ( 3.30 usr + 0 sys = 3.30 CPU) @3930.39/s (n
+=12986)
rangefor: 3 wall secs ( 3.22 usr + 0 sys = 3.22 CPU) @4133.68/s (n
+=13327)
Rate c-style foreach rangefor
c-style 3040/s -- -23% -26%
foreach 3930/s 29% -- -5%
rangefor 4134/s 36% 5% --
Benchmark: running c-style, foreach, rangefor, each for at least 3 CPU
+ secs...
c-style: 3 wall secs ( 3.16 usr + 0 sys = 3.16 CPU) @3046.59/s (n
+=9612)
foreach: 3 wall secs ( 3.20 usr + 0 sys = 3.20 CPU) @3921.37/s (n
+=12568)
rangefor: 4 wall secs ( 3.18 usr + 0 sys = 3.18 CPU) @4080.31/s (n
+=12955)
Rate c-style foreach rangefor
c-style 3047/s -- -22% -25%
foreach 3921/s 29% -- -4%
rangefor 4080/s 34% 4% --
Benchmark: running c-style, foreach, rangefor, each for at least 3 CPU
+ secs...
c-style: 3 wall secs ( 3.14 usr + 0 sys = 3.14 CPU) @3037.84/s (n
+=9554)
foreach: 3 wall secs ( 3.30 usr + 0 sys = 3.30 CPU) @3941.12/s (n
+=12986)
rangefor: 3 wall secs ( 3.10 usr + 0 sys = 3.10 CPU) @4079.25/s (n
+=12662)
Rate c-style foreach rangefor
c-style 3038/s -- -23% -26%
foreach 3941/s 30% -- -3%
rangefor 4079/s 34% 4% --
---
demerphq
<Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Believably slow..
by jonadab (Parson) on Jun 23, 2003 at 23:33 UTC
|
Can someone enlighten me on the great disparity in speed?
Somewhat. Integer arithmetic is very close to being
the pathological case. There are machine instructions
expressly designed to do this stuff; very little
high-level structure is required (unless you want to
deal gracefully with adverse circumstances, such as
an overflow). As a result, low-level languages are
blazingly fast, because they cut out a lot of the
normal operational overhead, which turns out to be
the bulk of the runtime for a problem like this,
precisely because the problem itself is such a simple
one from the computer's perspective.
In case you hadn't noticed, you're looping several
hundred million times. The "slowness" you're seeing
in Perl is normal operational overhead,
and under more typical circumstances you'd never
notice it because it would get lost in the underflow.
Put something more substantial than integer arithmetic
inside that inner loop (in both C/C++ and Perl, the
same thing) and see what happens. For example, inside
the inner loop, write out a short message (maybe
including the number you're on) to a logfile.
Then compare.
{my$c;$ x=sub{++$c}}map{$ \.=$_->()}map{my$a=$_->[1];
sub{$a++ }}sort{_($a->[0 ])<=>_( $b->[0])}map{my@x=(&
$x( ),$ _) ;\ @x} split //, "rPcr t lhuJnhea eretk.as
o";print;sub _{ord(shift)*($=-++$^H)%(42-ord("\r"))};
| [reply] [Watch: Dir/Any] [d/l] |
Re: Believably slow..
by tall_man (Parson) on Jun 23, 2003 at 19:09 UTC
|
I tried PDL, but it was also very slow (about 49 minutes):
I did much better with Inline::C (about 4 seconds after the first time):
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Believably slow..
by aquarium (Curate) on Jun 22, 2003 at 23:39 UTC
|
i'm fairly confident that the reason for the slowdown is that perl is converting to-and-from binary representation (vec) format, whilst C/C++ doesn't do any checking so you can just use it as integer or binary whenever. I don't know enough about the internals. Some other monk may know how to put this in code. | [reply] [Watch: Dir/Any] |
|
i'm fairly confident that the reason for the slowdown is that perl is converting to-and-from binary representation (vec) format
No. The reason for the slow-down is mostly that Perl is dispatching a bunch of op-codes while the C++ code is compiled to native machine language.
I don't think there is any conversion going on in that loop at all. It certainly isn't converting between "(vec) format" since vec deals with strings and we aren't doing any stringification here. Note that the bit-wise operators work both on strings and (unsigned) integers, figuring out which to do based on parameters given to them.
It might be doing some conversion between integers and floating point, but I doubt it is doing much of that.
- tye
| [reply] [Watch: Dir/Any] |
|
Would this be a place to use XS? I am not an XS whiz, just making a guess, what does this code do anywhoo?
| [reply] [Watch: Dir/Any] |
|
Re: Believably slow..
by chunlou (Curate) on Jun 23, 2003 at 14:58 UTC
|
If you do a print $num & $shift ? '+' : '.' ; in the loop (following diotalevi's example), you can observer for $num (0..20) the pattern:
........................
+.......................
.+......................
++......................
..+.....................
+.+.....................
.++.....................
+++.....................
...+....................
+..+....................
.+.+....................
++.+....................
..++....................
+.++....................
.+++....................
++++....................
....+...................
+...+...................
.+..+...................
++..+...................
..+.+...................
And the middle part (8388608..8388628):
.......................+
+......................+
.+.....................+
++.....................+
..+....................+
+.+....................+
.++....................+
+++....................+
...+...................+
+..+...................+
.+.+...................+
++.+...................+
..++...................+
+.++...................+
.+++...................+
++++...................+
....+..................+
+...+..................+
.+..+..................+
++..+..................+
..+.+..................+
And the end part (16777196..16777216):
..++.+++++++++++++++++++
+.++.+++++++++++++++++++
.+++.+++++++++++++++++++
++++.+++++++++++++++++++
....++++++++++++++++++++
+...++++++++++++++++++++
.+..++++++++++++++++++++
++..++++++++++++++++++++
..+.++++++++++++++++++++
+.+.++++++++++++++++++++
.++.++++++++++++++++++++
+++.++++++++++++++++++++
...+++++++++++++++++++++
+..+++++++++++++++++++++
.+.+++++++++++++++++++++
++.+++++++++++++++++++++
..++++++++++++++++++++++
+.++++++++++++++++++++++
.+++++++++++++++++++++++
++++++++++++++++++++++++
........................
Looks like a sparse matrix we could explore. Like, if you add if ($num<$shift){$total-=(24-$i)*(25+$i)/2; last;} in the loop, it would speed up the beginning of the loop (but slow down the end). Here a sample code (mostly diotalevi's):
#!/usr/bin/perl -w
use strict;
use integer;
my $ans = 0;
for my $num (0..16777216) {
my $total = 0;
my $shift = 1;
for my $i (1..24) {
$total += $num & $shift ? $i : -$i;
if ($num<$shift){$total-=(24-$i)*(25+$i)/2; last;}
$shift <<= 1;
}
$ans++ if $total == 0;
}
print $ans;
Ironically, I mostly found out how to slow the code down rather than speeding it up.
Though the $shift <<= 1; part is essentially static and equivalent to @shift = qw/ 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576 2097152 4194304 8388608 /;, using array will slow things down.
If you want something even slower, here is something I tried in Pari:
#!/usr/bin/perl -w
use Math::Pari qw/ PARI / ;
PARI 'ans=0' ;
my $cmd = qq/
for(
num = 0, 1677,
total=0; shft=1;
for(
i = 1, 24,
if(bitand(num,shft)==0, total+=-i, total+=i);
shft <<= 1;
);
if(total==0, ans++);
)
/;
$cmd =~ s/\s//g;
PARI $cmd ;
print PARI('ans');
Funny enough, it's slow as CDROM. | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Believably slow..
by pope (Friar) on Jun 24, 2003 at 11:32 UTC
|
C or C++ for this sort of computation is just too fast for Perl. Even Perl can't beat Java in this. The intensive loop in your code reminds me of "sieve of eratosthenes" benchmarks (see the Great Win32 Computer Language Shootout or the original shootout) on which Perl is far away behind Java.
I have tried a bit vector version of it, but still get bad result, so I'm sure the slowness is in the arithmetics. Monks here have played golf on eratosthenes, but I don't know about their attempst on performance hacks.
| [reply] [Watch: Dir/Any] |
Re: Believably slow..
by Aristotle (Chancellor) on Jun 30, 2003 at 22:12 UTC
|
You can do quite a lot better with a properly Perlish version.
my $ans = 0;
my $total = 0;
my $bit;
for my $num (0 .. 16777215) {
$bit = 1;
$total += $_ ? $bit++ : -$bit++
for unpack("b24", pack "L", $num) =~ /./g;
$ans++ unless $total;
$total = 0;
}
print "$ans\n";
$ make t
cc t.c -o t
$ time ./t
187692
real 0m3.992s
user 0m3.990s
sys 0m0.000s
$ time perl t.pl
187692
real 6m54.313s
user 6m53.290s
sys 0m0.510s
Still, obviously, you're never going to get anywhere near the same ballpark as C with Perl code in things as machine-friendly as this.
Makeshifts last the longest. | [reply] [Watch: Dir/Any] [d/l] |
Re: Believably slow..
by aquarium (Curate) on Jun 24, 2003 at 09:14 UTC
|
i slightly modified the c++ version to run it in c, declaring variables and changing cout to printf. it ran compiled (with TURBOC :) ) in 3 secs. i still can't believe there would be such a vast difference between perl and c++...so i fiddled a bit, doing extra printf, and found that for the c/c++ version the value of shift didn't seem to change, ie printf("%d",shift); prints 0's. this wasn't the case in the perl version. maybe i'm doing something wrong, but couldn't figure out why. maybe some other monk (with a C++ compiler -- i'm missing header files in my cygwin gcc) can check this. if my results were correct, then no wonder the c/c++ version ran so fast (turning the shift<<=1; into a no-op) | [reply] [Watch: Dir/Any] |
|
Even if it didn't turn that into a no-op, it won't make a whole lot of difference. A shift operation, even with loading and saving a variable from memory and then even with stack frame related overhead, will take maybe two dozen clock cycles on a bad day (cache misses et al) on modern CPUs. In comparison, even the simplest single operation in Perl will probably take a few hundred cycles (if not thousands).
If reports are to be believed though, Perl6 will be much faster at this, and native Parrot code should be on a par with C.
Makeshifts last the longest.
| [reply] [Watch: Dir/Any] |
Re: Believably slow..
by aquarium (Curate) on Jun 25, 2003 at 13:55 UTC
|
i found the culprit operation which costs the most time:
if($num & $shift)
i'd say it's to do with having to add so many bits to $shift once $num starts to skyrocket. It should have been implemented in binary/packed format of same size bit patterns. btw: you can reduce the use of huge numbers to start with by generating that same bit pattern (1..24), repeating it, and just adding leftmost 1 bits as required. you can also precalc the "anded" values and leave out the & altogether. and with the other fixes to make it more perlish (the for loops), the prog should run in the order of a couple of minutes max. i got 6 minutes just by taking out the "if($num & shift)" and the "else" | [reply] [Watch: Dir/Any] |
|
When you left out if($num and $shift), did you get 187692 as the computed value? That the number it should print when the computation is done.
| [reply] [Watch: Dir/Any] |