Re^5: Perl is dying (better sorting than the ST)
by Jenda (Abbot) on Jul 16, 2006 at 09:02 UTC
I see. Good one. I've seen many attempts to replace the ST with something else; this is the first time it's not a hard-to-learn module :-)
Yeah, those would be attempts to formalise the GRT. That is worthwhile to try to put into a module – GRT transforms can be extremely cryptic. A hard-to-learn module is still a better alternative to that, trust me. :)
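For readers who haven't met it, here is a minimal sketch of the GRT idea (the word list and the field widths are my own illustration, not from any module): pack each element's sort key together with its index into one flat string, sort those strings with a bare `sort`, then unpack the index to recover the originals.

```perl
use strict;
use warnings;

my @words = qw( pear Apple banana );

# "A8N": an 8-byte space-padded key followed by a 32-bit big-endian index.
# (Keys longer than 8 bytes would truncate; the width is an illustrative assumption.)
my @sorted = map  { $words[ unpack 'x8N', $_ ] }    # skip the key, read the index back
             sort                                   # default string sort, no comparison sub
             map  { pack 'A8N', lc $words[$_], $_ } # build key + index strings
             0 .. $#words;

# @sorted is now qw( Apple banana pear )
```

The cryptic part is exactly what a module would hide: the template strings must agree between the `pack` and the `unpack`, and the key encoding must make byte-wise string comparison match the desired ordering.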
Makeshifts last the longest.
Re^5: Perl is dying (better sorting than the ST)
by demerphq (Chancellor) on Feb 10, 2007 at 17:46 UTC
I'm wondering if being concerned about the 100-byte overhead is a form of premature optimisation. Hypothetically, perl should be able to determine whether the arrays are going to have their size changed, or whether they will be preserved, and if possible use something more efficient.
Actually, here is where Perl 5 could steal a trick from Python and add a Tuple data type: an array-like data type that is of constant size and contains constant values. Having such a data type would make it possible to write STs that are MUCH more efficient, since the constantness of the structure would allow it to be represented in a much more compact way. Much of the size overhead of perl's data structures is due to their dynamic nature, and to the fact that we optimise for speed and not space.
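In the absence of a real tuple type, one way to picture the idea in today's perl is to flatten a fixed-shape record into a single packed string instead of an AV full of SVs (the field layout here is an illustrative assumption, not a proposal):

```perl
use strict;
use warnings;

# One flat string instead of [ 'alice', 42 ]: no AV head, no per-element SVs.
my $tuple = pack 'A8 N', 'alice', 42;

# Fields are read back by position; there is no way to resize the "tuple" in place.
my ( $name, $id ) = unpack 'A8 N', $tuple;
```

The trade is exactly the one described above: the constant shape buys a compact representation at the cost of all the dynamic operations an AV supports.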
---
$world=~s/war/peace/g
How can perl know the extent of the arrays? Looking at a sample optree, I can see that the size of the input array can be known at compile time. I can also see that sometimes sort's function can be known to be "simple" and won't increase the extent of the array. Lastly, I can see that the expression used by the outer, unpacking map is simple. If I thought this were an interesting thing to chase down, I'd probably go write some Prolog to match patterns in our optree; and if we were Lisp instead of Perl, I'd write a macro to optimize the code automatically. We're not, and we won't get that goodness till Perl 6, so it'd be a waste to do this to Perl 5.
perl -MO=Concise -e 'print map $_->[1],sort{$a->[3]<=>$b->[3]}map[lc,$_],1,2,3'
-e syntax OK
r <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 2 -e:1) v ->3
q <@> print vK ->r
3 <0> pushmark s ->4
k <|> mapwhile(other->l)[t8] lK/1 ->q
j <@> mapstart lK/2 ->k
4 <0> pushmark s ->5
- <1> null lK/1 ->5
;; You can see the ->[1] access and you already know there's only two elements.
p <2> aelem sK/2 ->k
n <1> rv2av[t1] sKR/1 ->o
m <1> rv2sv sKM/DREFAV,1 ->n
l <$> gv(*_) s ->m
o <$> const(IV 1) s ->p
;; You can see the entire sort block got optimized into something native (I guess. It's certainly not in perl anymore)
i <@> sort lKMS* ->j
5 <0> pushmark s ->6
- <1> null sK/1 ->6
- <@> scope sK ->(end)
- <0> ex-nextstate v ->-
- <2> ncmp[t4] sK/2 ->-
- <2> aelem sK/2 ->-
- <1> rv2av[t2] sKR/1 ->-
- <1> rv2sv sKM/DREFAV,1 ->-
- <$> gv(*a) s ->-
- <$> const(IV 3) s ->-
- <2> aelem sK/2 ->-
- <1> rv2av[t3] sKR/1 ->-
- <1> rv2sv sKM/DREFAV,1 ->-
- <$> gv(*b) s ->-
- <$> const(IV 3) s ->-
b <|> mapwhile(other->c)[t7] lK/1 ->i
a <@> mapstart lK/2 ->b
6 <0> pushmark s ->7
- <1> null lK/1 ->7
;; You can see the array get allocated and can see it has only two elements.
h <1> srefgen sK/1 ->b
- <1> ex-list lKRM ->h
g <@> anonlist sKRM/1 ->h
c <0> pushmark s ->d
e <1> lc[t6] sK/1 ->f
- <1> ex-rv2sv sK/1 ->e
d <$> gvsv(*_) s ->e
- <1> ex-rv2sv sK/1 ->g
- <1> ex-rv2sv sK/1 ->g
f <$> gvsv(*_) s ->g
7 <$> const(IV 1) sM ->8
8 <$> const(IV 2) sM ->9
9 <$> const(IV 3) sM ->a
It could, if perl knew the array was going to be thrown away at the end of the statement, and that all indexed look-ups were within the bounds of the original.
It's a big if, hence the reason I mentioned Python's tuples, which from a perl perspective could be seen as a form of compiler hinting, telling the compiler that an AV need not be constructed. You could prototype the syntax using a RO (read-only) AV, for instance.
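A rough sketch of that RO-AV prototype, assuming the undocumented (and version-dependent) `Internals::SvREADONLY` call that constant.pm itself uses to freeze list constants:

```perl
use strict;
use warnings;

my @point = ( 1, 2, 3 );

# Freeze each element, then the AV itself, tuple-style
# (the same pattern constant.pm applies to list constants).
Internals::SvREADONLY( $_, 1 ) for @point;
Internals::SvREADONLY( @point, 1 );

# Any attempt to resize or modify now croaks
# "Modification of a read-only value attempted".
eval { push @point, 4 };
print "resize refused: $@" if $@;
```

Reads stay as cheap as a normal AV; the hint only rules out structural changes, which is the property an optimiser would want to rely on.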
---
$world=~s/war/peace/g
Optimisation, yes; premature, how? Please show me how maintenance or verifiability suffers in any way from picking this algorithm over the ST. Tuples would help the ST, but even if its key container overhead were much smaller, it’d still be O(n), whereas it’s O(1) for this algorithm.
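For reference, a minimal sketch of the index-sorting algorithm being defended here (the data is made up): one flat key array, a sort over indices, and a single slice back. The only per-element allocation is the key string itself, which is the O(1)-per-element container overhead claimed above.

```perl
use strict;
use warnings;

my @records = ( [ 'Zoe', 3 ], [ 'amy', 1 ], [ 'Bob', 2 ] );

my @key    = map { lc $_->[0] } @records;               # one key per record, no wrapper arrays
my @idx    = sort { $key[$a] cmp $key[$b] } 0 .. $#key; # sort the indices by key
my @sorted = @records[ @idx ];                          # slice the originals into order

# names now come out as amy, Bob, Zoe
```

Compare that to an ST, where every element gets wrapped in a fresh anonymous array just to carry its key through the sort.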
Makeshifts last the longest.
Er, maybe I'm missing something here, but how is an ST any different from what you have done in terms of big O? Leaving the sort aside, since presumably it is as efficient either way, it appears to me that with an ST you have N steps to create the arrays and N steps to unpack them, while with your code you have N steps to create the key array and N steps to map the sorted keys back to real values.
It strikes me that your routine might be more efficient because the unit cost of each step will be cheaper, but that seems to me to be a form of optimisation that could be argued should be left to the compiler.
So what am I missing?
Also, if you are going down this route you might as well go as far as you can. According to Benchmark, the true GRT I whipped up for this is about 50% faster than your routine, and uses less memory (by two whole AVs :-).
use strict;
use warnings;
use Benchmark qw(cmpthese);
my @a_of_a = map { [($_) x 5] }
qw( a list of words to sort but the list needs to be fairly
long so we will blab on a bit just because we have to
which sorta sucks because it would be more interesting
to write code or eat ice-cream );
my @grt;
my @ari;
cmpthese -2,{
grt => sub {
my
@grt = map { $a_of_a[unpack"x4N",$_] }
sort map { pack "A4N", lc $a_of_a[$_][ 4 ], $_ }
0..$#a_of_a;
},
ari => sub {
my
@ari = do {
my @key = map { lc substr $_->[ 4 ], 0, 3 } @a_of_a;
my @idx = sort { $key[ $a ] cmp $key[ $b ] } 0 .. $#key;
@a_of_a[ @idx ];
};
}
};
if (@grt) {
for (0..$#a_of_a) {
if ( $ari[$_] != $grt[$_] ) {
die "$_ is bad\n";
}
}
}
__END__
Rate ari grt
ari 3946/s -- -37%
grt 6260/s 59% --
Rate ari grt
ari 4111/s -- -33%
grt 6179/s 50% --
Rate ari grt
ari 3989/s -- -38%
grt 6393/s 60% --
Rate ari grt
ari 4049/s -- -36%
grt 6300/s 56% --
Rate ari grt
ari 3982/s -- -38%
grt 6467/s 62% --
Because the GRT avoids the substr and needs no indirection on the key, you don't need to provide a custom sort function, which in turn means the per-comparison cost is quite a bit lower. AND you get the added bonus that the sort is stable, plus a lower memory profile.
But it's also arguable that this too is a form of premature optimisation.
Update: when I compare your approach to an ST like the following, I see no performance difference between the two.
st => sub {
my
@st = map { pop @$_; $_ }
sort { $a->[-1] cmp $b->[-1] }
map { my $k = lc substr $_->[ 4 ], 0, 3; push @$_, $k; $_ }
@a_of_a;
}
---
$world=~s/war/peace/g
Re^5: Perl is dying (better sorting than the ST)
by exussum0 (Vicar) on Jan 05, 2007 at 15:12 UTC
I have no idea whether this comes from APL, I’m afraid. I originally figured out this approach on my own and only afterwards learned that it’s been known in the Perl community for at least as long as the ST. (Oddly, it does not seem to get promoted much, despite its considerable advantages in the general case.) I have no idea about its history beyond that.
Makeshifts last the longest.