Re: Re: Re: Confirming what we already knew

My best guess, based on your brief description, is that when you split your float data into the array, the C version performs the ascii-to-binary conversion once, the array iterated over is a float * or double *, and all subsequent processing of these numbers is performed on them in their binary form.

In the Perl version, the numbers are maintained in the array in their ascii form, and each time any calculation with, or comparison of, them is done, perl has to perform the conversion to binary, do the math, and then convert back to the ascii representation and store it. (I think the first step can be skipped if the scalar was previously used for numeric purposes, as perl keeps a copy of the binary representation once a scalar has been used for math.)

The upshot is that every calculation between 2 floats (e.g. num1 *= num2;) in the C version comes down to

load reg a, num1;  move 8 bytes from memory (probably cache)
; maybe 2 or 4 clock cycles barring stalls, which, given the
; C array is probably contiguous memory, probably happens every 128K values
; after the array is initially addressed, depending on the size of the L1/L2 caches
; and what else the surrounding code is doing

load reg b, num2; ditto

fmul reg a, reg b; A Floating Point processor instruction
; Depending on the processor could be 1 to 10 or maybe 20 clock cycles

store reg a, num1; 8 bytes stored.
; Another 2/4 clock cycles.

maybe 30/40 clock cycles at most, and usually much less.
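
For comparison, here is roughly what the C side of this looks like as source. It is only a sketch under my own assumptions: the names parse_doubles and multiply_in_place are made up for illustration, and the input is assumed to be whitespace-separated text. The text is converted to binary once with strtod(), after which the loop works on doubles directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Convert whitespace-separated text to doubles once, up front.
   Returns the number of values stored in out (at most max). */
size_t parse_doubles(const char *text, double *out, size_t max)
{
    assert(text != NULL && out != NULL);
    size_t n = 0;
    while (n < max) {
        char *end;
        double v = strtod(text, &end);  /* skips leading whitespace */
        if (end == text) {
            break;                      /* nothing left to convert */
        }
        out[n++] = v;
        text = end;
    }
    return n;
}

/* dst[i] *= src[i] for every element. The values are already binary,
   so each iteration is two loads, one multiply and one store. */
void multiply_in_place(double *dst, const double *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] *= src[i];
    }
}
```

Calling parse_doubles() once and then multiply_in_place() as often as needed keeps every later pass free of string handling.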

Whereas for Perl, the equivalent processor instructions involve

locating the base SV,
indexing that to find the XPVAV.
Locating, and possibly performing ascii-binary conversion on, the index var,
then using the value to calculate the offset into the storage pointed at by the XPVAV to
locate the SV pointing to the float element.
Loading the base of that SV,
checking to see if the NOK flag is set.
If it is, load the previous binary value of this scalar.
If not, chase the XPVNV to the ascii representation.
Read the string byte by byte in a loop and perform the math to convert it to binary form.
Repeat from step 1 for the second variable.
Finally we have the two floats in binary form, but in (temporary?) storage not registers.
Perform the actual math using essentially the same steps as outlined above for the C version.
Now perform the reverse of the first 10 steps, twice, remembering we must always do the binary-to-ascii conversion on store as the next use of the variable may be as a string (e.g. print), but that we also store both binary reps (which may save considerable time if the next use is numeric).
Do lots of flag checking/setting and other housekeeping.

I make no claims for the above being complete, correct or accurate in any way whatsoever, but it gives a feel for what's involved.
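
To give a feel for the steps listed above, here is a toy model in C. It is emphatically not perl's real SV or AV layout: toy_scalar, toy_array, the TOY_POK/TOY_NOK bits and the conversions counter are all hypothetical names invented for this sketch. It shows only the shape of the work: chase a pointer from the array to the scalar, test a flag, and either reuse the cached binary value or parse the string and cache the result.

```c
#include <assert.h>
#include <stdlib.h>

enum { TOY_POK = 1, TOY_NOK = 2 };  /* hypothetical flag bits */

typedef struct {
    unsigned flags;
    const char *pv;        /* string form, valid when TOY_POK is set */
    double nv;             /* binary form, valid when TOY_NOK is set */
    unsigned conversions;  /* demo only: counts string-to-number parses */
} toy_scalar;

typedef struct {
    toy_scalar **slots;    /* one extra pointer hop per element */
    size_t fill;           /* number of elements in use */
} toy_array;

/* Fetch element i: bounds check, then follow the slot pointer. */
toy_scalar *toy_fetch(const toy_array *av, size_t i)
{
    assert(av != NULL && i < av->fill);
    return av->slots[i];
}

/* Numeric value of a scalar: reuse the cached double if the NOK-like
   flag is set, otherwise parse the string once and cache the result. */
double toy_num(toy_scalar *sv)
{
    assert(sv != NULL);
    if (!(sv->flags & TOY_NOK)) {
        assert(sv->flags & TOY_POK);
        sv->nv = strtod(sv->pv, NULL);
        sv->flags |= TOY_NOK;
        sv->conversions++;
    }
    return sv->nv;
}
```

Even in this stripped-down form, multiplying two elements costs two bounds checks, two pointer hops and two flag tests before the multiply itself, and the first use of each element also pays for a string parse.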

As you can see, the process of adding two floats held in an array in perl is considerably more involved than in C and takes on the order of hundreds, if not low thousands, of clock cycles, as opposed to tens in C. That's the price you pay for Perl's infinitely simpler and more flexible type-polymorphism, DWIMery and transparent housekeeping.

It makes me wonder if Perl 6 couldn't reverse the magic used by Perl 5 as far as numbers are concerned. That is to say, once a value has been used as a number, couldn't the ascii representation of that var be flagged as 'dirty' and not maintained until such time as an attempt is made to use its contents in a non-numeric capacity? I.e. as well as the IOK and NOK flags, have an AOK flag, allowing the ascii representation not to be updated until necessary?

That said, in the above sequence, it's the pointer chasing and flag waving required by perl's infinitely flexible array representation that makes the biggest difference between it and the C float-array process, so it probably wouldn't make a whole heap of difference.

The upshot is that if you want to do math-intensive programming, use a language that lends itself to that, or find a module that allows you to hand over the storage and processing of numeric data to a library written in such a language.

It would be nice to think that our favorite language could be extended in its upcoming 'new' form to allow us to give clues, when we declare vars, as to their most prevalent usage, and that it could transparently (and without too much overhead) allocate, store and manipulate them in ways more efficient for that usage, whilst retaining the DWIMery and flexibility of the standard Perl data representations. (You have no idea how hard it was to write that sentence without mentioning types :). I'm sort of thinking about the perl 6 attributes here.

Examine what is said, not who speaks.

1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.

Replies are listed 'Best First'.
Re: Re: Re: Re: Confirming what we already knew by TimToady (Parson) on Mar 06, 2003 at 18:06 UTC
Perl 6 will certainly support type declarations in spots that will help the optimizer. In particular, you'll be able to say something like `my num @array;` to get a compact array of doubles. The very-soon-to-be-released A6 will have a section on the new type system. And then you won't have to be afraid to utter the word "type". `:-)`
Re: Re: Re: Re: Re: Confirming what we already knew by BrowserUk (Patriarch) on Mar 06, 2003 at 18:17 UTC
Thanks for the info, it sounds really interesting. Is there currently any shortlist of proposed types? Or must we wait for the official skinny?
Re^4: Confirming what we already knew by diotalevi (Canon) on Mar 05, 2003 at 15:22 UTC
Are you sure about that last bit where once NOK is set it also reifies the string value as well? That sounds mighty suspicious to me (not that I actually /know/ one way or another). `perl -MDevel::Peek -e '$k = "1234.1234"; $k++; Dump $k'` shows that while there is a string still stored, it's not kept up to date with the numeric version unless I insert something that stringifies, like print $k. Once NOK is set it isn't going to be unset unless its string representation is altered, so it keeps the packed representation around.
Seeking Green geeks in Minnesota
Re: Re^4: Confirming what we already knew by BrowserUk (Patriarch) on Mar 05, 2003 at 16:02 UTC
Um, no. I am not sure at all, hence the disclaimer. I should have known that my 'reverse the magic', AOK optimisation idea was so obvious that it would already have been implemented long ago. I did attempt to verify that bit, but I won't explain what (my stupidity) made me think I had confirmed it.

Suffice to say, even without the need to re-ascii-ify the numbers between math ops, just the process of addressing the values held in Perl arrays is costly. This is the cost of the infinite flexibility with which they can be grown, shrunk, used to hold any type (small t) of data, sliced, spliced, diced; created and thrown away with relative impunity. For most applications, this flexibility far outweighs the costs, but understanding the costs is the key to knowing when they are inappropriate for a given purpose. Math-intensive manipulation of large numerical datasets is one such case.

As C arrays are simply contiguous lumps of memory, looping over numeric arrays in C involves incrementing a pointer by a fixed integer value, loading one or two (64- or 32-bit) registers via the pointer, performing the fp-op and then writing the result back via the same pointer. A very tight, register-bound process. Even if two arrays are being processed in parallel, it's still quite likely that all the intermediate terms and pointers can be kept in registers on most processors.

The equivalent process using Perl's arrays is considerably more involved, requiring lots of pointer chasing, flag waving and (potentially) expensive format-conversion. This is neither a surprise, nor a burden in the majority of programs, but knowing that this is what is involved is key to understanding what things perl is really good at doing and what things it is less-than-optimal for.
Re^4: Confirming what we already knew (it's cached) by tye (Sage) on Mar 05, 2003 at 17:37 UTC
No, when you use a Perl scalar that contains (just) a string in a numeric context, the string is converted to a number and the numeric value is cached in the scalar along with the string value. Future uses of that scalar in a numeric context simply fetch the cached numeric value.
- tye
Re: Re^4: Confirming what we already knew (it's cached) by BrowserUk (Patriarch) on Mar 05, 2003 at 19:55 UTC
Yes. I understood (and indicated this above?) that when a scalar that has previously been used in a numeric context is fetched, the binary version is accessed and used, saving the need for an ascii-to-binary conversion.

The statement that diotalevi rightly took me to task for (though I have again attempted to verify this, without success) is the one where I suggested that when a scalar that has been used in a numeric context, and therefore has a binary version available, is modified numerically, the ascii version is also updated. Part of what made me think this was the case is that I cannot see any mechanism in the data structures whereby perl would be able to know whether the ascii version needed updating from the binary version.

To clarify (my own thoughts mostly), in the following situation

`my $num = '5.1';   ## scalar is a string, NOK is false`
`if ($num == 5.1) { print 'It is'; }`

At this point, $num has been used in a numeric context, so an ascii-to-binary conversion has been done, NOK is true, and subsequent references to $num in numeric contexts can re-use the binary value directly, avoiding an atof().

`$num++;   ## binary value fetched, incremented and stored`
`$num++;   ## binary value fetched, incremented and stored`

The question is, did the stringy version of the value get updated when those modifications occurred, or does that get delayed until $num is used in a string context? If the stringy version was not updated, then by what mechanism does perl know that it must do so when, sometime later, I do `print $num;`?
Re^6: Confirming what we already knew (it's cached) by Elian (Parson) on Mar 05, 2003 at 19:59 UTC
Assigning to a non-tied variable throws out everything that was in the variable, including its flags, and replaces it with the value being assigned. So in the `$foo++` case, $foo's old value gets thrown out, along with all its flags, and gets a new value and flags to go with it. If $foo used to be a string and num, well, the string part's been tossed out (if it was a numeric increment) or the num part's been tossed (if it was a string increment). Things wouldn't work too well if it were otherwise.
Re: Re: Re^4: Confirming what we already knew (it's cached) by hv (Prior) on Mar 05, 2003 at 21:07 UTC
When you modify the numeric value (NV) of a scalar, the NOK flag is set and the other flags (IOK, POK) are cleared: that means if we need the NV, we can just grab it (since NOK is true), but if we need the string value (PV, for 'pointer') we need to do a conversion.

When the conversion is done, we also need to distinguish whether the conversion was lossy or not: for example if we use a numeric 1.2 in an integer context, we cache the integer value 1 and set pIOK ('private' integer OK) to avoid the need to recalculate the integer next time we need it, but if it were 1.0 we'd also set IOK.

The Devel::Peek module is very useful for seeing this sort of thing: `perl -MDevel::Peek -we '$a=1.2;Dump($a);$b="$a";Dump($a)'`

Hugo
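
The flag handling described above can be modelled in a few lines of C. This is a sketch only and assumes nothing about perl's real structures: lazy_scalar and the LAZY_POK/LAZY_NOK bits are invented names, and the %.15g format is simply the one chosen for this example. A numeric store sets the NOK-like bit and clears the POK-like bit; the string form is rebuilt only when somebody asks for it.

```c
#include <assert.h>
#include <stdio.h>

enum { LAZY_POK = 1, LAZY_NOK = 2 };  /* hypothetical flag bits */

typedef struct {
    unsigned flags;
    double nv;      /* binary value, valid when LAZY_NOK is set */
    char pv[32];    /* string value, valid when LAZY_POK is set */
} lazy_scalar;

/* Numeric store: the new value replaces the old one and its flags,
   so any previous string form is marked stale by clearing POK. */
void lazy_set_num(lazy_scalar *sv, double v)
{
    assert(sv != NULL);
    sv->nv = v;
    sv->flags = LAZY_NOK;
}

/* Numeric increment: fetch the binary value, add one, store it. */
void lazy_increment(lazy_scalar *sv)
{
    assert(sv != NULL && (sv->flags & LAZY_NOK));
    lazy_set_num(sv, sv->nv + 1.0);
}

/* String fetch: convert only if the string form is missing or stale,
   then set POK so the next string use is free. */
const char *lazy_str(lazy_scalar *sv)
{
    assert(sv != NULL);
    if (!(sv->flags & LAZY_POK)) {
        snprintf(sv->pv, sizeof sv->pv, "%.15g", sv->nv);
        sv->flags |= LAZY_POK;
    }
    return sv->pv;
}
```

Two calls to lazy_increment() followed by one lazy_str() cost a single binary-to-string conversion, which is the behaviour the 'dirty' flag idea was after.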
Re: Re: Re: Re^4: Confirming what we already knew (it's cached) by BrowserUk (Patriarch) on Mar 05, 2003 at 21:32 UTC
Re: Re: Re: Re: Re^4: Confirming what we already knew (it's cached) by hv (Prior) on Mar 05, 2003 at 21:39 UTC
Some notes below your chosen depth have not been shown here
Re^6: Confirming what we already knew (oops) by tye (Sage) on Mar 06, 2003 at 07:10 UTC
Sorry, you did in fact say that. I probably should have just waited for someone else to respond. I didn't take the time to correctly understand your point. I was correct in understanding that you thought more conversions were taking place than really do, but that was about it. But that was just sloppy. Again, sorry.
- tye

