Bug or WAD in lvalue substr? (again.)

To my eyes, one of the most notable things to come out of the thread at Ways to delete start of string is the large discrepancy in performance between substr used as an lvalue and the 4-arg substr variant. Using a modified form of the benchmarks presented there, these are typical result sets:

Update: Replaced the timings with those from a better benchmark that isolates the code under test from the costs of allocating destructible test data. The effect is to highlight the inadequacy of the earlier benchmark attempts, most notably by the fact that the reverse-chop-reverse method is now shown to be the slowest (as one would intuitively expect).

These changes do not affect the meat of this meditation, as the lvalue results are still far slower than the 4-arg results.

[14:28:20.34] c:\test>\Perl510\bin\perl5.10.0.exe 688308.pl -loops=1e2
              Rate     reverse substr_copy       subst substr_lval  substr_mod
reverse     5.79/s          --        -78%        -96%        -97%        -98%
substr_copy 26.0/s        349%          --        -84%        -85%        -92%
subst        162/s       2692%        522%          --         -5%        -48%
substr_lval  170/s       2837%        554%          5%          --        -45%
substr_mod   310/s       5257%       1094%         92%         82%          --

[14:29:08.20] c:\test>\Perl510\bin\perl5.10.0.exe 688308.pl -loops=1e3
              Rate     reverse substr_copy       subst substr_lval  substr_mod
reverse     5.91/s          --        -77%        -97%        -97%        -98%
substr_copy 26.1/s        342%          --        -85%        -85%        -92%
subst        169/s       2767%        548%          --         -2%        -47%
substr_lval  173/s       2832%        563%          2%          --        -45%
substr_mod   317/s       5264%       1113%         87%         83%          --

[14:29:44.37] c:\test>\Perl510\bin\perl5.10.0.exe 688308.pl -loops=1e4
              Rate     reverse substr_copy       subst substr_lval  substr_mod
reverse     5.85/s          --        -78%        -96%        -97%        -98%
substr_copy 26.2/s        349%          --        -84%        -85%        -92%
subst        167/s       2750%        535%          --         -5%        -48%
substr_lval  176/s       2908%        570%          6%          --        -45%
substr_mod   321/s       5383%       1121%         92%         82%          --

[14:30:21.73] c:\test>\Perl510\bin\perl5.10.0.exe 688308.pl -loops=1e5
              Rate     reverse substr_copy       subst substr_lval  substr_mod
reverse     5.85/s          --        -78%        -97%        -97%        -98%
substr_copy 26.3/s        350%          --        -84%        -85%        -91%
subst        168/s       2767%        536%          --         -3%        -46%
substr_lval  173/s       2862%        558%          3%          --        -44%
substr_mod   309/s       5191%       1075%         85%         79%          --
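
The script 688308.pl itself is not reproduced in this post. As a rough, hypothetical stand-in, a comparison of just the two substr forms might look like the sketch below. The sub names, the 10,000-character test string and the one-second run time are all assumptions, and the per-call string copy is not isolated from the timing the way the revised benchmark is said to do.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical helpers (not taken from 688308.pl): each one deletes the
# first character repeatedly until the string is empty.
sub drain_lval {
    my ($s) = @_;                              # work on a private copy
    substr( $s, 0, 1 ) = '' while length $s;   # lvalue form
    return $s;
}

sub drain_4arg {
    my ($s) = @_;                              # work on a private copy
    substr( $s, 0, 1, '' ) while length $s;    # 4-arg form
    return $s;
}

# Assumed test data; the length is arbitrary.
my $template = 'x' x 10_000;

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    substr_lval => sub { drain_lval($template) },
    substr_mod  => sub { drain_4arg($template) },
} );
```

The absolute rates will differ from those above; only the ratio between the two rows is of interest.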

Whether my benchmark is any more or less accurate than others is irrelevant to this issue. What matters is that lvalue substr is consistently far slower than 4-arg substr, when what they are doing should be exactly the same. The results (with a minor discrepancy that I've raised as a bug in the past, but had dismissed) are the same:

>\Perl510\bin\perl5.10.0.exe -MDevel::Peek -wle"
    $x = 'fred'; Dump( $x ); 
    print substr( $x, 0, 1 )=''; Dump( $x )"

SV = PV(0x226f64) at 0x182a19c
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x18305ec "fred"\0
  CUR = 4
  LEN = 8

SV = PVIV(0x228f40) at 0x182a19c
  REFCNT = 2
  FLAGS = (POK,OOK,pPOK)
  IV = 1  (OFFSET)
  PV = 0x18305ed ( "f" . ) "red"\0
  CUR = 3
  LEN = 7

>\Perl510\bin\perl5.10.0.exe -MDevel::Peek -wle" 
    $x = 'fred'; Dump( $x ); 
    print substr( $x, 0, 1, ''); Dump( $x )"

SV = PV(0x226f64) at 0x182a19c
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x18305ec "fred"\0
  CUR = 4
  LEN = 8
f
SV = PVIV(0x228f40) at 0x182a19c
  REFCNT = 1
  FLAGS = (POK,OOK,pPOK)
  IV = 1  (OFFSET)
  PV = 0x18305ed ( "f" . ) "red"\0
  CUR = 3
  LEN = 7

The sharp-eyed among you will notice the effect of the 'bug' I mentioned above: the values of the two expressions (which ought to be the same) differ. The 4-arg variant returns the part of the string that has been extracted (replaced), which is consistent with the documentation: "Extracts a substring out of EXPR and returns it." The lvalue variant, by contrast, returns '' (the empty string).

But that aside, the resultant state of the affected variable ($x) is identical. I contend that the two forms should be syntactic variations only, producing identical semantic results with identical performance.
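
The comparison can also be made without Devel::Peek. The following is a minimal sketch with hypothetical sub names; each sub removes the first character with one form and hands back both the value of the expression and the string left behind.

```perl
use strict;
use warnings;

# Hypothetical helper: lvalue form. $result receives the value of the
# whole assignment expression.
sub chop_front_lval {
    my ($s) = @_;
    my $result = ( substr( $s, 0, 1 ) = '' );
    return ( $result, $s );
}

# Hypothetical helper: 4-arg form. $result receives the removed substring.
sub chop_front_4arg {
    my ($s) = @_;
    my $result = substr( $s, 0, 1, '' );
    return ( $result, $s );
}

# Matches the Dump sessions above: both leave 'red' behind, but the
# lvalue expression yields '' while the 4-arg call yields 'f'.
printf "lvalue: result='%s' left='%s'\n", chop_front_lval('fred');
printf "4-arg:  result='%s' left='%s'\n", chop_front_4arg('fred');
```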

Update: The revised benchmark shows the next paragraph to be the result of bad benchmarking. But the performance differential between lvalue-substr and 4-arg substr remains.

I further contend that the performance differential between the two forms constitutes a bug! That the single opcode for the lvalue variant takes longer to execute than the 3 opcodes of the reverse-chop-reverse mechanism, regardless of the length of the string, is to my mind strongly indicative of a major flaw in the implementation of what should be an identical operation to the 4-arg substr variant.

I've taken a look inside the source code for pp_substr, but frankly, I do not understand the macro-machinations that go on in there.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."


Replies are listed 'Best First'.
Re: Bug or WAD in lvalue substr? (against) by tye (Sage) on May 27, 2008 at 14:51 UTC
lvalue substr has to construct a magic scalar and then pipe the assignment to that scalar over to the original string and then destroy the magic scalar. Of course that takes longer than just adjusting a few fields in the original string's SV struct. If you don't like only being able to use lvalue substr 1.4 million times per second, then just use 4-argument substr instead.

Yes, somebody could spend a bunch of time trying to hack in an optimization such that if lvalue substr is used in the simplest way, code that is written like `substr($str,$off,$len)= $sub;` actually gets compiled like `substr($str,$off,$len,$sub);` (update: note the semicolon, ikegami). But, given that the resources for such work are clearly finite (and you seem unable to be such a resource and I've become unwilling), I almost find it hard to think of something I'd put as a lower priority than such a hack.

Having a simple, efficient way of changing substrings (4-arg substr) and also having a more-complicated-to-implement and slightly (2-fold) slower alternative that allows more complex use cases (`substr($str,$off,$len) =~ s/.../.../g;`) is a lovely feature, not something even close to being a bug.

- tye
Re^2: Bug or WAD in lvalue substr? (against) by ikegami (Patriarch) on May 27, 2008 at 20:32 UTC
Gotta be careful not to break

>perl -le"$x='abc'; (substr($x,0,1) = 'd') =~ s/(.)/uc $1/e; print $x"
Dbc
Re: Bug or WAD in lvalue substr? (again.) by moritz (Cardinal) on May 27, 2008 at 13:37 UTC
A bug is every user-visible behaviour that is contrary to the documentation. I'd not call the output of Devel::Peek user-visible (it's a Devel:: module, after all), and the docs generally don't give performance guarantees. That said, you should still send a bug report to p5p; maybe they can do something about it (update: after reading tye's reply I think a doc patch would be appropriate).

"I've taken a look inside the source code for pp_substr, but frankly, I do not understand the macro-machinations that go on in there."

I shared that experience the other day :/
Re^2: Bug or WAD in lvalue substr? (again.) by Anonymous Monk on May 27, 2008 at 16:03 UTC
"A bug is every user-visible behaviour that is contrary to the documentation." So, "no docs, no bugs" ? I'll remember that for next time I need a good excuse for not having written documentation ;)
Re^3: Bug or WAD in lvalue substr? (What is a bug anyway?) by moritz (Cardinal) on May 27, 2008 at 16:18 UTC
"So, "no docs, no bugs" ?"

Yes. And no. If you try to define rigorously what a bug is, you don't really have a choice. However, normally the documentation isn't really rigorous either. The manpage for grep says it searches for matches of regular expressions, but it doesn't say that it terminates after all input is exhausted. Normally you'd still say it's a bug if it hangs afterwards. But you can't just say that every program has to terminate, because stuff like servers often has to run until stopped by external intervention. So strictly speaking "no docs, no bugs" holds true. But you can still disappoint the user, and in some cases people will call that a "bug" as well.

"I'll remember that for next time I need a good excuse for not having written documentation ;)"

When you write software for money, you usually have some kind of requirement docs, which serve as docs as well. If not, you're lucky. (I saw that smiley, yes, but I still wanted to express my thoughts about what a bug is and what not).
Re: Bug or WAD in lvalue substr? (again.) by ikegami (Patriarch) on May 27, 2008 at 19:39 UTC
"Whereas the lvalue variant returns '' (null string)."

No. You're looking at the return value of the scalar assignment, not the return value of the lvalue substring. The lvalue substring returns the substring to be replaced as if it wasn't lvalue, but with some magic added.

You seem to think `substr($x,0,1) = ''` is one op, but it's two.

>perl -MO=Concise -e"substr($x,0,1)=''"
9  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v ->3
8     <2> sassign vKS/2 ->9                <---
3        <$> const[PV ""] s ->4
7        <@> substr[t3] sKRM/3 ->8         <---
-           <0> ex-pushmark s ->4
-           <1> ex-rv2sv sKRM/1 ->5
4              <#> gvsv[*x] s ->5
5           <$> const[IV 0] s ->6
6           <$> const[IV 1] s ->7
-e syntax OK

This allows more complicated expressions like `substr($x,0,1) =~ s/(.)/uc $1/e;`

So no, there's no bug.
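
The uses of the lvalue that come up in this sub-thread can be tried in a short script. This is only a sketch: the sub names are hypothetical, and the 'Dbc' result in the comment is the one reported in the one-liner elsewhere in this thread.

```perl
use strict;
use warnings;

# Bind s/// directly to the lvalue returned by 3-arg substr, so the
# substitution only sees (and rewrites) the first character.
sub ucfirst_via_lvalue {
    my ($s) = @_;
    substr( $s, 0, 1 ) =~ s/(.)/uc $1/e;
    return $s;
}

# Assign through the lvalue first, then apply s/// to the value of the
# assignment, which is still the magic lvalue and not a plain copy.
sub replace_then_uc {
    my ($s) = @_;
    ( substr( $s, 0, 1 ) = 'd' ) =~ s/(.)/uc $1/e;
    return $s;
}

print ucfirst_via_lvalue('abc'), "\n";   # Abc
print replace_then_uc('abc'),    "\n";   # Dbc
```

Neither expression can be written with 4-arg substr alone, because that form returns an ordinary copy of the removed text and not something that can be modified in place.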
Re^2: Bug or WAD in lvalue substr? (again.) by BrowserUk (Patriarch) on May 27, 2008 at 20:24 UTC
Indeed, that nails it. Once a better benchmark is used, so that the lvalue variant no longer shows as slower than both the reverse-chop-reverse and regex versions, the much smaller difference between lvalue and 4-arg makes much more sense in the light of this interpretation of the opcode tree. Thanks.

It's always good to air these things here rather than bothering p5p with them. And who knows, maybe the simple case optimisation will turn up at some point in the future. The map in a void context probably wasn't an item on anyone's high priority list either, but someone found the problem interesting enough to pursue it.
Re: Bug or WAD in lvalue substr? (again.) by kyle (Abbot) on May 27, 2008 at 15:23 UTC
I'm not sure what code you're using for this. The node you reference doesn't do what your node seems to be doing. Anyway, I wrote a benchmark to verify the behavior you describe (its code is not shown here). The output:

                Rate substr_lvalue    substr_mod
substr_lvalue 2.58/s            --          -50%
substr_mod    5.16/s          101%            --

I don't have much to add beyond that. My first thought about this was that the lvalue case must allow something that the four argument case can't allow, but I hadn't figured out what that was when I read the reply from tye that laid it out.
