Also, it seems like O(N^{2}) on splice is a worst case, where best case (either all or no deletions) would be O(N), leading me to think it'd be closer to O(N log N) in practice.
I tried all N=16 inputs:
0 elements were shifted 1 times
16 elements were shifted 16 times
31 elements were shifted 120 times
45 elements were shifted 560 times
58 elements were shifted 1820 times
70 elements were shifted 4368 times
81 elements were shifted 8008 times
91 elements were shifted 11440 times
100 elements were shifted 12870 times
108 elements were shifted 11440 times
115 elements were shifted 8008 times
121 elements were shifted 4368 times
126 elements were shifted 1820 times
130 elements were shifted 560 times
133 elements were shifted 120 times
135 elements were shifted 16 times
136 elements were shifted 1 times
98 elements were shifted on average
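For anyone who wants to check those figures, here is a rough sketch that reproduces them. The cost model is my assumption, since it is the one that matches the numbers: a pattern that deletes k of the N elements is charged N + (N-1) + ... + (N-k+1) shifts, i.e. each splice pays for the current length of the array. The sub names are made up for illustration.

```perl
use strict;
use warnings;

# ASSUMED cost model (inferred from the figures, not stated anywhere):
# deleting k elements costs N + (N-1) + ... + (N-k+1) shifts.
sub shifts_for {
    my ($n, $k) = @_;
    my $total = 0;
    $total += $n - $_ for 0 .. $k - 1;
    return $total;
}

# Walk every keep/delete pattern of length N and count how many
# patterns delete exactly k elements (k = number of set bits).
sub patterns_by_deletions {
    my ($n) = @_;
    my @count = (0) x ($n + 1);
    for my $mask (0 .. 2**$n - 1) {
        my $k = unpack('%32b*', pack('N', $mask));   # bit count of $mask
        $count[$k]++;
    }
    return @count;
}

# Average number of shifts over all 2**N patterns.
sub average_shifts {
    my ($n) = @_;
    my @count = patterns_by_deletions($n);
    my $sum = 0;
    $sum += $count[$_] * shifts_for($n, $_) for 0 .. $n;
    return $sum / 2**$n;
}

my $n = 16;
my @count = patterns_by_deletions($n);
printf "%d elements were shifted %d times\n", shifts_for($n, $_), $count[$_]
    for 0 .. $n;
printf "%d elements were shifted on average\n", average_shifts($n);
```

Changing `$n` lets you see how the average moves with N, which a single N=16 run cannot show.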
The average result is 98, which is about twice N log N. So,
Average case
= O({loop body cost}*N + {element shift cost}*N log N)
= O(N + N log N)
= O(N log N)
The thing is, the worst case is also of the same order, so
Worst case
= O(N log N)
I accept your better average case, and I propose a better worst case than we both thought.
I read this to mean that while a naive implementation would have yielded O(N^2), perl is smart enough that the exponent drops (closer) to O(N). Is this incorrect?
A naïve implementation of push would take O(N) for every element pushed. Currently, it takes O(1) for most pushes, and O(N) on occasion.
@a = qw( a b c );
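+---+---+---+---+
| a | b | c | / |   / = allocated, but unused.
+---+---+---+---+
push @a, 'd';
+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
push @a, 'e';
+---+---+---+---+---+---+---+---+---+---+---+---+
| a | b | c | d | e | / | / | / | / | / | / | / |
+---+---+---+---+---+---+---+---+---+---+---+---+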
It only preallocates so much. As soon as the preallocated memory is used up, a new memory block is allocated and the whole array must be copied.
The shiftpush solution is therefore O(N * N*{chance of reallocation needed}), which probably resembles worst/average case O(N log N).
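To get a feel for how often that copy happens, here is a small simulation. It is only a toy model and not Perl's real allocator: the growth rule `2 * cap + 4` is my assumption, picked because it matches the 4 to 12 jump in the diagram, and `simulate_pushes` is a made-up name. It counts reallocations and copied elements for a grow-by-one array and for a roughly doubling one.

```perl
use strict;
use warnings;

# Toy model of a growable array (NOT Perl's actual allocator).
# $grow is a code ref that maps the old capacity to the new capacity.
# Returns (number of reallocations, total elements copied).
sub simulate_pushes {
    my ($n, $grow) = @_;
    my ($cap, $size, $reallocs, $copied) = (0, 0, 0, 0);
    for (1 .. $n) {
        if ($size == $cap) {
            $copied += $size;          # every existing element moves to the new block
            $cap = $grow->($cap);
            $reallocs++;
        }
        $size++;                       # the push itself
    }
    return ($reallocs, $copied);
}

my $n = 1000;
my ($naive_reallocs, $naive_copied) = simulate_pushes($n, sub { $_[0] + 1 });
my ($geo_reallocs,   $geo_copied)   = simulate_pushes($n, sub { 2 * $_[0] + 4 });

printf "grow by one:    %d reallocations, %d elements copied\n",
    $naive_reallocs, $naive_copied;
printf "roughly double: %d reallocations, %d elements copied\n",
    $geo_reallocs, $geo_copied;
```

Swapping in a different `$grow` rule is the easy way to test other guesses about the preallocation policy.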
So that makes the scalability as follows:
 The grep solution you provided uses O(N) time and O(N) memory.
 The splice solution you provided uses O(N log N) time and O(1) memory.
 The shiftpush solution you provided uses O(N log N) time and O(N) memory.
 The shiftpush solution I provided uses O(N) time and O(N) memory.
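For readers who don't have the earlier posts at hand, the three approaches being compared look roughly like this. These are my own minimal sketches, not the exact code from the thread: the sub names and the `$keep` predicate are made up, and only one shift/push variant is shown.

```perl
use strict;
use warnings;

# Illustrative sketches only. $keep is a hypothetical predicate (code ref)
# that returns true for the elements to keep.

# grep: build the list of kept elements and assign it back.
sub filter_grep {
    my ($array, $keep) = @_;
    @$array = grep { $keep->($_) } @$array;
    return;
}

# splice: walk backwards so removals don't disturb indexes still to visit.
# A C-style loop avoids building a reversed index list.
sub filter_splice {
    my ($array, $keep) = @_;
    for (my $i = $#$array; $i >= 0; $i--) {
        splice(@$array, $i, 1) unless $keep->($array->[$i]);
    }
    return;
}

# shift/push: take each element off the front, put the keepers on the back.
sub filter_shiftpush {
    my ($array, $keep) = @_;
    my $n = @$array;                   # fix the count before the array changes
    for (1 .. $n) {
        my $item = shift @$array;
        push @$array, $item if $keep->($item);
    }
    return;
}

my $is_even = sub { $_[0] % 2 == 0 };
for my $filter (\&filter_grep, \&filter_splice, \&filter_shiftpush) {
    my @data = (1 .. 10);
    $filter->(\@data, $is_even);
    print "@data\n";
}
```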
The crux of my question though was supposed to be about the constant in front of the memory term, particularly as all scale equivalently in memory.
I thought you were more interested in speed, sorry.
 splice is done in place. (Assuming you get rid of the reverse!!)
 grep probably uses N SV* extra memory. It could possibly be done in place.
 My shiftpush uses N SV* extra memory.
 Your shiftpush uses between N and 5*N SV* (peak), and between N and 3*N SV* (final) extra memory.
Pushing slightly more than doubles the allocated memory when a reallocation is forced. If N' is the number of elements kept, 3*N is 2*(N+N') when N'=N, minus the initial memory. The peak occurs when copying the pointers from the old memory block to the new memory block.
