Efficient array element deletion

by kennethk (Abbot)
on Dec 04, 2008 at 22:52 UTC ( id://728128 )

kennethk has asked for the wisdom of the Perl Monks concerning the following question:

After reading Shift, Pop, Unshift and Push with Impunity!, a question occurred to me (purely academic). If I have a long array and my goal is to perform some test on each element and remove those elements that fail, what are the best ways to do it from CPU and memory standpoints? So one choice would be

@array = grep(!/^\#/, @array)

where presumably the grep operation has been heavily optimized for CPU time. However, this should create a temporary result array, which in turn could double my memory footprint. On the other extreme of the spectrum, I could say

for ( reverse 0 .. $#array) { splice (@array,$_,1) if ($array[$_] =~ /^\#/) }

but unless just about every entry is excised, that imposes a large performance penalty toward the end of the operation. So the above-noted node inspired the following solution:

for (0 .. $#array) { push @array, $value if ($value = shift @array) !~ /^\#/ }

My question is: would this ultimately have worse memory penalties than grep? It seems it must ultimately allocate enough memory at one end to accommodate the entire set, and then the interpreter/system cannot recover this memory since the variable is still allocated. Also, is there an even more clever way to do this I'm missing?
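For what it's worth, the three candidates can be raced directly with the core Benchmark module. This is only a sketch (array size, keep-ratio, and iteration count are arbitrary choices of mine), and it first checks that all three approaches agree before any timing is taken seriously:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Arbitrary test data: ~30% of entries start with '#' and should be dropped.
my @src = map { rand() < 0.3 ? "#$_" : "e$_" } 1 .. 2000;

sub by_grep {
    my @a = @_;
    @a = grep !/^\#/, @a;
    return "@a";
}

sub by_splice {
    my @a = @_;
    for ( reverse 0 .. $#a ) {
        splice @a, $_, 1 if $a[$_] =~ /^\#/;
    }
    return "@a";
}

sub by_shift_push {
    my @a = @_;
    for ( 1 .. scalar @a ) {    # range is fixed before @a starts shrinking
        my $v = shift @a;
        push @a, $v if $v !~ /^\#/;
    }
    return "@a";
}

# Sanity check: timings mean nothing if the results differ.
die "implementations disagree"
    unless by_grep(@src) eq by_splice(@src)
       and by_grep(@src) eq by_shift_push(@src);

cmpthese( 50, {
    grep       => sub { by_grep(@src) },
    splice     => sub { by_splice(@src) },
    shift_push => sub { by_shift_push(@src) },
} );
```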

Update: Following the notes by johngg and jwkrahn, I've corrected an error in my splice loop and learned something new about negative indices. Just because you tested code doesn't mean it did what you thought...

Replies are listed 'Best First'.
Re: Efficient array element deletion
by ikegami (Patriarch) on Dec 04, 2008 at 23:10 UTC

    Every time push is forced to allocate more memory, it needs to copy the entire array. This can be avoided by preallocating enough memory.

    my $count = @array;
    $#array = $count * 2 - 1;   # grow: force one big allocation up front
    $#array = $count - 1;       # shrink back; the capacity stays reserved,
                                # but no real undef elements are left behind
    for ( 1 .. $count ) {
        push @array, $value if ( $value = shift @array ) !~ /^\#/;
    }

    In terms of scalability,

    • The grep solution you provided uses O(N) time and O(N) memory.
    • The splice solution you provided uses O(N²) time and O(1) memory.
    • The shift-push solution you provided uses O(N²) time and O(N) memory.
    • The shift-push solution I provided uses O(N) time and O(N) memory.
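    As a concrete check of the preallocated variant, here is a self-contained sketch (sizes are arbitrary). Note that it grows $#array and then shrinks it back: leaving the array grown would insert real undef elements ahead of the pushed values, while growing and immediately shrinking reserves the memory without changing the contents.

```perl
use strict;
use warnings;

my @array = map { rand() < 0.5 ? "#$_" : "e$_" } 1 .. 1000;
my @want  = grep !/^\#/, @array;    # reference result

my $count = @array;
$#array = $count * 2 - 1;   # grow: one big allocation up front
$#array = $count - 1;       # shrink back; perl keeps the capacity reserved

for ( 1 .. $count ) {
    my $v = shift @array;
    push @array, $v if $v !~ /^\#/;
}

print "@array" eq "@want" ? "match\n" : "MISMATCH\n";   # prints "match"
```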

      From Shift, Pop, Unshift and Push with Impunity!:

      One consequence of perl's list implementation is that queues implemented using perl lists end up "creeping forward" through the preallocated array space leading to reallocations even though the queue itself may never contain many elements. In comparison, a stack implemented with a perl list will only require reallocations as the list grows larger. However, perl is smartly coded because the use of lists as queues was anticipated. Consequently, these queue-type reallocations have a negligible impact on performance. In benchmarked tests, queue access of a list (using repeated push/shift operations) is nearly as fast as stack access to a list (using repeated push/pop operations).

      I read this to mean that while a naive implementation would have yielded O(N²), perl is smart enough that the exponent drops (closer) to O(N). Is this incorrect?

      Also, it seems like O(N²) on splice is a worst case, where best case (either all or no deletions) would be O(N), leading me to think it'd be closer to O(N log N) in practice.

      The crux of my question though was supposed to be about the constant in front of the memory term, particularly as all scale equivalently in memory.

        Also, it seems like O(N²) on splice is a worst case, where best case (either all or no deletions) would be O(N), leading me to think it'd be closer to O(N log N) in practice.

        I tried all N=16 inputs:

        0 elements were shifted 1 times
        16 elements were shifted 16 times
        31 elements were shifted 120 times
        45 elements were shifted 560 times
        58 elements were shifted 1820 times
        70 elements were shifted 4368 times
        81 elements were shifted 8008 times
        91 elements were shifted 11440 times
        100 elements were shifted 12870 times
        108 elements were shifted 11440 times
        115 elements were shifted 8008 times
        121 elements were shifted 4368 times
        126 elements were shifted 1820 times
        130 elements were shifted 560 times
        133 elements were shifted 120 times
        135 elements were shifted 16 times
        136 elements were shifted 1 times
        98 elements were shifted on average

        The average result is 98, which is about twice O(N log N). So,
        Average case
        = O({loop body cost}*N + {element shift cost}*N log N)
        = O(N + N log N)
        = O(N log N)

        The thing is, the worst case is also in the same order, so
        Worst case
        = O(N log N)

        I accept your better average case, and I propose a better worst case than we both thought.

        I read this to mean that while naive implementation would have yielded O(N²), perl is smart enough that the exponent drops (closer) to O(N). Is this incorrect?

        A naïve implementation of push would take O(N) for every element pushed. Currently, it takes O(1) for most pushes, and O(N) on occasion.

        @a = qw( a b c );

        +---+---+---+---+
        | a | b | c | / |    / = allocated, but unused.
        +---+---+---+---+

        push @a, 'd';

        +---+---+---+---+
        | a | b | c | d |
        +---+---+---+---+

        push @a, 'e';

        +---+---+---+---+---+---+---+---+---+---+---+---+
        | a | b | c | d | e | / | / | / | / | / | / | / |
        +---+---+---+---+---+---+---+---+---+---+---+---+

        It only preallocates so much. As soon as the preallocated memory is used up, a new memory block is allocated and the whole array must be copied. The shift-push solution is therefore O(N * N*{chance of reallocation needed}), which probably resembles worst/average case O(N log N).

        So that makes the scalability as follows:

        • The grep solution you provided uses O(N) time and O(N) memory.
        • The splice solution you provided uses O(N log N) time and O(1) memory.
        • The shift-push solution you provided uses O(N log N) time and O(N) memory.
        • The shift-push solution I provided uses O(N) time and O(N) memory.

        The crux of my question though was supposed to be about the constant in front of the memory term, particularly as all scale equivalently in memory.

        I thought you were more interested in speed, sorry.

        • splice is done in-place. (Assuming you get rid of the reverse!!)
        • grep probably uses N SV* extra memory. It could possibly be done in place.
        • My shift-push uses N SV* extra memory.
        • Your shift-push uses between N and 5*N SV* (peak), and between N and 3*N SV* (final) extra memory.

        Pushing slightly more than doubles the allocated memory when a reallocation is forced. If N' is the number of elements kept, the 3*N figure is 2*(N+N') when N'=N, minus the initial memory of N. The peak occurs while the pointers are being copied from the old memory block to the new one.

Re: Efficient array element deletion
by johngg (Canon) on Dec 04, 2008 at 23:13 UTC
    for (-$#array .. 0) { ...

    I don't think that's going to do quite what you intended.

    J:\>perl -le "@arr = ( 1 .. 10 ); print $arr[ $_ ] for - $#arr .. 0;"
    2
    3
    4
    5
    6
    7
    8
    9
    10
    1

    J:\>

    Perhaps this instead.

    J:\>perl -le "@arr = ( 1 .. 10 ); print $arr[ $_ ] for reverse 0 .. $#arr;"
    10
    9
    8
    7
    6
    5
    4
    3
    2
    1

    J:\>

    Cheers,

    JohnGG

Re: Efficient array element deletion
by jwkrahn (Abbot) on Dec 04, 2008 at 23:17 UTC
    On the other extreme of the spectrum, I could say
    for (-$#array .. 0) { splice (@array,$_,1) if ($array[$_] =~ /^\#/) }

    That doesn't do what you seem to think it does.   It starts with the second element of the array, iterates up to the last element of the array and finally ends with the first element of the array.   If you splice elements out of the array then the elements remaining will be moved and won't be removed by subsequent splices.
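    The skipping effect is easy to reproduce in a few lines (my sketch, using the thread's /^\#/ test): with two matching elements in a row, the second slides into the just-vacated slot and is never retested:

```perl
use strict;
use warnings;

my @array = qw( #a #b c );
for my $i ( 0 .. 2 ) {              # forward over the ORIGINAL indices
    next if $i > $#array;           # the array may have shrunk under us
    splice @array, $i, 1 if $array[$i] =~ /^\#/;
}
print "@array\n";   # prints "#b c" -- '#b' escaped the purge
```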

    What you need to do is start at the last element of the array and iterate towards the first element:

    for ( reverse 0 .. $#array ) {
        splice @array, $_, 1 if $array[ $_ ] =~ /^\#/;
    }

      for ( reverse 0 .. $#array ) flattens the list.
      for ( -@array .. -1 ) is better.
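      A quick check of the negative-index variant (my sketch): after a splice at a negative index, every not-yet-tested element keeps its offset from the end of the array, so a single forward pass suffices:

```perl
use strict;
use warnings;

my @array = qw( a #b #c d e #f );
for ( -@array .. -1 ) {             # range is fixed before the array shrinks
    splice @array, $_, 1 if $array[$_] =~ /^\#/;
}
print "@array\n";   # prints "a d e"
```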

        for ( reverse 0 .. $#array ) flattens the list. for ( -@array .. -1 ) is better.

        Except I don't think that will work either.

        J:\> perl -le "@arr = ( 1 .. 10 ); print $arr[ $_ ] for -@arr .. -1;"
        1
        2
        3
        4
        5
        6
        7
        8
        9
        10

        J:\>

        From the documentation (Range Operators, my emphasis): In list context, it returns a list of values counting (up by ones) from the left value to the right value, so I don't think it can be persuaded to decrement. So doing

        J:\> perl -le "@arr = ( 1 .. 10 ); print $arr[ $_ ] for -1 .. -@arr;"

        J:\>

        results in nothing useful.

        Cheers,

        JohnGG

Re: Efficient array element deletion
by ikegami (Patriarch) on Dec 04, 2008 at 22:55 UTC
    If your array can't contain undefined values to begin with, the simplest approach would be to undefine the values rather than deleting them. Then, just ignore the undefined values later on.
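    That could look like the following sketch (the loop variable in a foreach is an alias, so undef-ing it blanks the slot in the array itself; the /^\#/ test is the one from the question):

```perl
use strict;
use warnings;

my @array = qw( a #b c #d e );

# Mark failures as undef instead of removing them.
for my $elem (@array) {
    undef $elem if defined $elem && $elem =~ /^\#/;
}

# Later passes simply skip the holes.
my @seen;
for my $elem (@array) {
    push @seen, $elem if defined $elem;
}
print "@seen\n";   # prints "a c e"
```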
Re: Efficient array element deletion
by fert (Acolyte) on Dec 04, 2008 at 23:12 UTC
    If you aren't afraid of a few counters you could do something like this:
    my $replace = 0;
    for ( my $x = 0; $x < @array; $x++ ) {
        if ( $array[$x] !~ /^\#/ ) {    # i.e. your condition passes
            $array[$replace] = $array[$x];
            $replace++;
        }
    }
    This will effectively 'shift' everything over to the front of your array, avoiding double memory issues, and all you have to do is one final pass to clean up the invalid entries at the end (pop @array until the length == $replace, or simply assign $#array = $replace - 1).
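    End to end, with the thread's /^\#/ test filled in and the final cleanup done by truncating the array in one step rather than a pop loop (my sketch):

```perl
use strict;
use warnings;

my @array = qw( a #b c #d e #f g );
my $replace = 0;
for my $x ( 0 .. $#array ) {
    if ( $array[$x] !~ /^\#/ ) {        # the element passes the test
        $array[$replace] = $array[$x];
        $replace++;
    }
}
$#array = $replace - 1;                 # chop the stale tail off in one go
print "@array\n";   # prints "a c e g"
```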

      And if you aren't afraid of Perl you can:

      my $replace = 0;
      for my $x ( 0 .. $#array ) {
          next if $array[$x] =~ /^\#/;    # i.e. skip unless your condition passes
          $array[$replace++] = $array[$x];
      }

      or maybe even:

      $array[$_] !~ /^\#/ and $array[$replace++] = $array[$_] for 0 .. $#array;   # "your condition" filled in with the thread's test

      Perl's payment curve coincides with its learning curve.
Re: Efficient array element deletion
by Sinister (Friar) on Dec 05, 2008 at 07:50 UTC
    If I have a long array and my goal is to perform some test on each element and remove those elements that fail, what are the best ways to do it from CPU and memory standpoints?

    I think preventing those entries from ever making it to the array is far more efficient than pushing them on and then later grep-ing them out.

    Any form of array shrinkage is costly (as has been proved throughout this whole thread).
