PerlMonks  

Linked lists as arrays: inserting values

by radiantmatrix (Parson)
on Sep 25, 2006 at 15:19 UTC (#574748=perlquestion)
radiantmatrix has asked for the wisdom of the Perl Monks concerning the following question:

I've been reading Advanced Perl Programming lately, and within there is an admonition that one should "almost always" use Perl's arrays instead of a linked list. The rationale is that the built-in functions (pop, push, shift, and unshift) are quite fast -- likely much faster than anything one could implement oneself in pure Perl.

I think it's valuable advice, but the set of functions the author lists deal only with the ends of arrays. One of the values, to me, of the linked list is the ease with which one can insert values in the middle of the list. The only sensible Perl-array idiom I was able to come up with for doing this is to use array slices:

    sub insert_array_elem {
        my ($ra, $elem, $index) = @_;    # insert $elem before $ra->[$index]
        if ($index < 0) {    # convert negative indexes to positive equiv.
            $index = @$ra + $index;
        }
        if ($index == 0) {    # at the beginning
            unshift @$ra, $elem;
        }
        elsif ($index == @$ra) {    # at the end
            push @$ra, $elem;
        }
        else {    # insert between -- makes copy: bad for large arrays?
            # As posted. The right-hand side needs enclosing parentheses to
            # assign the whole list; see ikegami's reply further down.
            @$ra = @$ra[0..$index-1], $elem, @$ra[$index..@$ra-1];
        }
    }

It seems to me that this method makes a copy of the array anytime you insert a value in the middle of the array. For small-ish arrays, that's fine, but for large arrays, wouldn't that be very slow?

Since I have an upcoming interest in inserting values in the middle of a list of values, I am curious if there is any way of inserting values in the middle of an array that's faster (and more RAM-friendly) than making a full copy for each insert. What can I do?

Update: D'oh! I had completely spaced the existence of splice. Thanks to shmem especially for the benchmark (saved me some work), but also to Fletch and Tanktalus who were first to discuss the use of splice for this purpose.

For the record, or for anyone who might search this node later:

    # insert $elem before $index in @array
    splice(@array, $index, 0, $elem);
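For anyone searching later, here is a minimal sketch of how that one call covers the cases the original insert_array_elem handled separately (beginning, end, negative index). The sub name insert_before is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $elem before $ra->[$index], in place.
sub insert_before {
    my ($ra, $elem, $index) = @_;
    splice(@$ra, $index, 0, $elem);   # length 0: remove nothing, just insert
    return $ra;
}

my @list = qw(a b c);
insert_before(\@list, 'X', 1);             # middle:            a X b c
insert_before(\@list, 'S', 0);             # same as unshift:   S a X b c
insert_before(\@list, 'E', scalar @list);  # same as push:      S a X b c E
insert_before(\@list, 'N', -1);            # before last item:  S a X b c N E

print "@list\n";
```

A negative offset counts from the end of the array, so no manual conversion to a positive index is needed.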
<radiant.matrix>
A collection of thoughts and links from the minds of geeks
The Code that can be seen is not the true Code
I haven't found a problem yet that can't be solved by a well-placed trebuchet

Re: Linked lists as arrays: inserting values
by Fletch (Chancellor) on Sep 25, 2006 at 15:26 UTC

    splice can insert items in the middle, but yes that'll trigger the same inefficient copying and moving that your slice method incurs. If you're mucking with the middle of your list a lot then yes, you may have a case where a linked list will be more efficient than a native array.

Re: Linked lists as arrays: inserting values
by Tanktalus (Canon) on Sep 25, 2006 at 15:43 UTC

    Well, it seems that even your quote says "almost always" - perhaps you've found one of the places that make it "almost always" rather than just "always".

    That said, as far as I'm aware, perl data is mostly a small struct of pointers, so copying them around is probably not that expensive - O(n) based on the number of items that need to be copied around instead of O(nm) where m is related to the length of the strings, or the contents of whatever they may refer to (hash refs, array refs, objects, etc.). So it may not really be that bad to use splice.

    The flip side is that by using perl arrays for your data instead of linked lists, perl handles all the details for you. Not that linked lists are necessarily hard or anything, but any time you introduce any type of complexity, you increase the possibility for bugs. By their nature, programs are complex, so we can't avoid that risk. However, we can avoid risk in areas with insignificant gains.

    That, of course, begs us to ask: what gains? And thus, I challenge you to benchmark it to prove that there are gains to be had with another method, and to prove that those gains are of significance in your application.

    My guess is that you'll need a package full of code to abstract the list away to keep the rest of your code simple. And that will eat away at significant portions of your speed gains. And then, if you ever want to hand your list to some standard function, you're going to have to convert it back to a list anyway, and there goes all the rest of your gains.

    That's just a guess, though. ;->

      Absolutely the correct answer.

      Building a large data set by repeatedly splicing into the middle is indeed O(n*n) while a linked list is O(n). But that is O(n*n) with a small constant term versus O(n) with a big term. Unless your dataset is very large, the native array approach will be far faster. Just consider the cost of accessing the next element. With the native approach it will be a pointer lookup versus having to make a function call (and Perl function calls are slow).

      Furthermore a final reason not to use linked lists in Perl. Unless you are very careful, the linked lists will have circular data structures (each item points to the next which points to the previous). Therefore you are either in the business of having to do memory management yourself, or else you need to add yet another layer of slow indirection. Either way you've added more complexity, more room for bugs, and have reduced your potential performance gains even more.
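To make the circular-reference concern concrete, here is a minimal sketch of a doubly linked list in pure Perl that weakens the back pointers with Scalar::Util's weaken, so the next/prev cycle does not keep nodes alive. The node layout and the names new_node, insert_after_node and list_values are assumptions made for this illustration, not code from the thread.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Each node is a hash: { value, next, prev }. Only `next` is a strong
# reference; `prev` is weakened so the cycle cannot leak memory.
sub new_node {
    my ($value) = @_;
    return { value => $value, next => undef, prev => undef };
}

# Insert a new node holding $value directly after $node. O(1): no copying.
sub insert_after_node {
    my ($node, $value) = @_;
    my $new = new_node($value);

    $new->{next} = $node->{next};
    $new->{prev} = $node;
    weaken($new->{prev});

    if (my $old_next = $node->{next}) {
        $old_next->{prev} = $new;
        weaken($old_next->{prev});
    }
    $node->{next} = $new;
    return $new;
}

# Walk the list from $head and return the values in order.
sub list_values {
    my ($head) = @_;
    my @values;
    for (my $n = $head; $n; $n = $n->{next}) {
        push @values, $n->{value};
    }
    return @values;
}

my $head = new_node('a');
my $c    = insert_after_node($head, 'c');
insert_after_node($head, 'b');    # lands between a and c
insert_after_node($c, 'd');

print join(' ', list_values($head)), "\n";
```

The insert itself is constant time, but reaching the insertion point still means walking the list, and every step is a hash lookup rather than a native array index, which is the overhead described above.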

Re: Linked lists as arrays: inserting values
by DentArthurDent (Monk) on Sep 25, 2006 at 15:49 UTC
    If the insert appears to be greater than O(n log n) then perhaps keeping a hash of the data and generating the order with a sort of the keys list might be faster...

    Just a thought..
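One way to read this suggestion, sketched under the assumption that every item carries a unique numeric key that determines its position (the names %item_for, add_item and ordered_items are made up for the example):

```perl
use strict;
use warnings;

# Assumption: each item has a unique numeric key that fixes its position,
# so "inserting in the middle" is just storing the item under that key.
my %item_for;

sub add_item {
    my ($key, $value) = @_;
    $item_for{$key} = $value;    # nothing is shifted or copied
}

# Produce the ordered list only when it is needed: one O(n log n) sort.
sub ordered_items {
    return map { $item_for{$_} } sort { $a <=> $b } keys %item_for;
}

add_item(30, 'third');
add_item(10, 'first');
add_item(20, 'second');
add_item(15, 'between first and second');

print join(', ', ordered_items()), "\n";
```

This only pays off if the ordered view is needed rarely compared with the number of inserts, since each call to ordered_items sorts all the keys again.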
    ----
    My mission: To boldly split infinitives that have never been split before!
Re: Linked lists as arrays: inserting values
by holli (Monsignor) on Sep 25, 2006 at 15:54 UTC
    I was pretty sure there was something for this in List::MoreUtils, but as it turned out there isn't. So I came up with this:
        use List::MoreUtils qw(insert_after);

        sub insert_array_elem {
            my ($ra, $elem, $index) = @_;    # insert $elem before $ra->[$index]
            my $idx = 0;
            insert_after { $idx++ == $index-1; } $elem => @{$ra};
        }
    This doesn't work for the edge cases (first and last element), but hey, that's what pop and friends are there for. Maybe you could write an email to the author of List::MoreUtils asking for an insert_at_index function?


    holli, /regexed monk/
        Yup. I benchmarked my solution against the other two and it is way slower. But: It looks best ;-)


        holli, /regexed monk/
Re: Linked lists as arrays: inserting values
by VSarkiss (Monsignor) on Sep 25, 2006 at 16:08 UTC
Re: Linked lists as arrays: inserting values
by shmem (Canon) on Sep 25, 2006 at 16:27 UTC
    Benchmarking your code (insert_array_elem1) against the same code using splice (insert_array_elem2), i.e. replacing the line
    @$ra = @$ra[0..$index-1], $elem, @$ra[$index..@$ra-1];

    with

    splice(@$ra,$index,0,$elem);

    and using

        cmpthese(5000, {
            radiantmatrix => sub { my $array = [1..1000]; insert_array_elem1($array,1,$_) for 50..100 },
            use_splice    => sub { my $array = [1..1000]; insert_array_elem2($array,1,$_) for 50..100 },
        });

    results in

                        Rate radiantmatrix    use_splice
        radiantmatrix  467/s            --          -83%
        use_splice    2762/s          492%            --

    radiantmatrix - use_splice :-)

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

      Update: Ignore this post (mistakes pointed out by shmem and ikegami below).

      Perhaps I'm doing something wrong, but my benchmarking results are radically different from shmem's.

      For a start, with the same data (programme run several times), I get something like this:

                   Rate splicing  radiant
        splicing 2371/s       --     -19%
        radiant  2936/s      24%       --

      This is not the first time that I have got very different benchmarking results than other Monks on this forum, but this time the difference is particularly egregious.

      In case you're wondering:

        C:\Perl\progs>perl -v
        This is perl, v5.8.8 built for MSWin32-x86-multi-thread
        <snip>
        Binary build 817 provided by ActiveState

      And the bigger the original array gets (and the greater the number of elements to insert), the more radiantmatrix's code appears to outperform splice:

        C:\Perl\progs>scratchpad.pl 6000 7000 10000
        Array size: 10000
        Inserting: 6000 .. 7000
                   Rate splicing  radiant
        splicing 28.1/s       --     -86%
        radiant   198/s     603%       --

        C:\Perl\progs>scratchpad.pl 60000 61000 100000
        Array size: 100000
        Inserting: 60000 .. 61000
                   Rate splicing  radiant
        splicing 2.87/s       --     -93%
        radiant  42.7/s    1388%       --

      Perhaps I've got something very, very wrong, but my findings seem to be borne out by this extract from Mastering Algorithms with Perl, Chapter 3:

      ...splicing elements into or out of the middle of a large array can be very expensive.

      Here's my benchmarking code, demolish it at will:

        Running your code on my Linux box, I get:
          qwurx [shmem] ~> perl 574821.pl 6000 7000 10000
          Array size: 10000
          Inserting: 6000 .. 7000
                     Rate splicing  radiant
          splicing 46.8/s       --     -66%
          radiant   137/s     193%       --

        You are inserting a number between 6000 and 7000 at index 1, in every call to insert1 and insert2.

          # called as insert1( \@ary, 1, $_ ) for $START .. $END
          sub insert1 {
              my ( $ra, $index, $elem ) = @_;
              @$ra = @$ra[0 .. $index-1], $elem, @$ra[$index .. @$ra-1];
          }

          # I tested with
          sub insert1 {
              my ( $ra, $elem, $index ) = @_;
              @$ra = @$ra[0 .. $index-1], $elem, @$ra[$index .. @$ra-1];
          }
          # called as insert1( \@ary, 1, $_ ) for $START .. $END

        If I swap $elem and $index (i.e. insert 1 at an index from $START to $END) I get:

          qwurx [shmem] ~> perl 574821.pl 600 700 1000
          Array size: 1000
          Inserting: 600 .. 700
                     Rate  radiant splicing
          radiant  30.5/s       --     -98%
          splicing 1672/s    5377%       --

        The insert somewhere in the middle is more expensive, that's why $_/10 for @params here.

        My perl:

          $ perl -v
          This is perl, v5.8.8 built for i586-linux-thread-multi
          ...

        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

        Here's my benchmarking code, demolish it at will:

        First rule of Benchmarking, make sure the code you are benchmarking actually works!

        @$ra = @$ra[0 .. $index-1], $elem, @$ra[$index .. @$ra-1];
        means
        (@$ra = @$ra[0 .. $index-1]), $elem, @$ra[$index .. @$ra-1];
        You want
        @$ra = ( @$ra[0 .. $index-1], $elem, @$ra[$index .. @$ra-1] );

        Also, your arguments are backwards:
        insert1( \@ary, 1, $_ ) for $START .. $END
        insert2( \@ary, 1 ,$_ ) for $START .. $END
        should be
        insert1( \@ary, $_, 1 ) for $START .. $END
        insert2( \@ary ,$_, 1 ) for $START .. $END

        Once fixed (and setting the loop count to -3 cause it was taking forever):

          >perl 574845.pl 500 600 1000
          Array size: 1000
          Inserting: 500 .. 600
                     Rate  radiant splicing
          radiant  17.3/s       --     -99%
          splicing 1470/s    8392%       --
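Putting both corrections together, a self-contained benchmark could look roughly like the sketch below. It is not the script that was actually run in this thread (that code is not shown above); the sub names insert_slice and insert_splice and the array sizes are assumptions chosen to mirror the figures quoted.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Slice version with the parentheses fix: the whole list is assigned.
sub insert_slice {
    my ($ra, $elem, $index) = @_;
    @$ra = ( @$ra[0 .. $index-1], $elem, @$ra[$index .. $#$ra] );
}

# splice version: modifies the array in place.
sub insert_splice {
    my ($ra, $elem, $index) = @_;
    splice(@$ra, $index, 0, $elem);
}

# Sanity check before timing: both subs must produce the same array.
my @by_slice  = (1 .. 10);
my @by_splice = (1 .. 10);
insert_slice(\@by_slice,   'x', 5);
insert_splice(\@by_splice, 'x', 5);
die "implementations disagree\n" unless "@by_slice" eq "@by_splice";

# Sizes are for illustration only; results depend on perl and hardware.
cmpthese(-1, {
    slice  => sub { my $array = [1 .. 1000]; insert_slice($array,  1, $_) for 500 .. 600 },
    splice => sub { my $array = [1 .. 1000]; insert_splice($array, 1, $_) for 500 .. 600 },
});
```

Checking that the two subs agree before timing them guards against exactly the kind of mistake discussed above, where a broken implementation looks fast because it is doing less work.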
Re: Linked lists as arrays: inserting values
by GrandFather (Cardinal) on Sep 25, 2006 at 18:51 UTC
Re: Linked lists as arrays: inserting values
by jdporter (Canon) on Sep 25, 2006 at 21:50 UTC

    Does your application actually require random-access insertion at any point in the array? If not, there are some optimization techniques you can try...

    We're building the house of the future together.

        I should have said heuristics, rather than optimizations. For example, if the point of the next insertion is usually "very close" to the previous insertion, you can make a significant improvement in the performance (e.g. O(n) vs O(n log n)). At any rate, choosing an O(n) algorithm up front instead of an O(n log n) algorithm shouldn't be dismissed as "premature". It could be, in fact, a well-timed optimization of your development process. :-)
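The post does not say which technique is meant. One possibility that exploits "the next insertion is usually close to the previous one" is a gap buffer built from two ordinary Perl arrays; the sketch below is an assumption offered for illustration, and the names move_cursor, insert_at and as_list are made up.

```perl
use strict;
use warnings;

# Two-stack "gap buffer": @left holds the items before the cursor,
# @right holds the items after it in reverse order (next item is last).
my (@left, @right);

# Move the cursor to absolute position $pos. The cost is proportional to
# the distance moved, so nearby insertions are cheap.
sub move_cursor {
    my ($pos) = @_;
    push @right, pop @left  while @left > $pos;
    push @left,  pop @right while @left < $pos && @right;
}

# Insert $elem before position $pos.
sub insert_at {
    my ($pos, $elem) = @_;
    move_cursor($pos);
    push @left, $elem;    # constant time once the cursor is in place
}

# Flatten back to an ordinary Perl list.
sub as_list {
    return (@left, reverse @right);
}

insert_at(0, $_) for qw(c b a);   # a b c
insert_at(1, 'x');                # a x b c
insert_at(2, 'y');                # a x y b c  (cursor was already next door)

print join(' ', as_list()), "\n";
```

Only push and pop are used, so each insert costs roughly the distance the cursor has to travel rather than the size of the whole array.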

        We're building the house of the future together.
