http://www.perlmonks.org?node_id=1052800


in reply to Re^2: Data structures in Perl. A C programmer's perspective.
in thread Data structures in Perl. A C programmer's perspective.

My post was really about order of growth for various tasks given two different data structures.

An array stores its data linearly, in contiguous memory. If you want to find the "nth" element, you go straight to it (it's a simple mathematical computation, internally). But if you want to insert between "n" and "n+1" you must shift everything from n+1 through the end of the array out one spot, and hopefully not have to copy the entire array to a larger block of contiguous memory.

A linked list stores its data all over the place. Each element just points to the location of the next one. Finding the 'nth' element means starting at the top and counting until you get there. But once you're there, inserting between n and n+1 is simply a matter of changing where a couple pointers point.

So an array has O(1) lookups, and O(n) insertions and deletions. A linked list has O(n) lookups, and O(1) insertions/deletions. But big-O is concerned with orders of growth. There are many other factors that don't get considered when looking at orders of growth of computational complexity.

But I also mentioned that Perl's arrays, being implemented at the "guts" level, are very fast. And most linked list implementations in Perl are implemented in Perl code, which is comparatively slow.

Add to that the things that eyepopslikeamosquito mentioned, and it makes it very hard for linked lists to be the data structure of choice. ...even when Perl's arrays have so much behind the scenes overhead going on, they're still quite hard to beat.

That doesn't mean that we have to throw out the window everything we learned in comp sci. It just means that technology has found ways to bend the rules-of-thumb. A binary tree is still an excellent data structure for certain uses. Linked lists can still make sense for some tasks. A trie is hard to beat for certain types of lookups. I guess what we have to keep in mind is that now more than ever one must fully understand the nature of his data, the usage patterns expected, and the tricks the hardware and compilers can play.

I don't know too much about the guts-level implementation of Perl's hashes, but it's my understanding that the implementation hashes the key to derive a bucket, and that when there's a collision (more than one item in a bucket), the buckets' contents are stored in linked lists.


Dave

  • Comment on Re^3: Data structures in Perl. A C programmer's perspective.