Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

What makes an array sorted and a hash unsorted?

by ikegami (Patriarch)
on Jun 01, 2009 at 16:34 UTC ( [id://767315]=perlmeditation: print w/replies, xml ) Need Help??

It always bothers me when someone says hash are unsorted.

An array is sorted because it can return its values in order of ascending keys (indexes). But a hash can do the same thing. In fact, hashes can do everything an array can do with no extra state information. (It's just more expensive.)

That means that hashes are just as sorted as arrays in theory.

The difference is how they are used in practice. Array keys (indexes) only convey order, while hash keys are part of the true value of the hash element. Therefore, array values can be reordered, but hash values can't be reordered.

That hashes are unsorted isn't a property of hashes so much as of the way we use them.

</ramblings>

  • Comment on What makes an array sorted and a hash unsorted?

Replies are listed 'Best First'.
Re: What makes an array sorted and a hash unsorted?
by gwadej (Chaplain) on Jun 01, 2009 at 20:39 UTC

    I would not use the term sorted, I would use the term ordered.

    This distinction is perhaps more obvious when you look at another way to map a string to a value. The two approaches I'm familiar with are binary trees and hash tables.

    In the case of a binary tree, the items are ordered because the structure of the tree imposes an intrinsic order on the data. Hashes, on the other hand, are intrinsically not ordered internally. In fact, this feature is what allows for (relatively) inexpensive insertion and deletion. (Two items with keys very close to each other are likely to end up in different areas of the hash table.)

    As you said, the interface we see for the hash could be changed to return the data in an ordered fashion, but the data structure is still intrinsically unordered.

    G. Wade

      Having probably instigated this with my use of the term today :), I was browsing over wikipedia entries while eating lunch and trying to come up with my take. I think that you're getting at the distinction here: Perl has associative arrays (an abstract datatype) which are implemented on top of hash tables (a more concrete datatype; which of course explains the colloquial usage of calling them "hashes"). However neither of these two guarantees nor provides any intrinsic ordering of keys. Were the associative arrays implemented on top of something else instead (say some sort of binary search tree, or an assoc list with new items added in sorted order rather than just prepended) the underlying implementation would then provide an intrinsic ordering of keys and then Perl's associative arrays could be said to be ordered (and probably wouldn't be called hashes . . . :).

      So yeah, "unordered" is probably a better description than "unsorted".

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

      Hashes, on the other hand, are intrinsically not ordered internally. In fact, this feature is what allows for (relatively) inexpensive insertion and deletion. (Two items with keys very close to each other are likely to end up in different areas of the hash table.)
      That's not correct. Hash keys are ordered, otherwise you wouldn't be able to quickly search for them (and hence, no quick deletion either). However, the order internally isn't the same order people want (this thread seems to consider "ordered hashes" as "keys are sorted" - it's not uncommon for people to mean "keys returned in insertion order" when they talk about "ordered hashes"). I've always understood that keys (and its friends) return the keys in the way they are laid out internally - so certainly "ordered", but only after applying a hash function.

        You are correct. I was sloppy with the term keys.

        It would be more accurate to say that the data is not ordered with respect to the original keys. However, it is ordered with respect to the result of the hash function applied to the keys.

        Given that a good hash function should distribute the data fairly evenly through the table (and that the hash function is often not known by the user), this is effectively the same as saying that the data is not ordered with respect to the original keys.

        G. Wade
        Howdy!

        Hash keys have no predefined ordering. Any given call to keys() will present the keys as a list which will, at that instant, have an order, but that particular ordering is neither persistent nor stable nor predictable in the face of insertions and deletions. Rapid searching for hash keys has nothing to do with ordering, but rather with the nature of hash functions. Ideally, any given key value hashes to a unique value in the hash that can be found (or the existence of tested for) in constant time. Searching an ordered list is going to be O(log n) or worse.

        yours,
        Michael
Re: What makes an array sorted and a hash unsorted?
by DStaal (Chaplain) on Jun 01, 2009 at 17:01 UTC

    I think that usually when someone says 'sorted' in this context it means 'has a defined intrinsic order'. Which arrays have and hashes do not.

    The difference is in this code:

    print $_ foreach @array; print $_ foreach keys %hash;

    The array will return you the same list every time, given the same inputs. The hash's list will change with every new run. In fact, if you were to run line 2 again as line 3, perl would be within it's rights to return a different ordering.

    Another way to look at is that an array keeps it's sort order, where a hash doesn't. So a hash is 'unordered' by default: If you want a specific order, you have to ask for it.

    It's not that you can't do anything you want with either one (after all, if you only wanted one way to do something, Perl probably isn't the language for you), but that each is best for different things, and this is a shorthand for one of the main differences: If maintaining the order matters, especially if ordering is expensive, an array has an advantage. If order doesn't matter, a hash's other skills may be more useful.

      The difference is in this code:

      print $_ foreach @array; print $_ foreach keys %hash

      So if keys sorted the keys before returning them, the hash would be considered sorted? nah.

      Another way to look at is that an array keeps it's sort order, where a hash doesn't.

      keys may not return the keys in order, but it could. "f" is not suddenly less than "b" because keys decides to return foo before bar.

      Again, it can't be that hashes aren't ordered simply because Perl doesn't provide a builtin function to return them sorted like it does for arrays.

      The hash's list will change with every new run.

      Nit:

      s/will/may/

      Perl can and will return them in the same order in some circumstances, even after the 5.8.1 change to add randomisation.

      $ perl -e'system(perl => "-le %h = map { \$_=>1 } qw(a b c d); print k +eys %h") for 1..10' cabd cabd cabd cabd cabd cabd cabd cabd cabd cabd

        The point isn't that there is a way to sort them. It's whether you need to use it to guarantee the order you get: With an array you don't, with a hash you do.

        And yes, may. But the point is on the array the order will not. You can be sure of that: That given the same input, you will get the same output. A hash intentionally does not guarantee that.

        And no, sorting the keys does not make the hash sorted. It simply means you have a sorted array that can be used to access the hash in a sorted order.

        ... even after the 5.8.1 change to add randomisation.

        So, while not yet perfect, the tendency is to randomize the order in which keys or values are returned. keys and values are meant to return their values in no predictable order.

        You say,

        keys may not return the keys in order, but it could.

        Yes, e.g. tying the hash to DB_File, as DB_BTREE. But what's your point? In theory, it could, but in practice, it doesn't. In which theoretical circumstances that would be meaningful in practice, would keys return the keys of a perl untied hash in any meaningful order (i.e. shortest..longest, lexically sorted, lifo/fifo or such) ? And why should it?

        will vs may:

        I think newer versions of perl WILL, intentionally, as a security feature.

Re: What makes an array sorted and a hash unsorted?
by Roy Johnson (Monsignor) on Jun 01, 2009 at 17:20 UTC
    Hash values are often reordered when more buckets are allocated and so forth, right?

    I would say the thing that makes arrays sorted is that you can say @arrayname to get the values, and they'll be in a consistent order. If you say values %hash to get the values of a hash, they will be in no particular order.

    Arrays have push and pop and shift and unshift that take advantage of the order of the array. for iterates through arrays in known order; each iterates through hashes. I think not merely our usage, but the language's design, says that arrays are ordered and hashes are unordered.


    Caution: Contents may have been coded under pressure.

      If you say values %hash to get the values of a hash, they will be in no particular order.

      So if the following change was built into Perl, hashes would be considered sorted?

      use subs qw( values ); sub values(\%) { my $h = shift; return keys(%$h) if !wantarray(); my @keys = sort keys(%$h); return @{$h}{@keys}; }
      my %h = map { $_ => $_ } 'a'..'z'; print(values(%h), "\n"); # abcdefghijklmnopqrstuvwxyz

      I think there's more to it than just this change in the builtin accessors.

        Yes, they'd be considered sorted. However, it would be of limited use, which is probably why it isn't built that way. Arrays are designed for things that are sorted. Hashes are designed for lookup by key, with ordering left up to the users as a separate step.

        When people think they have a hash in a particular order, it isn't generally ASCIIbetical order, but the order they put the elements into it. Obviously, that's a misconception. Hashes don't preserve order. Arrays, on the other hand, do. Wherever you put an element, it has a position relative to other elements of the array. That's intrinsic to the array.

        All the iterators built into the language will walk the array in the same order. That is not the case for hash iterators. The language could have been written to iterate over arrays in random order, or hashes in predefined order, but it was not.


        Caution: Contents may have been coded under pressure.
Re: What makes an array sorted and a hash unsorted?
by shmem (Chancellor) on Jun 02, 2009 at 09:44 UTC

    What is 'sorted'? (unsorted) data is sorted after sorting.

    An array is sorted because it can return its values in order of ascending keys (indexes).

    No. An array is sorted when their values are sorted, by whatever order you impose on them. I consider the following

    @ary = (34, 'a', '+', \0, 'y','Z');

    as an unsorted array. Iterating over an array via ascending indexes or, in a destructive way, via shift or pop is the natural way to access all values sequentially, just like each does for hashes. Arrays are a defined sequence of scalars on which you can impose any order in a stable way, e.g. by sorting the values

    @ary = sort { $b cmp $a } @ary;

    meaning that after that operation the sequential iteration over the array whith ascending or descending keys reflects the order which you impose on the data.

    That's not possible with hashes. You can't impose a different order of key/value tupels on a hash to change the order in which each returns those tupels. That's meant by 'hashes are not sortable'. If you fill a hash sequentially which tupels, retrieveing those via each doesn't necessarily reflect the initial filling sequence. That's meant by 'hashes are unordered' (or unsorted).

    @h{(a..z)} = (A..Z); %h = map { $_ => $h{$_} } sort { $b cmp $a } keys %h;

    This may change the order in which each returns tupels iterating over the hash, but not in a predictable way, and certainly their sequence doesn't reflect the sorting.

    But you can do that with arrays. That's what makes arrays sorted (if you sort them) but hashes unsorted: you can't impose a different stable order in which tupels are returned.

    Sorting applies to a sequence. If we look at hashes as a collection of tupels, we note that we can't influence the ordering of tupels retrieving those as a sequence. But that's due to the fact that arrays are sequences, and hashes are not: "associative array" for perl hashes is a misnomer ;-)

      In the first half of your post, you talk about array values, which you proceed to compare to hash tuples rather than hash values.

      That supports what I said. Hashes are different than arrays because people consider hash keys to be part of the value. If that wasn't the case, there would be no functional difference between arrays and hashes.

Re: What makes an array sorted and a hash unsorted?
by wol (Hermit) on Jun 03, 2009 at 12:17 UTC
    It always bothers me when someone says hash are unsorted.
    Strange - it doesn't bother me in the slightest.

    --
    use JAPH;
    print JAPH::asString();

Re: What makes an array sorted and a hash unsorted?
by ig (Vicar) on Jun 07, 2009 at 00:51 UTC

    I find it interesting that after so many decades with concepts of object oriented programming and encapsulation being current that there is not a clearer distinction between the behaviors of the interfaces to arrays and hashes and the internal implementation details.

    So if keys sorted the keys before returning them, the hash would be considered sorted? nah.

    Absolutely! For most users of perl a hash is (and should be) characterized by the behaviours of its interfaces. If the usual interfaces provide the contents in sorted order, then the hash is sorted, from a user perspective.

    The common interfaces for accessing a hash collectively (i.e. not lookup of individual values) are: flattening to a list and the keys, values and each functions. If these returned the contents in sorted order (for some definition of sorted order) then most users of perl would consider the hash to be sorted. Whether, internally, it was implemented as an array with linear lookup, hashed bucket lists, B-tree, a convoluted, permutated, obfuscated collection of bits or whatever, would be, as it should be, irrelevant.

    As it is, the usual interfaces return the contents in an order that is difficult for most users to predict and understand and which is quite variable as the contents changes. This unpredictability of the order of results (and from a naive user perspective they are very unpredictable, despite the generally deterministic nature of most modern computers, and more so since the randomization was introduced to enhance security) leads directly to the conclusion that hashes are unsorted. And this conclusion is strongly reinforced by the common use of sort, as in sort keys %hash. The sort function is not particular to hashes and is rightly perceived as something done to the unsorted list of hash keys returned by keys, not an integral part of the hash interface.

    Of course, for the implementers of perl, the internals are very important. But for users they are only indirectly important to the extent that they affect the performance with which the provided interfaces deliver their results.

    What makes an array sorted and a hash unsorted?

    The user interfaces.

Re: What makes an array sorted and a hash unsorted?
by ig (Vicar) on Jun 07, 2009 at 02:38 UTC
    It always bothers me when someone says hash are unsorted.

    I have more difficulty with the idea that an array is sorted than with the idea that a hash is unsorted.

    Thinking only of the data structures and not the interfaces or implementation, I think of an array as an ordered list. Because it is ordered it is possible, though not necessary, for it to be sorted. On the other hand, I think of a hash as an unordered set of key/value pairs. with the constraint that there can only be one of each key in the set. As there is no order, there is no possibility of there being a sorted order.

    A hash could be defined as an ordered list of key/value pairs. If it were ordered, then it could be sorted. But very little in the description or interfaces suggests to me that it is ordered and what I know of the implementation is that, to the extent that it is ordered, the order of the elements is randomized (update randomized isn't the right word at all. The distribution is highly ordered in the sense that there should be uniform distribution across the buckets, and the determination is repeatable so that elements can be found again later, so there is nothing unpredictable (i.e. random) about it. Maybe chaotic, in the sense of deterministic chaos, would be a better term - implying that the resulting order is very sensitive to small variations in the initial key values and not easily predicted except by full execution of the hashing algorithm.).

    So, arrays are inherently ordered while hashes are either unordered or, at best, ordered but with an inherently random chaotic order. Therefore, arrays are sortable and hashes are inimical to sorting.

      A hash could be defined as an ordered list of key/value pairs.

      I'll amplify your objections to your own offered premise. Since inserting a new key into a hash can significantly change the "order" of the previously inserted keys, I disagree that a Perl hash can be reasonably characterized as "an ordered list of key/value pairs". Certainly, at any given moment, the key/value pairs in an ordinary Perl hash have a defined order. And that particular order is quite useless, incidental, out of our control, and even inconstant. An "ordered" data structure mustn't shuffle its contents simply because a new item is being inserted. It is impossible to store items in computer memory and not have those items in some "order". That doesn't make every data structure into an "ordered data structure".

      Since it is possible for %one= %two to leave %one with its keys in a different order than %two's, a Perl hash isn't "an ordered list of key/value pairs". Copying a hash doesn't (always) copy the order because the order is only an incidental part of the data structure.

      - tye        

Re: What makes an array sorted and a hash unsorted?
by jethro (Monsignor) on Jun 04, 2009 at 12:14 UTC

    That means that hashes are just as sorted as arrays in theory

    In theory (i.e. computer science theory) there are no arrays in perl
    > perl -we '$a[4]=1; print shift @a; ' Use of uninitialized value within @a in print at -e line 1.

    (ok, we could define UNDEF as a defined/actual/existing value of an array, but then why does scalar(@a) report 5 when there is a value in $a[37865]? )

    In theory perls closest thing to arrays is an array, since for example $value==pop(push @a,$value)) (a fundamental property of theoretical arrays) while pop and push for hashes isn't even defined (even though someone could define it somehow in perl5.11 just to make a point ;-)

    In practice perls data structures are as shifty as the language itself and you can implement hashes in arrays as well as arrays in hashes. So the sentence "arrays are ordered and hashes are not" (I hope we all concur by now that 'sorted' was the wrong word) is merely a short form for "arrays provide built-in infrastructure to manipulate ordered values in an efficient manner, while hashes do not"

    After all, when we try to dissuade others from using hashes in some situations, it is not because it is impossible with hashes, but inefficient or awkward. So we can't compare arrays and hashes without looking at the costs to do operations or their built-in access methods. Because without that, i.e. "in theory" perl hashes and arrays are equivalent/just storage/interchangeable

    UPDATE: changed a really idiotic length(@a) to scalar(@a).

      Howdy!

      In practice perls data structures are as shifty as the language itself and you can implement hashes in arrays as well as arrays in hashes. So the sentence "arrays are ordered and hashes are not" (I hope we all concur by now that 'sorted' was the wrong word) is merely a short form for "arrays provide built-in infrastructure to manipulate ordered values in an efficient manner, while hashes do not"

      Hashes and arrays are reasonably well defined fundamental data types in Perl. I'd say "clearly defined", but this thread casts doubt on that assertion. Only at a conceptual level can you really assert that equivalency in the second sentence. Arrays and hashes are very strong data types in Perl. You cannot cast one into the other. You can use certain operations on each to produce instances of the other (conflating lists with arrays here). Those operators are not type casters, however.

      In your code example, you first put a value in to the array at index 4. This causes indexes 0 through 3 to spring into existence, but, naturally, they have no defined value. When you then shift off the first value to print it, you get index 0 (the "first" value) and, being undefined, triggers the warning that you asked to have happen. It is unclear what that was meant to demonstrate.

      Later you speak of push/pop and how they are not defined for hashes. Of course. Push and pop (and shift and unshift) are all curried versions of splice, which is the general purpose function for mucking about with arrays. Since a hash isn't (and can't be coerced into) an array, splice can't do anything with one. Likewise, keys() and values() and each() are not defined for arrays. The two data types are very distinct. I'm not sure what you are trying to say there. It makes no sense.

      yours,
      Michael
        Your second-to-last sentence is part of what I am trying to say. Ikegami (as I understand his post) proposes that the ordering of keys intrinsic to an array isn't a difference to a hash because a hash can simulate that (and because internally there is an ordering in a hash anyway). My argument is that
        a) what perl does internally is beside the point and
        b) if we argue that a hash is as ordered as an array because we can make it so, then all arguments about differences of data types become meaningless because every data type can simulate every other data type.

      but then why does length(@a) report 5 when there is a value in $a[37865]? )

      Because the string "37866" has five characters in it. You want 0+@a, scalar(@a) or similar to get the number of entries in the array.

      Did you think you were proving Perl arrays are sparse? That's not the case. Quite the opposite, Perl usually allocates more memory than needed to allow future growth.

      $ perl -MDevel::Peek -e'$a[37865]=1; Dump \@a,1; push @a,2; Dump \@a,1 +' SV = RV(0x8176690) at 0x814ed9c SV = PVAV(0x8153c64) at 0x814f6a8 FILL = 37865 <-- Index of last used ele MAX = 37865 <-- Index of last allocated ele SV = RV(0x8176690) at 0x814f684 SV = PVAV(0x8153c64) at 0x814f6a8 FILL = 37866 <-- Index of last used ele MAX = 65531 <-- Index of last allocated ele

      (Output trimmed to the relevant.)

      So we can't compare arrays and hashes without looking at the costs to do operations or their built-in access methods.

      Indeed. They each have their strengths.

        My only excuse for the mistake is I'm just editing some old c sources and working with char arrays is damaging to the mind, I even made the same mistake there and used length() instead of strlen(). Naturally I meant scalar(@a)

        Did you think you were proving Perl arrays are sparse?

        No, I was trying to prove that a perl array is not what a computer science book would call an array. In such a book an idealized data type 'array' wouldn't have holes. And after thinking about it, it wouldn't have a push or pop operator, that would be the data type 'stack'

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://767315]
Approved by Corion
Front-paged by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having an uproarious good time at the Monastery: (4)
As of 2024-12-14 00:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Which IDE have you been most impressed by?













    Results (70 votes). Check out past polls.