http://www.perlmonks.org?node_id=1038319

DarrenSol has asked for the wisdom of the Perl Monks concerning the following question:

i'm new to actual programming languages, but i have done a lot with programming spreadsheets. please don't laugh, i already heard that's not "real" programming. after looking at a few possible languages perl seems to be my best option. i have done some simple program exercises and am encouraged that it makes sense to me.

i'm playing with arrays now and came across references to hashes. i get the concept of hashes having keys and values instead of arrays having indices and values but right now i don't see the advantage, other than using names for keys instead of numbers for indices. i tried using variables set to numbers to get values out of arrays and it seemed to work. is that different from hashes?

and i would think the program would find the values faster if the keys are ordered like the indices in arrays. are hashes just there for convenience, so i don't have to declare index variables if i don't want to get array values using just number?

Replies are listed 'Best First'.
Re: why use a hash instead of an array
by davido (Cardinal) on Jun 11, 2013 at 18:57 UTC

    Hashes provide constant time lookups by key as opposed to constant time lookups by index. Imagine if your dictionary were 0: Apple, 1: Ball, 2: Cat, 3: Dog. Find the word at index #2. That's done in constant time. Now find for me Cat. You have to either do a linear search, or a binary search to find it (linear time, or log2(n) time. You would never remember that 4200 is Quagmire, so you end up doing time-consuming searches a lot.

    Now let's organize your dictionary like this: "Apple: A fruit. Ball: A toy. Cat: A skittish animal. Dog: A loyal animal. ..... Quagmire: A difficult situation". Quick, find if Quagmire is in the dictionary. Wow, we just did it in constant time (no searching), using a hash.

    That turns out to be really, REALLY useful. We're not always just looking up words in dictionaries. We're referring to attributes in objects, we're manipulating sets, we're memoizing and caching, and we're even implementing variable systems (Perl's package globals reside in a glorified hash, internally).

    In computer science one would look at hashes as yet another datastructure. And for many problems, selecting the right data structure is 9/10ths of the solution. As you get into programming, I recommend getting a good introductory book on data structures and algorithms. I wish I had a suggestion for you. Mastering Algorithms with Perl is probably a little advanced at this point, and it's getting a little old too. But start looking at, well, what is an array, and what are its characteristics? What is a linked list, and what are its characteristics? Binary Tree, Hash, and so on.


    Dave

      While it is aimed, I think, more at the intermediate programmer, chromatic's freely downloadable Modern Perl might be a good beginners intro to Perl. And while I can't recommend Wikipedia as a beginners resource because it gets rather CS-ish rather quickly, all the topics davido mentioned and more are there and may serve as useful points of departure to more accessible on-line resources: array; linked list; binary tree; hash, or associative array; ...

      And you are always welcome at The Monastery Gates.

      jeez. i did a web search and went through a number of the pages that came up, but didn't find a reason why to use a hash that's as quick and clear as your quagmire example. thanks.
Re: why use a hash instead of an array
by space_monk (Chaplain) on Jun 11, 2013 at 18:53 UTC

    i tried using variables set to numbers to get values out of arrays and it seemed to work. is that different from hashes?
    Yes :-). A longer answer is that hashes use variables set to almost anything you like to set or get values in something that is a bit like an array.

    and i would think the program would find the values faster if the keys are ordered like the indices in arrays.
    The way values are stored in hashes is in a sort of order.

    Whilst hashing implementations vary, what normally happens is that a hashing function translates the key into an index element which refers you to a bucket, which is a short list of possible matches. A hash is often (but not always) faster than an array because you do not have to search through all the elements to find what you are looking for - you only have to check a few elements, and often only one if the hashing function is effective. Hashes are actually (sort of) an array, and what is hidden from you is how the "key" is translated into an "index" into that array.

    Most of your questions are answered by Googling for hashes and looking for the explanations. This Perl.com article is a bit dated but provides a good starting point. At the end of the article are a series of book references - if you are seriously interested in programming you should certainly get hold of the Donald Knuth book(s) temporarily or permanently, as he is the father of a huge number of computer algorithms and methods that are still in use today.

    Hashes are used for much more than trivial lookups. Another Perl.com article shows that hashes can be used imaginatively for lots of different purposes. Searching through some PerlMonks answers will also reveal just how useful hashing can be when used in a creative {and in certain PerlMonks answers, some would say nefarious :-)} fashion.

    If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)
      thanks for the advice, and the link. looks like a good source of reading material to get my head in a programming mode.
Re: why use a hash instead of an array
by Lotus1 (Vicar) on Jun 11, 2013 at 20:46 UTC

    A Certified Microsoft Excel Developer asked a Perl programmer "Why use a hash instead of an array?". The Perl programmer replied "That is like asking why use vlookup instead of a column of data in a spreadsheet.

    At that moment, the Certified Microsoft Excel Developer was enlightened.

Re: why use a hash instead of an array
by Laurent_R (Canon) on Jun 11, 2013 at 21:35 UTC

    If I had to choose between a language having arrays but no hashes, and a languages having hashes and no arrays, I would no doubt select the one with hashes (I have used at least a dozen other languages for serious purposes, and only one, besides Perl, namely Python, had hashes, at least natively, so I think I can claim that I know what it is to have hashes or not to have them). Just about everything that can be done with arrays can be done with hashes (if only by using the array subscripts as keys to the hash), but the reverse is really not true. Once you acquire a bit more experience with Perl or any other language having hashes (which sometimes have different names, they can be dictionnaries or associative arrays, or whatever), you will understand how immensely useful they are.

    Just one example from my work, something which I am due to present in a Perl conference in Europe at the end of this week. I had a program written in a database exploration language that I will call G. G is a fine and rather fast language, but it does not have hashes (and, to me, it is by far its single most important defect). The program was running on a very large database and making very complex comparisons with a relatively large parameter file. The first time we ran it, we stopped it after a couple of days when we found that it was going to take about 180 days to complete. After some optimization work, I ended up with a predicted execution time of 60 days, and with no idea for further optimization. Of course, a 60-day running time was not acceptable (in the business context, 3 to 5 days would have been not ideal but still OK, but certainly not 60 days). I decided to completely rewrite the program in Perl using four simple hashes, two hashes of hashes and one hash of hashes of hashes. The program now runs in about one hour (about 1,400 shorter execution time). To make things clear, it is not by simply replacing the previous data structure by some hashes that I obtained this huge improvement, it is because hashes enabled me to use a completely different and far better algorithm.

    Just one last thing, don't understand the above wrongly. My point is NOT that hashes are fast (they are fast, and this is important, but that is not my point). My point is that they very often allow for much better data structures and much better algorithms for many many tasks.

Re: why use a hash instead of an array
by Preceptor (Deacon) on Jun 11, 2013 at 19:08 UTC
    So you've set up a bunch of variables, that act as references to elements of an array? You could do that, but ... what advantage do you gain by doing so?

    I really wouldn't worry about efficiency - algorithm design is far more relevant than use of arrays vs. hashes.

    Otherwise? Well, hashes are just a much simpler idiom to use - clearer code is more valuable than squeezing out miniscule performance advantages.

    A lot of data _is_ structured as key/value pairs, and being able to manipulate it trivially is an advantage.

    For example - given a list of words (one perl line for simplicity) count occurences.

    With a hash:

    foreach my $word ( <STDIN> ) { $word_list{$word}++; } foreach my $word ( keys %word_list ) { print "$word : $word_list{$word}\n"; }
    You don't need to completely swap hashes for lists, but personally I'd suggest that any time you're using array indicies, you're probably doing something wrong, and making your code less readable.

    foreach my $element ( @list_of_stuff ) { print $element,"\n"; }
    Is a very useful idiom.
Re: why use a hash instead of an array
by hbm (Hermit) on Jun 12, 2013 at 02:59 UTC

    In addition... fantastic for passing named parameters - without concern for their order:

    connect(PW=>'poof!', IP=>'10.102.102.13', USER=>'bob'); sub connect { # log in to $args{IP} as "$args{USER}/$args{PW}" # note default IP can be overridden %args = ( IP => '192.168.1.100', @_ ); ... }
Re: why use a hash instead of an array
by JockoHelios (Scribe) on Jun 11, 2013 at 21:58 UTC
    You might not have heard it, but I don't think anyone here is laughing at you :) Actually, the reason you haven't heard it is that it isn't happening. That's a big plus for this site in my book.

    About your question, I couldn't add much at this point because I haven't had a need for hashes yet. Not that I know of...

    Regarding learning Perl, I think If you're going to do this you could do better than trying to pick it up piecemeal as you need parts of it. There was a thread recently about an instructional book series. There are three of them, starting with Learning Perl, then Intermediate Perl, and finally (coming out in August) Mastering Perl.

    I'd been trying to piece together programs using a sort of reference, Programming Perl. I've found that it's helpful, but tends to dash around a bit since it's not meant to be an instructional text. Which is why I picked up and have started on Learning Perl yesterday :) Once I get to chapter 6, I'll no doubt start seeing uses for those hashes...

    Dyslexics Untie !!!
      thanks for the suggestion. already got a link with some books to look into, but maybe i should start with something more basic then work up to those. like is said, i'm not a real programmer but i'm interested in progarmming. learning perl looks like a good place to start.
Re: why use a hash instead of an array
by 2teez (Vicar) on Jun 11, 2013 at 20:59 UTC

    Hi DarrenSol
    Wonderful answers has been given, but maybe this little example / question could also "say" something.
    If I give the following:

    teacher students professor students school class teacher desk table chalk borad class school teacher
    And I ask that for these:
    1. print out all the words, without repeating any, 2. print out all the repeated words only, 3. print out the number of times each words is seen and probably an illustration this with an histogram.
    How would you do these.

    Let me answer QnS 1. and 2., allow you to figure question 3 and try your hands on other solution WITHOUT USING HASH
    use strict; use warnings; my %hash; while(<DATA>){ chomp; $hash{$_}++ for split; } print join $/ => sort keys %hash; print $/,$/, join $/ => grep {$_ if $hash{$_} > 1} sort keys %hash; __DATA__ teacher students professor students school class teacher desk table chalk borad class school teacher
    Solution:
    1. borad chalk class desk professor school students table teacher
    2. class school students teacher
    That brings me to a statement I read in Programming Perl a while a go that 'Until you start thinking in terms of hashes, you aren’t really thinking in Perl.'

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
      For a person just learning perl, your example code is quite obtuse. You should at least attempt to explain how your code actually works.

      I'd do it for you, but need to leave work now...

      Update: since no one else has yet done so, and I'm back at work now, I'll try:

      So, the $/ floating all over the above post is simply a special variable holding, by default, a "new line" character, ie. "\n". The first section

      while(<DATA>){ chomp; $hash{$_}++ for split; }
      simply reads in the data, and for every word in the DATA section, increments the hash table entry for that word. It would be more clear to beginners if it were written thusly:
      while(<DATA>){ #get a line of text from the DATA area, put it in spec +ial variable $_ chomp; #remove the "new line" character from the $_ variable #create a word array from the $_ variable #split with no options splits on whitespace by default @words = split; foreach $word (@words) { $hash{$word}++; #increment the value pointed to by the $hash{ +$word} "key" } }
      Once you have the data read in, you'll have the following data structure:
      board => 1 chalk => 1 class => 2 desk => 1 professor => 1 school => 2 students => 2 table => 1 teacher => 3
      2teez then uses more advanced perl to essentially do the following:
      foreach $word (sort keys %hash) { print "$word\n"; } print "\n"; foreach $word (sort keys %hash) { if ($hash{$word} > 1) { print "$word\n"; } }
      From these two beginner friendly examples, it should be pretty easy to do 2teez's 3rd challenge which is to print out the data structure I gave above.

      -Scott

      grep {$_ if $hash{$_} > 1} sort keys %hash

      The '$_ if' is redundant since that is what grep does by default, return $_ if the clause is true. The following works the same way:

      grep { $hash{$_} > 1 } sort keys %hash
        actually not quite =)

        grep { $hash{$_} > 1 } is equivalent to map {$_ if $hash{$_} > 1}

        grep {$_ if $hash{$_} > 1} only greps if the key $_ is also true.

        Which I agree is very odd code!

        Cheers Rolf

        ( addicted to the Perl Programming Language)

Re: why use a hash instead of an array
by Happy-the-monk (Canon) on Jun 12, 2013 at 06:47 UTC

    Hashes display relations between those pieces of data they handle
    much more concise than other ways of doing it.

    They help display those relations and names in your code,
    so to another programmer,
    the code will be more concise and maybe even self-documenting.

    They will teach your mind to handle vast amounts of structured relational data
    as you programme along
    and make you brave to face big tasks lightheartedly.

    Cheers, Sören

    (hooked on the Perl Programming language)

Re: why use a hash instead of an array
by FloydATC (Deacon) on Jun 12, 2013 at 00:34 UTC
    are hashes just there for convenience

    In a nutshell, yes. Any task that lends itself well to using a hash could be solved using some other method*, but your code would be longer and more difficult to write and maintain. Most Perl programmers strive to do as little work as possible and let the computer take care of the heavy lifting :-)

    *) In fact, the CPU doesn't have a clue what a hash is so the problem is actually solved some other way even if you do use a hash

    -- FloydATC

    Time flies when you don't know what you're doing

Re: why use a hash instead of an array
by karlgoethebier (Abbot) on Jun 11, 2013 at 19:20 UTC

    It's just about how you want/like/need your data:

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: why use a hash instead of an array
by gurpreetsingh13 (Scribe) on Jun 12, 2013 at 11:20 UTC
    A very very simple answer to your question. Please don't laugh.

    If you have 5 sons, would you like to refer each one of them with name or with indices ?

    You must have understood that internally hashes are just like arrays- somehow named arrays - whose values are referenced using keys. Arrays are better when a similar type of operation is performed on entire list. Hashes are better when you would like to use individual members of a group, which are always better to refer via a name which you can easily remember during coding and not via an index.

    Just my 2 cents on that.

      not fair. you tell me not to laugh then throw that one-liner at me...but actually i know a guy that does that. at least i think he's planning on doing that. mr rodriquez has one kid, a boy, that he calls hose-a. i guess he'll call his next son hose-b. ok, your turn, don't laugh.

        Extremely sorry if you felt wrong with that. But friend, it was just an example.

        Ok I am not laughing. But you will understand the importance of hashes once you start using the language to its full extent. Simpler code, shorter code, understandable code, easy to use etc. etc.

        Correct me if I am wrong. I believe perl was the first language to include hashes as first-class citizens. There must have been several selling points due to which all industry-standard languages like Java, C-Sharp, Python have now something similar to hash through API or jar or dll or dictionary

Re: why use a hash instead of an array
by LanX (Saint) on Jun 12, 2013 at 12:03 UTC
    Your observations about similarities are true!

    For instance in JavaScript arrays are (officially) implemented via hashes. =)

    BUT there are certain differences in Perl.

    • arrays are ordered, hashes aren't
    • two indices with distance n mean the array has >n elements, the hash only >2
    • array lookup and memory consumption is far better for same number of elements
    These differences in nature lead to different operators, something like push or shift doesn't make much sense for hashes without order. How do you find the highest index/key of an array?

    And sometimes, e.g. when dealing with sparse matrices, arrays are simulated via hashes.

    HTH

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re: why use a hash instead of an array
by aaron_baugher (Curate) on Jun 11, 2013 at 20:45 UTC

    Others have touched on many of the ways that hashes are useful. I'll just add that, when I have a list of values, often I'll make them the keys of a hash right from the start, rather than putting them in an array, because there's a good chance I'll need them in a hash at some later point.

    Aaron B.
    Available for small or large Perl jobs; see my home node.