http://www.perlmonks.org?node_id=90610

dhammaBum has asked for the wisdom of the Perl Monks concerning the following question:

Yet another newbie ..... (sigh)

To quote Larry Wall from the camel book: "Until you start thinking in terms of hashes, you aren't really thinking in Perl."

Okay, but how? I've had a look at other people's use of hashes and I'm missing something. Can anyone give a few examples (with brief explanation) of how to use hashes effectively, efficiently and in a creatively satisfying manner?

thanks.

May all beings (incl perl programmers) be happy.

Replies are listed 'Best First'.
Re: (stephen) Hash Tutorial
by stephen (Priest) on Jun 22, 2001 at 07:11 UTC
    Just as Perl is the Swiss Army Chainsaw of languages, the hash is the Leatherman and Krazy Glue of data structures. Whenever you're thinking of the following words, think hash:
    • "look up" -- As in "For each type of pet in the pet store, the system has to look up the kind of food they eat."
    • "unique" -- As in "I want a list of each unique kind of pet we have in this pet store."
    • "check if it's in X" -- As in "I want to check that the kind of pet we enter is in our list of approved pets."
    • "named" -- As in "I want to be able to access the pet's species through something named 'species', its collar size through something named 'collar size', etc."
    • "choose" -- As in "Based on what kind of pet it is, we should choose which subroutine to run."
    • Others I haven't thought of -- As in "It sure would be nice to have an exhaustive list, but I don't."

    "look up"

    For example, say you've got a type of pet (dog, cat, etc.), and you want to look up what they eat. A hash is perfect for this:

    %Pet_Food = (
        'dog'      => 'dog chow',
        'cat'      => 'cat chow',
        'parakeet' => 'birdseed'
    );
    # $pet_type should be 'dog', 'cat', or 'parakeet'
    my $food = $Pet_Food{$pet_type};
    If $pet_type is 'dog', then $food winds up being 'dog chow', etc.
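
    One thing the lookup above glosses over is what happens when the key isn't there: you quietly get undef. A minimal sketch of a lookup with a fallback, assuming a helper named food_for() and a fallback string that are both made up for illustration:

    ```perl
    use strict;
    use warnings;

    my %Pet_Food = (
        dog      => 'dog chow',
        cat      => 'cat chow',
        parakeet => 'birdseed',
    );

    # food_for() is a hypothetical helper, not part of the original post.
    sub food_for {
        my ($pet_type) = @_;
        # exists() tells "no such key" apart from "key present, value false"
        return exists $Pet_Food{$pet_type}
            ? $Pet_Food{$pet_type}
            : 'unknown (ask the vet)';
    }

    print "$_ eats ", food_for($_), "\n" for qw(dog lemur);
    ```

    Whether a fallback or a die is the right reaction to a missing key depends on the application; this only shows where the check goes.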

    "unique"

    Next, say you've got a list of all of the types of pets in the store. However, since it's just an inventory list (somebody went down the cages and typed "dog" for every dog, for some reason), you want to eliminate the thousand-and-one duplicates that are there. Hashes to the rescue. Hash keys are stored only once per hash, so you can get a list of unique names, eliminating the duplicates, like so:

    @Pet_Types = ("dog", "cat", "dog", "parakeet", "dog", "cat");
    my %type_table = ();
    foreach my $type ( @Pet_Types ) {
        $type_table{$type}++;
    }
    my @unique_types = sort keys %type_table;
    # @unique_types winds up with 'cat', 'dog', and 'parakeet'
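
    If you'd rather keep the types in the order they first appeared instead of sorting them, a common variation is to combine the counting hash with grep. This is only a sketch; the names %seen and @unique_in_order are my own:

    ```perl
    use strict;
    use warnings;

    my @Pet_Types = ("dog", "cat", "dog", "parakeet", "dog", "cat");

    my %seen;
    # $seen{$_}++ is false the first time a type shows up, so grep keeps
    # only first occurrences, in their original order.
    my @unique_in_order = grep { !$seen{$_}++ } @Pet_Types;

    print "@unique_in_order\n";            # dog cat parakeet
    print "dogs counted: $seen{dog}\n";    # dogs counted: 3
    ```

    As a side effect, %seen ends up holding how many times each type occurred.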

    "check if it's in X"

    Say you've got a pet-store application where the user is supposed to enter the type of pet they're looking for. You want to be able to check to make sure they didn't type 'dgo' by accident when they meant 'dog', so we can just check against our list of valid pets:

    %Valid_Pets = (
        'dog'      => 1,
        'cat'      => 1,
        'parakeet' => 1,
        'lemur'    => 1
    );
    # $pet_type should contain 'dog', 'cat', 'lemur' etc...
    # but might not.. we die if it doesn't exist
    $Valid_Pets{$pet_type}
        or die "I'm sorry, '$pet_type' is not a valid pet\n";
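
    When the approved pets already live in a list, typing out all the "=> 1" pairs gets old. A rough sketch of building the same kind of table with map; @approved and is_valid_pet() are names I made up for the example:

    ```perl
    use strict;
    use warnings;

    my @approved   = qw(dog cat parakeet lemur);
    my %Valid_Pets = map { $_ => 1 } @approved;

    # is_valid_pet() is a hypothetical helper for illustration.
    sub is_valid_pet {
        my ($pet_type) = @_;
        return exists $Valid_Pets{$pet_type} ? 1 : 0;
    }

    for my $pet_type (qw(dog dgo)) {
        printf "%-4s %s\n", $pet_type,
            is_valid_pet($pet_type) ? 'ok' : 'is not a valid pet';
    }
    ```

    Here the sub returns a flag instead of dying, so the caller can decide whether to re-prompt or bail out.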

    "named"

    Say you have a data file like this:
    name=fluffy species=rabbit weight=5 price=10.00
    name=fido species=dog weight=15 price=30.15
    name=gul_ducat species=cat weight=10 price=40.20
    Frequently, folks reading through files like this say, "Well, I want to just access this stuff by name; don't know if we're going to start recording serial numbers or ancestry or other stuff, so I just want to have the 'name' field automatically stored in $name, species in $species, etc." (There's a way to do it, but it's a bad thing to do.) Then they post messages on PerlMonks asking us how to automatically call variables by name, and kick off a bunch of debate as a bunch of people say "use hashes", then one or two people tell them black-magic techniques, and it turns into a mess.

    Save yourself (and the rest of us :) ) the time and trouble and use hashes whenever you want to call something by name. Read each record in the file into a hash, and access the parameters by saying things like $pet{'species'}, $pet{'price'}, and so on. You'll be that much closer to an object-oriented program, and you won't have that impossible-to-find bug when your parameter named 'x' collides with $x elsewhere in the program. You can do something like this:

    while (<PET_FILE>) {
        my %pet = parse_pet($_);
        print "Name: $pet{'name'}\n";
    }

    sub parse_pet {
        my ($line) = @_;
        my @param_pairs = split(/\s/, $line);
        my %params = ();
        foreach my $param ( @param_pairs ) {
            my ($name, $value) = split(/=/, $param, 2);
            $params{$name} = $value;
        }
        return %params;
    }
    That way, everything about the pet is stored in a single variable, and you don't have a bunch of data running around loose like hamsters escaped from their cages. (Okay, so I'm stuck in the theme.)

    Most folks would return a hash reference from the subroutine instead of the entire hash for efficiency reasons. The concept is the same:

    while (<PET_FILE>) {
        my $pet = parse_pet($_);
        print "Name: $pet->{'name'}\n";
    }

    sub parse_pet {
        my ($line) = @_;
        my @param_pairs = split(/\s/, $line);
        my %params = ();
        foreach my $param ( @param_pairs ) {
            my ($name, $value) = split(/=/, $param, 2);
            $params{$name} = $value;
        }
        return \%params;
    }

    "choose"

    Kind of an advanced technique, but if you need to choose between a thousand alternate things to do based on the value of a single string, it's generally best to use a hash (unless you can use object-oriented programming and subclassing, but that's another tale.) Say for example that you want to print a different page based on the species that a customer bought. You could, of course, have a billion-and-one if/elsif statements, like so:

    # Note: Bad code! No krispy kreme!
    if ( $pet_type eq 'dog' ) {
        print_dog_page();
    }
    elsif ( $pet_type eq 'cat' ) {
        print_cat_page();
    }
    elsif ( $pet_type eq 'lemur' ) {
        print_lemur_page();
    }
    # .. ad nauseam
    Instead, it's much better to have a hash table of pet types, plus references to subroutines to call in various situations:
    %Pet_Pages = (
        dog   => \&print_dog_page,
        cat   => \&print_cat_page,
        lemur => \&print_lemur_page,
    );
    my $page_sub = $Pet_Pages{$pet_type}
        or die "Invalid pet type\n";
    &$page_sub();
    That way, you don't need to go rappelling down the huge list of if/thens every time you want to add or remove a pet page. It's a powerful technique, although it can be misused. (Don't use it instead of simple if/thens, for example.)
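
    A dispatch table like this can also fall back to a default handler rather than dying. The following is just a sketch: the anonymous subs stand in for the print_*_page routines (which aren't shown here), and page_for() is a hypothetical wrapper:

    ```perl
    use strict;
    use warnings;

    # Stand-in handlers that return a string instead of printing a page.
    my %Pet_Pages = (
        dog   => sub { return "dog page" },
        cat   => sub { return "cat page" },
        lemur => sub { return "lemur page" },
    );

    # page_for() is a hypothetical wrapper around the table lookup.
    sub page_for {
        my ($pet_type) = @_;
        # Use the registered handler, or a catch-all for unknown types.
        my $page_sub = $Pet_Pages{$pet_type}
            || sub { return "generic page for '$pet_type'" };
        return $page_sub->();
    }

    print page_for($_), "\n" for qw(cat iguana);
    ```

    Adding a new pet page is then a one-line change to the table, and the calling code never needs to know about it.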

    Basically, hashes give incredible flexibility. Combine this with references, and you can have hashes of hashes, and hashes of hashes of hashes (of arrays), until you have data types of whatever structure and complexity you want.
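
    To make the hash-of-hashes idea a bit more concrete, here is a small sketch that indexes the pet records from the data file example by name. It assumes the records are already in an array called @lines (my invention) rather than read from PET_FILE:

    ```perl
    use strict;
    use warnings;

    my @lines = (
        "name=fluffy species=rabbit weight=5 price=10.00",
        "name=fido species=dog weight=15 price=30.15",
        "name=gul_ducat species=cat weight=10 price=40.20",
    );

    my %pet_by_name;    # pet name => reference to that pet's own hash
    for my $line (@lines) {
        # Split the line into key=value pairs, then each pair into (key, value).
        my %params = map { split(/=/, $_, 2) } split(' ', $line);
        $pet_by_name{ $params{name} } = \%params;
    }

    print "fido is a $pet_by_name{fido}{species}\n";
    for my $name (sort keys %pet_by_name) {
        print "$name costs $pet_by_name{$name}{price}\n";
    }
    ```

    The outer hash answers "which pet?" and the inner one answers "which field?", so $pet_by_name{fido}{species} reads as "the species of fido".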

    Note: Code not tested.
    Update: Fixed typo in code and added hashref example.

    stephen

      Nice work, brother stephen. I do have a small comment on your example for "choose". You are effectively using the hash as a replacement for a switch statement (since Perl doesn't have a builtin switch). For this type of use I would recommend taking a look at Damian Conway's Switch module. It allows you to use a real switch statement with all the power of Perl for the case comparisons (e.g. it's not limited to integer case values like the C switch statement). Damian stated at YAPC America::North 2001 that his Switch module was going to be added to the Perl core, so I would presume it's a fairly "safe" module to use.
      Regards,
      Rhet
      Very good tutorial! Thank you.
      It's clear and illustrated.

      BobiOne KenoBi ;)

      stephen,
      I came across this thread, and your post, when looking for some help in doing a diff on the keys between two hashes.

      Although the code didn't answer my question directly, it did give me the inspiration to come up with the following:

      perl -le '%foo = (1,1, 2,2, 3,3); %bar = (2,2, 3,3); print map { $_ if !$bar{$_} } keys %foo;'

      If you run this code you'll see that only "1" gets printed.

      Not the most robust piece of code ever, but I came up with it off my own bat and it does a job for me. So cheers for the inspiration!

      !unlike

      I write my Perl code like how I like my sex: fast and dirty. ;)

        What you want in your snippet there is a grep(), not a map(), as you're filtering out items rather than applying transforms to them, e.g.
        shell> perl -le '%a = qw/1 1 2 2 3 3/; %b = qw/2 2 3 3/; \
                         print grep !$b{$_}, keys %a'
        1
        Also note that the map() in your snippet is returning a list of three items - a 1 and two empty strings.
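
        For what it's worth, testing the value (!$b{$_}) and testing the key (exists) can give different answers once a value is false. A quick sketch with made-up data, where %bar holds the key 2 with a value of 0:

        ```perl
        use strict;
        use warnings;

        my %foo = (1 => 1, 2 => 2, 3 => 3);
        my %bar = (2 => 0, 3 => 3, 4 => 4);

        # Key-based diff: exists() ignores what the values are.
        my @only_in_foo = sort grep { !exists $bar{$_} } keys %foo;
        my @only_in_bar = sort grep { !exists $foo{$_} } keys %bar;

        # Value-based test: key 2 is in %bar, but its value 0 is false.
        my @by_truth = sort grep { !$bar{$_} } keys %foo;

        print "only in foo: @only_in_foo\n";    # only in foo: 1
        print "only in bar: @only_in_bar\n";    # only in bar: 4
        print "by truth:    @by_truth\n";       # by truth:    1 2
        ```

        So if the values can legitimately be 0 or an empty string, exists is the safer test for a key diff.
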
        HTH

        _________
        broquaint

Re (tilly) 1: Hash Tutorial
by tilly (Archbishop) on Jun 22, 2001 at 07:03 UTC
    The fundamental key IMHO is this. As long as you are doing a lot of thinking involving positional logic, you are not being very Perlish. What this means is that you use implicit looping operations (foreach over a list rather than a C-style for, map, grep) and use hashes to do things by name. To get a good sense I can suggest nothing better than getting the Cookbook and looking through the examples in there.

    Short of that I highly recommend the section in Camel II (possibly in Camel III as well?) in chapter 1 on how Perl data structures map to linguistic operations. The linguistic version of a hash lookup is the word "of". This informs us how we name a hash. If the hash gives us the address of the person, we should say $address{$person}. If it is possible to get an address in several ways we might be more verbose and say $address_by_person{$person}. In either case "talk your way" through the problem in English and wherever you say "of" or an equivalent, that is a good sign that you want a hash.

    That said, here are some standard uses that I have for them:

    1. The obvious lookup. $address{$person};
    2. Keep track of existing things I have dealt with. Where I might say, "If I haven't seen this one yet (BTW mark it seen) then..." I might write in code: if (not $is_seen{$case}++) { # etc
    3. Named function arguments. Rather than having to remember a set of 6 arguments in order, and what the defaults are if you need to set the 6th but not touch the 3rd, 4th, or 5th, just use a hash:
      sub my_function { my %opts = @_; # etc
      and it is easier to remember how to call the function, and easier to add more useful options later.
    4. Structs. Anywhere you would have used a pointer to a struct in C, you can use a reference to a hash in Perl. The notation is even similar since the arrow is used in Perl for dereferencing. This is why you see a lot of OO code in Perl with things like: my $name = $self->{name};
    5. List special cases. You can easily enumerate special cases in a hash. If you are willing to create anonymous functions for them (with sub) then you can easily create a nice catch-all special case:
      if (exists $handler{$case}) {
          $handler{$case}->(); # Handle the special case
      }
      The actual definitions of the special cases are placed elsewhere. Lest this idea just sound weird, you may want to read Why I like functional programming. (It helps to read the *last* function first - scrub_input - and understand what it does. It may also help to check out what pos does.) OK, perhaps you don't want to try that. Yet.
    And so it goes. Whenever it makes sense to refer to things by name, it makes sense to use hashes.
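
    As a rough illustration of the named-arguments idea from item 3, defaults can be listed first and the caller's pairs last, so that whatever the caller passes wins. The function describe_pet() and its option names are hypothetical:

    ```perl
    use strict;
    use warnings;

    # describe_pet() and its options are made up for this example.
    sub describe_pet {
        my %opts = (
            name    => 'nameless',   # defaults come first...
            species => 'unknown',
            weight  => 0,
            @_,                      # ...and the caller's pairs override them
        );
        return "$opts{name} is a $opts{species} weighing $opts{weight}";
    }

    print describe_pet(species => 'dog', name => 'fido'), "\n";
    print describe_pet(weight => 5), "\n";
    ```

    This works because a later pair with the same key replaces the earlier one when the list is assigned to the hash, and the caller is free to pass the options in any order.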

    UPDATE
    Forgot a ++ on $is_seen{$case}. I thought it, and verbalized it in English, but when I went to write the Perl at last I apparently forgot it at the last moment...

      The fundamental key IMHO is this. As long as you are doing a lot of thinking involving positional logic, you are not being very Perlish.

      I just wanted to qualify this statement with a slightly tangential one of my own -- as a newbie, when you discover the power of hashes in Perl they become intoxicating. Oh, this could be a hash. Oh, and so can this. And this. And this. And...

      tilly's very erudite description of the hash glosses over this slightly, but only because, as a saint, he's probably forgotten how we men and women in the trenches think.

      It is important to remember that hashes have higher overhead and are slower to access than arrays. On the other hand, looking up a hash element by key is essentially a constant-time operation: as long as I know which item I want, it takes about the same amount of time to fetch the 1st element of the hash as the nth, whereas finding an item by value in an array means scanning through it.

      There is a lot of overlap between arrays and hashes conceptually in Perl (and even more so under the hood), and you need to keep in mind how you are intending to use your data structure and the kind of data it will hold. For instance, a few general examples:

      • When order is important for adding/accessing: array
      • When keyword access is important for adding/accessing: hash
      • When numeric values are your keys: dense data = array, sparse data = hash

      The rules/ideas go on for many lines from there, but I guess what I'm getting at, is that Perl makes it possible to see the world through hash-tinted glasses <pun intended>, but always keep in mind that the lowly array is an equally adept tool for many jobs.
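
      The dense-versus-sparse point is easy to see with a single large numeric key. A small sketch (the variable names are mine):

      ```perl
      use strict;
      use warnings;

      my @by_index;
      $by_index[100_000] = 'lemur';   # array grows to cover indexes 0..100_000

      my %by_key;
      $by_key{100_000} = 'lemur';     # hash stores just this one key

      printf "array slots: %d, hash keys: %d\n",
          scalar(@by_index), scalar(keys %by_key);
      ```

      The array ends up with 100001 slots, nearly all of them undef, while the hash holds a single entry.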

Re: Hash Tutorial
by Superbabeteam (Initiate) on Jun 23, 2001 at 07:39 UTC
    From the Learning Perl book (may not be exact):
    %words = (
        "fred"   => "camel",
        "wilma"  => "alpaca",
        "barney" => "llama"
    );
    print $words{"fred"};   # prints camel
    $aname = "wilma";
    print $words{$aname};   # prints alpaca

    hope that helped