
Re: (stephen) Hash Tutorial

by stephen (Priest)
on Jun 22, 2001 at 07:11 UTC (#90617)

in reply to Hash Tutorial

Just as Perl is the Swiss Army Chainsaw of languages, the hash is the Leatherman and Krazy Glue of data structures. Whenever you're thinking of the following words, think hash:
  • "look up" -- As in "For each type of pet in the pet store, the system has to look up the kind of food they eat."
  • "unique" -- As in "I want a list of each unique kind of pet we have in this pet store."
  • "check if it's in X" -- As in "I want to check that the kind of pet we enter is in our list of approved pets."
  • "named" -- As in "I want to be able to access the pet's species through something named 'species', its collar size through something named 'collar size', etc."
  • "choose" -- As in "Based on what kind of pet it is, we should choose which subroutine to run."
  • Others I haven't thought of -- As in "It sure would be nice to have an exhaustive list, but I don't."

"look up"

For example, say you've got a type of pet (dog, cat, etc.), and you want to look up what they eat. A hash is perfect for this:

    my %Pet_Food = (
        'dog'      => 'dog chow',
        'cat'      => 'cat chow',
        'parakeet' => 'birdseed',
    );

    # $pet_type should be 'dog', 'cat', or 'parakeet'
    my $food = $Pet_Food{$pet_type};

If $pet_type is 'dog', then $food winds up being 'dog chow', etc.
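
One thing the lookup above glosses over is what happens when $pet_type isn't in the hash at all: you just get undef back. Here's a minimal sketch of one way to handle that. The food_for() helper and its fallback string are made up for illustration; they aren't part of the original example.

```perl
use strict;
use warnings;

my %Pet_Food = (
    dog      => 'dog chow',
    cat      => 'cat chow',
    parakeet => 'birdseed',
);

# Hypothetical helper: return the food for a pet type, or a fallback
# string when the type has no entry in the table.
sub food_for {
    my ($pet_type) = @_;
    return exists $Pet_Food{$pet_type}
        ? $Pet_Food{$pet_type}
        : 'unknown (ask the vet)';
}

print food_for('dog'),   "\n";   # dog chow
print food_for('lemur'), "\n";   # unknown (ask the vet)
```

Using exists rather than plain truthiness means a pet whose food is legitimately '' or 0 would still count as "found".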


"unique"

Next, say you've got a list of all of the types of pets in the store. However, since it's just an inventory list (somebody went down the cages and typed "dog" for every dog, for some reason), you want to eliminate the thousand-and-one duplicates that are there. Hashes to the rescue. Hash keys are stored only once per hash, so you can get a list of unique names, eliminating the duplicates, like so:

    my @Pet_Types = ("dog", "cat", "dog", "parakeet", "dog", "cat");
    my %type_table = ();

    foreach my $type ( @Pet_Types ) {
        $type_table{$type}++;
    }

    my @unique_types = sort keys %type_table;
    # @unique_types winds up with 'cat', 'dog', and 'parakeet'
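
The version above sorts the keys, because hash keys come back in no particular order. If you'd rather keep the types in the order they first showed up in the inventory, a common variation is the %seen idiom with grep. This is a sketch using the same list; the names %seen and @unique_in_order are mine.

```perl
use strict;
use warnings;

my @Pet_Types = ("dog", "cat", "dog", "parakeet", "dog", "cat");

my %seen;
# $seen{$_}++ returns 0 (false) the first time a type appears, so
# grep keeps only first occurrences, in their original order.
my @unique_in_order = grep { !$seen{$_}++ } @Pet_Types;

print "@unique_in_order\n";          # dog cat parakeet
print "dogs counted: $seen{dog}\n";  # dogs counted: 3
```

As a side effect, %seen ends up holding a count for each type, just like %type_table does.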

"check if it's in X"

Say you've got a pet-store application where the user is supposed to enter the type of pet they're looking for. You want to make sure they didn't type 'dgo' by accident when they meant 'dog', so you can just check against your list of valid pets:

    my %Valid_Pets = (
        'dog'      => 1,
        'cat'      => 1,
        'parakeet' => 1,
        'lemur'    => 1,
    );

    # $pet_type should contain 'dog', 'cat', 'lemur' etc...
    # but might not.. we die if it doesn't exist
    $Valid_Pets{$pet_type}
        or die "I'm sorry, '$pet_type' is not a valid pet\n";
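
If the approved pets already live in a list somewhere, you don't have to type out all those "=> 1" pairs by hand. Here's a rough sketch that builds the lookup hash with map and tests membership with exists. The @approved list and the is_valid_pet() helper are assumptions for the sake of the example.

```perl
use strict;
use warnings;

my @approved   = qw(dog cat parakeet lemur);
my %Valid_Pets = map { $_ => 1 } @approved;

# Hypothetical helper: true if the pet type is a key in %Valid_Pets.
sub is_valid_pet {
    my ($pet_type) = @_;
    return exists $Valid_Pets{$pet_type} ? 1 : 0;
}

for my $pet_type (qw(dog dgo)) {
    print "$pet_type: ",
        ( is_valid_pet($pet_type) ? "ok" : "not a valid pet" ), "\n";
}
```

This prints "dog: ok" and then "dgo: not a valid pet".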


"named"

Say you have a data file like this:

    name=fluffy species=rabbit weight=5 price=10.00
    name=fido species=dog weight=15 price=30.15
    name=gul_ducat species=cat weight=10 price=40.20

Frequently, folks reading through files like this say, "Well, I want to just access this stuff by name. I don't know if we're going to start recording serial numbers or ancestry or other stuff, so I just want to have the 'name' field automatically stored in $name, species in $species, etc." (There's a way to do it, but it's a bad thing to do.) Then they post messages on PerlMonks asking us how to automatically call variables by name, and kick off a debate as a bunch of people say "use hashes", then one or two people tell them black-magic techniques, and it turns into a mess.

Save yourself (and the rest of us :) ) the time and trouble and use hashes whenever you want to call something by name. Read each record in the file into a hash, and access the parameters by saying things like $pet{'species'}, $pet{'price'}, and so on. You'll be that much closer to an object-oriented program, and you won't have that impossible-to-find bug when your parameter named 'x' collides with $x elsewhere in the program. You can do something like this:

    while (<PET_FILE>) {
        my %pet = parse_pet($_);
        print "Name: $pet{'name'}\n";
    }

    sub parse_pet {
        my ($line) = @_;

        # Splitting on ' ' breaks on runs of whitespace and drops
        # the trailing newline
        my @param_pairs = split(' ', $line);
        my %params = ();

        foreach my $param ( @param_pairs ) {
            my ($name, $value) = split(/=/, $param, 2);
            $params{$name} = $value;
        }

        return %params;
    }

That way, everything about the pet is stored in a single variable, and you don't have a bunch of data running around loose like hamsters escaped from their cages. (Okay, so I'm stuck in the theme.)

Most folks would return a hash reference from the subroutine instead of the entire hash for efficiency reasons. The concept is the same:

    while (<PET_FILE>) {
        my $pet = parse_pet($_);
        print "Name: $pet->{'name'}\n";
    }

    sub parse_pet {
        my ($line) = @_;

        # Splitting on ' ' breaks on runs of whitespace and drops
        # the trailing newline
        my @param_pairs = split(' ', $line);
        my %params = ();

        foreach my $param ( @param_pairs ) {
            my ($name, $value) = split(/=/, $param, 2);
            $params{$name} = $value;
        }

        return \%params;
    }


"choose"

Kind of an advanced technique, but if you need to choose between a thousand alternate things to do based on the value of a single string, it's generally best to use a hash (unless you can use object-oriented programming and subclassing, but that's another tale). Say, for example, that you want to print a different page based on the species that a customer bought. You could, of course, have a billion-and-one if/elsif statements, like so:

    # Note: Bad code! No krispy kreme!
    if ( $pet_type eq 'dog' ) {
        print_dog_page();
    }
    elsif ( $pet_type eq 'cat' ) {
        print_cat_page();
    }
    elsif ( $pet_type eq 'lemur' ) {
        print_lemur_page();
    }
    # .. ad nauseam

Instead, it's much better to have a hash table of pet types, plus references to subroutines to call in various situations:

    my %Pet_Pages = (
        dog   => \&print_dog_page,
        cat   => \&print_cat_page,
        lemur => \&print_lemur_page,
    );

    my $page_sub = $Pet_Pages{$pet_type}
        or die "Invalid pet type\n";
    $page_sub->();

That way, you don't need to go rappelling down the huge list of if/thens every time you want to add or remove a pet page. It's a powerful technique, although it can be misused. (Don't use it instead of simple if/thens, for example.)
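
Two things you often want from a table like this are a default action and a way to pass arguments to the chosen sub. Here's a small sketch along those lines. The anonymous page subs, $default_page, and page_for() are all invented for illustration, and they return strings rather than printing so the result is easy to check.

```perl
use strict;
use warnings;

# Hypothetical page subs; each returns its text instead of printing.
my %Pet_Pages = (
    dog   => sub { my ($name) = @_; return "Dog care page for $name" },
    cat   => sub { my ($name) = @_; return "Cat care page for $name" },
    lemur => sub { my ($name) = @_; return "Lemur care page for $name" },
);

# Fallback used when the pet type has no entry in the table.
my $default_page = sub {
    my ($name) = @_;
    return "General care page for $name";
};

sub page_for {
    my ($pet_type, $name) = @_;
    my $page_sub = $Pet_Pages{$pet_type} || $default_page;
    return $page_sub->($name);
}

print page_for('cat',    'gul_ducat'), "\n";  # Cat care page for gul_ducat
print page_for('rabbit', 'fluffy'),    "\n";  # General care page for fluffy
```

Adding a new pet page is then a one-line change to the hash, and nothing in page_for() has to move.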

Basically, hashes give incredible flexibility. Combine this with references, and you can have hashes of hashes, and hashes of hashes of hashes (of arrays), until you have data types of whatever structure and complexity you want.
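
To make that a bit more concrete, here's a rough sketch that reads the pet records from earlier into a hash of hashes keyed by name, plus a hash of arrays grouping names by species. The @lines array stands in for the data file, and %pets and %by_species are names I made up for this example.

```perl
use strict;
use warnings;

# Stand-in for the lines of the data file shown earlier.
my @lines = (
    'name=fluffy species=rabbit weight=5 price=10.00',
    'name=fido species=dog weight=15 price=30.15',
    'name=gul_ducat species=cat weight=10 price=40.20',
);

my %pets;        # name    => { field => value, ... }
my %by_species;  # species => [ name, name, ... ]

for my $line (@lines) {
    # Split the line into "key=value" chunks, then each chunk into a pair.
    my %params = map { split /=/, $_, 2 } split ' ', $line;

    $pets{ $params{name} } = \%params;
    push @{ $by_species{ $params{species} } }, $params{name};
}

print "fido weighs $pets{fido}{weight}\n";   # fido weighs 15
print "cats: @{ $by_species{cat} }\n";       # cats: gul_ducat
```

Since %params is a fresh lexical on every pass through the loop, each record gets its own hash reference.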

Note: Code not tested.
Update: Fixed typo in code and added hashref example.


Replies are listed 'Best First'.
Re: Re: (stephen) Hash Tutorial
by RhetTbull (Curate) on Jun 22, 2001 at 23:40 UTC
    Nice work brother stephen. I do have a small comment on your example for "Choose." You are effectively using the hash as a replacement for a switch statement (since Perl doesn't have a builtin switch). For this type of use I would recommend taking a look at Damian Conway's Switch module. It allows you to use a real switch statement with all the power of perl for the case comparisons (e.g. it's not limited to integer case values like the C switch statement). Damian stated at YAPC America::North 2001 that his switch module was going to be added to the Perl core so I would presume it's a fairly "safe" module to use. Regards,
Re: Re: Hash Tutorial
by bobione (Pilgrim) on Jun 22, 2001 at 17:53 UTC
    Very good tutorial ! Thank you.
    It's clear and illustrated.

    BobiOne KenoBi ;)

Re: Re: (stephen) Hash Tutorial
by !unlike (Beadle) on Apr 28, 2003 at 11:52 UTC

    I came across this thread, and your post, when looking for some help in doing a diff on the keys between two hashes.

    Although the code didn't answer my question directly, it did give me the inspiration to come up with the following:

    perl -le '%foo = (1,1, 2,2, 3,3); %bar = (2,2, 3,3); print map { $_ if !$bar{$_} } keys %foo;'

    If you run this code you'll see that only "1" gets printed.

    Not the most robust piece of code ever, but I came up with it off my own bat and it does a job for me. So cheers for the inspiration!


    I write my Perl code like how I like my sex: fast and dirty. ;)

      What you want in your snippet there is a grep(), not a map(), as you're filtering out items, not applying transforms to them, e.g.:

      shell> perl -le '%a = qw/1 1 2 2 3 3/; %b = qw/2 2 3 3/; print grep !$b{$_}, keys %a'
      1

      Also note that the map() in your snippet is returning a list of three items: a 1 and two empty strings.


        Thanks broquaint
        I realised this just now and was about to post an update. But you beat me to it. :)


