http://www.perlmonks.org?node_id=89124

june_bo has asked for the wisdom of the Perl Monks concerning the following question:

Writing a program for my compilers class.

Using these input files

input file 1:
A -> Aa | B | C | EMPTY
B -> Ab | C
C -> m

input file 2:
A -> BCj | gDB
B -> bCDE | EMPTY
C -> DaB | ma | DB
D -> dD | EMPTY
E -> gAf | i

input file 3:
E -> E+T | T
T -> T*F | F
F -> (E) | a

Here is the code that is giving me trouble

while (<INFILE>) {
    chomp();
    m/([A-Z]) -> (.*)/;    # take the line, put the lefthand side
                           # into $1, put righthand side into $2
    $GRAMMAR{$1} = $2;     # enter into hash table GRAMMAR
}
print "gonna print some info: \n";
while (($k, $v) = each %GRAMMAR) {
    print "$k\n";
}

Input file 1 and 2 create hash tables with the keys in the order of the input file. The output of #1, for example, is:

A B C
But the output of #3 is:
F T E

Can anyone help me figure out why?

Thanks, -tl

Re: problem with hash keys
by btrott (Parson) on Jun 17, 2001 at 07:13 UTC
    Hashes do not preserve insertion order. In other words, it is simply a fluke that your examples #1 and #2 print out the hash keys in the correct order; they are the exception rather than the rule.

    If you just want the keys sorted alphabetically, use

    for my $k (sort keys %GRAMMAR) { print "$k\n"; }
    But you may run into situations where you really do want to preserve insertion order, and it's not just a matter of sorting alphabetically.
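
Input file 3 is a case where sorting and insertion order disagree. The sketch below hard-codes that file's three rules (the variable names are illustrative, not from the original program) and compares the file order with what sort keys returns:

```perl
use strict;
use warnings;

# Left-hand sides of input file 3, in the order they appear in the file.
my @file_order = qw(E T F);

# The same rules stored in a plain hash (lowercase name is illustrative).
my %grammar = (
    E => 'E+T | T',
    T => 'T*F | F',
    F => '(E) | a',
);

# sort with no comparator compares the keys as strings.
my @sorted = sort keys %grammar;

print "file order:   @file_order\n";   # E T F
print "sorted order: @sorted\n";       # E F T
```

Sorting gives a stable, predictable order, but for this file it is not the order the rules were entered in.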

    If you want to preserve insertion order and still use a hash, take a look at Tie::IxHash. Or, if you're not wedded to a hash, you could always use an array, and push array refs onto your list of grammars:

    push @GRAMMAR, [ $1, $2 ];
    And then when iterating:
    for my $rec (@GRAMMAR) { print $rec->[0], "\n"; }
    BTW, when you do a regex match w/ capturing parens, you really should check whether the regex successfully matches:
    next unless /([A-Z]) -> (.*)/;
    $GRAMMAR{$1} = $2;
    That way, if the regex fails (i.e., the line you're looking at doesn't match that regex), you'll just skip that line. You don't want to use $1 and $2 if the match was unsuccessful, because they still hold whatever the last successful match left in them. :)
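
Putting the two suggestions together (an array of array refs plus a checked match) might look like the following minimal sketch. The helper name parse_rules and the sample lines are made up for illustration; the regex is the one from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: turn grammar lines into [lhs, rhs] pairs,
# kept in the order the lines were given.
sub parse_rules {
    my @lines = @_;
    my @rules;
    for my $line (@lines) {
        chomp $line;
        # Skip lines that do not match, so $1 and $2 are never stale.
        next unless $line =~ /([A-Z]) -> (.*)/;
        push @rules, [ $1, $2 ];
    }
    return @rules;
}

my @rules = parse_rules(
    "E -> E+T | T\n",
    "this line is not a rule\n",
    "T -> T*F | F\n",
    "F -> (E) | a\n",
);

# Prints E, T, F on separate lines, in input order.
print "$_->[0]\n" for @rules;
```

The non-matching line is dropped silently here; a real program might prefer to warn about it.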
      Thanks for all the good advice.
      I need to keep the order for cosmetic reasons; the user will expect to see the end results (a bunch of sets) in the same order he entered them.
      I have to use hashes because I do (what I think are) wonderful and amazing things with them later in the program.

      Thanks again to everyone.
      -tl

        Then you can keep, alongside the hash, an array that records the order in which its keys were first inserted:
        # ...
        push @grammar_keys, $1 unless exists $grammar{$1};
        $grammar{$1} = $2;
        # ...
        print "$_\n" for @grammar_keys;
        (I've also taken the liberty of using lowercase variable names; you should probably use all-uppercase names only for special global variables).
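
As a self-contained sketch of the hash-plus-key-array idea, using only core Perl: the helper name add_rule is hypothetical, and the rules are those of input file 3.

```perl
use strict;
use warnings;

my %grammar;        # lhs => rhs, used for lookups
my @grammar_keys;   # lhs symbols in first-seen order

# Hypothetical helper: store a rule and remember when its key first appeared.
sub add_rule {
    my ($lhs, $rhs) = @_;
    push @grammar_keys, $lhs unless exists $grammar{$lhs};
    $grammar{$lhs} = $rhs;
}

add_rule('E', 'E+T | T');
add_rule('T', 'T*F | F');
add_rule('F', '(E) | a');

# Walk the array for the order, the hash for the content.
print "$_ -> $grammar{$_}\n" for @grammar_keys;
```

The hash stays an ordinary hash, so any later code that relies on it keeps working; only the printing loop changes.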

        Or you can use a module like Tie::IxHash, which implements hashes with ordered keys just like you want.
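
A minimal sketch of the Tie::IxHash route follows. Tie::IxHash comes from CPAN rather than core Perl, so this example assumes it may be missing and loads it inside eval; the $ordered flag is just for illustration.

```perl
use strict;
use warnings;

my %grammar;

# Load the module at run time and tie the hash only if that succeeds.
my $ordered = eval {
    require Tie::IxHash;
    tie %grammar, 'Tie::IxHash';
    1;
};

$grammar{E} = 'E+T | T';
$grammar{T} = 'T*F | F';
$grammar{F} = '(E) | a';

# With the tie in place, keys() returns insertion order (E T F).
# Without it, this is an ordinary hash and the order is unspecified.
my @keys = keys %grammar;
print $ordered ? "ordered: @keys\n" : "Tie::IxHash not installed: @keys\n";
```

Once tied, the hash is used with the normal hash syntax, so the rest of the program should not need to change.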

Re: problem with hash keys
by wog (Curate) on Jun 17, 2001 at 07:13 UTC
    The problem you are having is that hashes are not kept in any particular order. You were lucky to get the keys out of the hash in alphabetical order for file #1; because of the way hash tables work, that often won't happen.

    To fix your problem you can use a foreach my $k (sort keys %GRAMMAR) instead of your while loop assuming you want alphabetical order.

Re: problem with hash keys
by marcink (Monk) on Jun 17, 2001 at 07:13 UTC
    Actually, the fact that you get keys from #1 and #2 in the correct order is pure coincidence -- a hash does not remember the sequence of assignments. If you want to keep that information, you'll have to create an array for keeping key order.

    Another option is to use the sort function -- in your examples you use alphabetical order, so if that is what you want you can use something like this:

    foreach my $k ( sort keys %GRAMMAR ) { print "$k\n"; }


    -mk