Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

non-scalar hash key

by kdejonghe (Novice)
on Jun 17, 2009 at 09:51 UTC ( #772309=perlquestion: print w/replies, xml ) Need Help??
kdejonghe has asked for the wisdom of the Perl Monks concerning the following question:

Hi, The key to the hash I have in mind is an array of integers. Hashes in perl take only scalars as keys. I could join the integer array into a string, and use that as the key, but I'm looking for something faster. (The details of why I want this are quite complex, but I'll be glad to provide them if interested.) Anyone ? Thanks.

Replies are listed 'Best First'.
Re: non-scalar hash key
by Arunbear (Parson) on Jun 17, 2009 at 10:59 UTC
    pack is a bit faster than join:
    use strict; use warnings; use Benchmark qw(:all) ; my @foo = qw/1 2 3 4 5 6/; my %bar = (pack('b', @foo) => 'elephants'); my @baz = qw/1 2 3 4 5 6/; print $bar{pack('b', @baz)} . "\n"; cmpthese(-5, { 'join' => sub { join('', qw/1 2 3 4 5 6/) }, 'pack' => sub { pack('b', qw/1 2 3 4 5 6/) }, });
    prints:
    % perl test.pl elephants Rate join pack join 1002841/s -- -53% pack 2119814/s 111% -- %
      Nice but not in the gold
      use strict; use warnings; use Benchmark qw(:all) ; my @foo = qw/1 2 3 4 5 6/; my %bar = (pack('b', @foo) => 'elephants'); my @baz = qw/1 a 4 6 7/; print $bar{pack('b', @baz)} . "\n";
      Also outputs "elephants".
      It is indeed. Thank you very much. I pack as follows though:
      my $p = pack('L*', @$pa);
      because it's an integer array.
      I also think it uses less memory. But that I still have test.
      Thanks again. Seems like this is the only thing to do apart from implementing my own hash algorithm.
Re: non-scalar hash key
by citromatik (Curate) on Jun 17, 2009 at 10:23 UTC

    Sounds like an XY problem (a design flaw). If you explain why you need it, probably a better strategy could be suggested

    citromatik

      Possibly yes. I'll try to explain it. I'm not sure how to do this without flooding you with details. But I'll take my chance.

      I have a text corpus. The text corpus consists of sentences. A sentence consists of tokens. Each token has 3 components: a word, a correct tag, and an initial tag. All token elements are encoded in integers.

      The purpose is to build rules for the whole corpus that change the initial tag into the correct tag. See http://en.wikipedia.org/wiki/Brill_tagger.

      The hash I want to build has predicates as keys , and as values the locations in the corpus.

      A predicate is a sequence of integers that can be applied to a particular location in the corpus. That is to say: if the sequence of integers can be matched with the sequence of integers at a particular location in the corpus, then I create an entry in the hash with as key the predicate and as value the location.

      Currently the predicate is transformed into a string (join ' ', @$predicate), and the string is used as the key for the hash. The problem is that lower in the program I need to split this key back into its elements, to see if it matches elsewhere in the corpus.

      The process can be very time (and memory) consuming, so I'm trying to speed it up a little.
        Maybe the values in your hash could include the predicates in the original un-stringified format. That way you can have strings as keys, but lists still available when you need them later on in your program.

        So your data structure might look like:

        %allData = ( "1 2 3" => { location => "behind the sofa", predicates => [1, 2, 3], }, # More tokens here... );

        --
        use JAPH;
        print JAPH::asString();

        Does it help to have a HoHoHoA? That is, use the word (or the token for the word) as a top level key, the initial tag and the second level key and the correct tag as the third level key with the value either the location (if there is just one) or an array of locations (if there may be many).

        Of course some sample code and data would be even better than a description don't you think?


        True laziness is hard work
Re: non-scalar hash key
by McDarren (Abbot) on Jun 17, 2009 at 09:55 UTC
    Sounds like you need a HoA (hash of arrays).

    Check out the PDSC

    Cheers,
    Darren :)

      I think he want array of integers to become key, not value
        exactly
Re: non-scalar hash key
by ikegami (Pope) on Jun 17, 2009 at 14:34 UTC
    I could join the integer array into a string, and use that as the key, but I'm looking for something faster.

    That makes no sense. If you want the key to represent all the integers in the array, you will have to join the integer into a key somehow. You can't possibly go faster than doing what you want to do and still have it do what you want it to do.

    The only question is what's the fastest way to join the keys.

Re: non-scalar hash key
by doug (Pilgrim) on Jun 17, 2009 at 15:08 UTC

    Have you played with the perl4 era $hash{$x,$y,$z} syntax? That takes those scalars and builds a hash key for you that is the three values joined with a magic char (FS (28 decimal) by default).

    You could do much of this on your own, but I'm hoping that since Perl supports it, it is faster. Dunno. You'd have to test.

    - doug
Re: non-scalar hash key
by Anonymous Monk on Jun 17, 2009 at 09:55 UTC
    I could join the integer array into a string, and use that as the key, but I'm looking for something faster.

    I don't think it exist.

Re: non-scalar hash key
by kdejonghe (Novice) on Jun 18, 2009 at 20:31 UTC
    I've been toying a bit with C++. That language is probably better suited for what I want. Here's the code (for gcc):
    #include <iostream> #include <ext/hash_map> #include <vector> using namespace std; /* * hash function, borrowed from * http://burtleburtle.net/bob/c/lookup3.c */ #define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k)))) #define mix(a,b,c) \ { \ a -= c; a ^= rot(c, 4); c += b; \ b -= a; b ^= rot(a, 6); a += c; \ c -= b; c ^= rot(b, 8); b += a; \ a -= c; a ^= rot(c,16); c += b; \ b -= a; b ^= rot(a,19); a += c; \ c -= b; c ^= rot(b, 4); b += a; \ } #define final(a,b,c) \ { \ c ^= b; c -= rot(b,14); \ a ^= c; a -= rot(c,11); \ b ^= a; b -= rot(a,25); \ c ^= b; c -= rot(b,16); \ a ^= c; a -= rot(c,4); \ b ^= a; b -= rot(a,14); \ c ^= b; c -= rot(b,24); \ } struct hashword { size_t operator()(const vector<uint32_t>* k) const { uint32_t a, b, c; size_t length = (*k).size(); uint32_t initval = 0; unsigned int ki = 0; /* Set up the internal state */ a = b = c = 0xdeadbeef + (((uint32_t) length) << 2) + initval; /*------------------------------ handle most of the key */ while (length > 3) { a += (*k)[ki++]; b += (*k)[ki++]; c += (*k)[ki++]; mix(a,b,c); length -= 3; } ki = (*k).size() - 1; /*-------------------------------- handle the last 3 uint32_t's */ switch (length) /* all the case statements fall through */ { case 3: c += (*k)[ki--]; case 2: b += (*k)[ki--]; case 1: a += (*k)[ki]; final(a,b,c) ; case 0: /* case 0: nothing left to add */ break; } /*-------------------------------------- report the result */ return c; } }; /* compare 2 vectors */ struct eqintarr { bool operator()(const vector<uint32_t>* s1, const vector<uint32_t>* +s2) const { return (*s1 == *s2); } }; int main() { __gnu_cxx::hash_map<const vector<uint32_t>*, int, hashword, eqintarr +> preds; uint32_t a[] = {1,2,3,4,5}; uint32_t b[] = {0,34567,3,5,256001}; vector<uint32_t> k1 (a, a + sizeof(a) / sizeof(uint32_t) ); vector<uint32_t> k2 (k1); vector<uint32_t> k3 (b, b + sizeof(b) / sizeof(uint32_t) ); vector<uint32_t> k4 (k3); preds[&k1] = 7; preds[&k3] = 100; cout << "k1 -> " << preds[&k2] << endl; cout << "k2 -> " << preds[&k4] << endl; }

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://772309]
Approved by Corion
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (4)
As of 2017-12-13 05:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What programming language do you hate the most?




















    Results (345 votes). Check out past polls.

    Notices?