PerlMonks  

Re: non-scalar hash key

by citromatik (Curate)
on Jun 17, 2009 at 10:23 UTC


in reply to non-scalar hash key

This sounds like an XY problem (a design flaw). If you explain why you need it, a better strategy could probably be suggested.

citromatik


Re^2: non-scalar hash key
by kdejonghe (Novice) on Jun 17, 2009 at 10:58 UTC
    Possibly yes. I'll try to explain it. I'm not sure how to do this without flooding you with details. But I'll take my chance.

    I have a text corpus. The text corpus consists of sentences. A sentence consists of tokens. Each token has 3 components: a word, a correct tag, and an initial tag. All token elements are encoded in integers.
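
    A minimal sketch of how such a corpus might be laid out in Perl, assuming each token is stored as a three-element array ref. The integer codes and the names @corpus, WORD, CORRECT and INITIAL are made up for illustration; the post does not show the real layout.

    ```perl
    use strict;
    use warnings;

    # Hypothetical field positions within a token (not from the original post).
    use constant { WORD => 0, CORRECT => 1, INITIAL => 2 };

    # A corpus is a list of sentences; a sentence is a list of tokens;
    # a token is [ word, correct tag, initial tag ], all integer-encoded.
    my @corpus = (
        [ [ 42314, 18, 1 ], [ 7, 5, 5 ] ],    # sentence 0 (invented values)
        [ [ 13, 20, 20 ] ],                   # sentence 1
    );

    # Count tokens whose initial tag differs from the correct tag;
    # these are the tokens a learned rule would need to fix.
    my $wrong = 0;
    for my $sentence (@corpus) {
        for my $token (@$sentence) {
            $wrong++ if $token->[INITIAL] != $token->[CORRECT];
        }
    }
    print "tokens needing correction: $wrong\n";
    ```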

    The purpose is to build rules for the whole corpus that change the initial tag into the correct tag. See http://en.wikipedia.org/wiki/Brill_tagger.

    The hash I want to build has predicates as keys, and as values the locations in the corpus.

    A predicate is a sequence of integers that can be applied to a particular location in the corpus. That is to say: if the sequence of integers can be matched with the sequence of integers at a particular location in the corpus, then I create an entry in the hash with as key the predicate and as value the location.

    Currently the predicate is transformed into a string (join ' ', @$predicate), and the string is used as the key for the hash. The problem is that lower in the program I need to split this key back into its elements, to see if it matches elsewhere in the corpus.
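
    The round trip described above can be sketched as follows. The predicate and location values only imitate the style of the data in this thread; %rules, add_match and key_to_predicate are hypothetical names, and the real hash has an extra 's' level that is left out here.

    ```perl
    use strict;
    use warnings;

    my %rules;    # hypothetical name: predicate string => { location => 1 }

    # Record that a predicate (array ref of integers) matched at a location.
    sub add_match {
        my ($predicate, $location) = @_;
        my $key = join ' ', @$predicate;    # stringify: hash keys are always strings
        $rules{$key}{$location} = 1;
        return $key;
    }

    # Recover the integer list from a key when it has to be matched again.
    sub key_to_predicate {
        my ($key) = @_;
        return [ split / /, $key ];
    }

    my $key       = add_match([ 0, 42314, 18, 1, 0 ], '0,6');
    my $predicate = key_to_predicate($key);
    print "key: '$key', elements: ", scalar(@$predicate), "\n";
    ```

    The split in key_to_predicate is the repeated cost in question: it runs again every time a stored key has to be compared against another part of the corpus.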

    The process can be very time (and memory) consuming, so I'm trying to speed it up a little.
      Maybe the values in your hash could include the predicates in the original un-stringified format. That way you can have strings as keys, but lists still available when you need them later on in your program.

      So your data structure might look like:

      %allData = (
          "1 2 3" => {
              location   => "behind the sofa",
              predicates => [1, 2, 3],
          },
          # More tokens here...
      );
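
      With that layout, the list is read straight from the value and no split is needed. A short sketch, reusing the made-up entry above:

      ```perl
      use strict;
      use warnings;

      my %allData = (
          "1 2 3" => {
              location   => "behind the sofa",
              predicates => [1, 2, 3],
          },
      );

      # Walk the hash; the un-stringified predicate comes from the value,
      # so the key never has to be split.
      for my $key (sort keys %allData) {
          my $predicates = $allData{$key}{predicates};
          my $count      = @$predicates;    # array in scalar context = length
          print "$key => $count elements: @$predicates\n";
      }
      ```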

      --
      use JAPH;
      print JAPH::asString();

        I'm afraid I do not have the luxury of adding another element to the hash values. We're dealing with a massive amount of data.

      Does it help to have a HoHoHoA? That is, use the word (or the token for the word) as the top level key, the initial tag as the second level key, and the correct tag as the third level key, with the value either the location (if there is just one) or an array of locations (if there may be many).
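
      A minimal sketch of that nested layout, assuming integer codes for the word and tags and location strings such as '0,6'. The names %index and add_location are hypothetical, and the values are invented.

      ```perl
      use strict;
      use warnings;

      my %index;    # word => initial tag => correct tag => [ locations ]

      # Autovivification creates the intermediate hashes and the array on demand.
      sub add_location {
          my ($word, $initial, $correct, $location) = @_;
          push @{ $index{$word}{$initial}{$correct} }, $location;
          return;
      }

      add_location(42314, 1, 18, '0,6');
      add_location(42314, 1, 18, '3,2');
      add_location(7,     5, 5,  '0,7');

      # Look up every location for one (word, initial tag, correct tag) triple.
      my $locations = $index{42314}{1}{18} || [];
      print "locations: @$locations\n";
      ```

      Each level is keyed by a single integer, so nothing has to be joined or split, though the nested hashes carry their own memory overhead.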

      Of course, some sample code and data would be even better than a description, don't you think?


      True laziness is hard work
        Here's what the current hash looks like (fragment). The 's' value points to a hash of locations.
        $VAR1 = {
          '0 0 20 0 13 0 0 0 5' => { 's' => { '0,7' => 1 } },
          '0 0 18 1 0'          => { 's' => { '0,6' => 1 } },
          '2 0 8 0 0 0 2'       => { 's' => { '0,4' => 1 } },
          '0 42314 18 1 0'      => { 's' => { '0,6' => 1 } },
          ...
