
Re^2: non-scalar hash key

by kdejonghe (Novice)
on Jun 17, 2009 at 10:58 UTC ( #772331=note )

in reply to Re: non-scalar hash key
in thread non-scalar hash key

Possibly yes. I'll try to explain it. I'm not sure how to do this without flooding you with details. But I'll take my chance.

I have a text corpus. The text corpus consists of sentences. A sentence consists of tokens. Each token has 3 components: a word, a correct tag, and an initial tag. All token elements are encoded in integers.
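A minimal sketch of how such a corpus might be held in memory, assuming nested arrays and a token order of word, correct tag, initial tag. The integer codes, the constant names, and the "sentence,token" location format are invented for illustration and are not taken from the actual program.

```perl
use strict;
use warnings;

# Hypothetical layout: corpus -> sentences -> tokens -> [word, correct_tag, initial_tag].
# All values are integer codes; the numbers below are made up.
my @corpus = (
    [ [ 42314, 18, 1 ], [ 7, 5, 5 ], [ 99, 13, 20 ] ],    # sentence 0
    [ [ 7, 5, 8 ], [ 12, 2, 2 ] ],                        # sentence 1
);

# Named indexes into a token (assumed order).
use constant { WORD => 0, CORRECT => 1, INITIAL => 2 };

# Collect the locations where the initial tag differs from the correct tag.
my @errors;
for my $s ( 0 .. $#corpus ) {
    for my $t ( 0 .. $#{ $corpus[$s] } ) {
        my $tok = $corpus[$s][$t];
        push @errors, "$s,$t" if $tok->[INITIAL] != $tok->[CORRECT];
    }
}
print "mismatches at: @errors\n";
```

These mismatch locations are the places where a rule would have to change the initial tag.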

The purpose is to build rules for the whole corpus that change the initial tag into the correct tag. See

The hash I want to build has predicates as keys, and the locations in the corpus as values.

A predicate is a sequence of integers that can be applied to a particular location in the corpus. That is to say: if the sequence of integers can be matched with the sequence of integers at a particular location in the corpus, then I create an entry in the hash with as key the predicate and as value the location.
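The matching step could be sketched roughly as follows. The post does not say exactly how a predicate is compared with the corpus, so this assumes the simplest reading: the predicate is compared element by element against the flattened integers of the tokens starting at a location. The helper name matches_at and the sample data are hypothetical.

```perl
use strict;
use warnings;

# One sentence of tokens, each [word, correct_tag, initial_tag]; values invented.
my @sentence = ( [ 7, 5, 8 ], [ 12, 2, 2 ], [ 7, 5, 8 ], [ 12, 2, 2 ] );

# Assumption: a predicate matches if it equals the leading integers of the
# flattened tokens that start at position $start.
sub matches_at {
    my ( $predicate, $sentence, $start ) = @_;
    my @flat = map { @$_ } @{$sentence}[ $start .. $#$sentence ];
    return 0 if @flat < @$predicate;
    for my $i ( 0 .. $#$predicate ) {
        return 0 if $predicate->[$i] != $flat[$i];
    }
    return 1;
}

my %locations_for;                 # stringified predicate => { location => 1 }
my $s         = 0;                 # index of the sentence being scanned
my @predicate = ( 7, 5, 8, 12 );

for my $t ( 0 .. $#sentence ) {
    next unless matches_at( \@predicate, \@sentence, $t );
    $locations_for{ join ' ', @predicate }{"$s,$t"} = 1;
}

print join( ' ', sort keys %{ $locations_for{'7 5 8 12'} } ), "\n";
```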

Currently the predicate is transformed into a string (join ' ', @$predicate), and the string is used as the key for the hash. The problem is that lower in the program I need to split this key back into its elements, to see if it matches elsewhere in the corpus.
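The round trip described here looks roughly like the first half of the sketch below. The second half shows pack/unpack with the 'w' (BER compressed integer) template as one possible alternative for non-negative integers. That alternative is a suggestion only, not something the program currently does, and whether it is faster or smaller for this data would need benchmarking.

```perl
use strict;
use warnings;

my $predicate = [ 0, 42314, 18, 1, 0 ];    # sample values only

# Current approach: stringify for the hash key, split to get the elements back.
my $key      = join ' ', @$predicate;
my @elements = split ' ', $key;

# Possible alternative (assumption, untested on the real data): a packed key.
# 'w*' stores each unsigned integer in a variable number of bytes.
my $packed   = pack 'w*', @$predicate;
my @unpacked = unpack 'w*', $packed;

printf "string key: %d bytes, packed key: %d bytes\n",
    length $key, length $packed;
```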

The process can be very time (and memory) consuming, so I'm trying to speed it up a little.

Re^3: non-scalar hash key
by wol (Hermit) on Jun 17, 2009 at 11:13 UTC
    Maybe the values in your hash could include the predicates in the original un-stringified format. That way you can have strings as keys, but lists still available when you need them later on in your program.

    So your data structure might look like:

    %allData = (
        "1 2 3" => {
            location   => "behind the sofa",
            predicates => [1, 2, 3],
        },
        # More tokens here...
    );
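A short sketch of how a lookup against a structure of that shape might avoid splitting the key again. The second entry and the loop are made up for illustration.

```perl
use strict;
use warnings;

# Same shape as the structure above; the values are invented.
my %allData = (
    "1 2 3" => { location => "behind the sofa",  predicates => [ 1, 2, 3 ] },
    "4 5"   => { location => "under the stairs", predicates => [ 4, 5 ] },
);

for my $key ( sort keys %allData ) {
    # The elements are already available as an array ref, so no split is needed.
    my $elements = $allData{$key}{predicates};
    my $sum      = 0;
    $sum += $_ for @$elements;
    print "$key: ", scalar(@$elements), " elements, sum $sum\n";
}
```

The cost is one extra array per key, which is the memory trade-off discussed in the replies that follow.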

    use JAPH;
    print JAPH::asString();

      I'm afraid I do not have the luxury of adding another element to the hash values. We're dealing with a massive amount of data.
        I'm not suggesting that you add more data to your input files, if that's what you thought I meant.

        I'm suggesting that you extend the way the data is held in perl hashes/arrays/etc in memory when your program is running. And if you're not permitted to change how your program represents its data internally, then I'm not sure there's anything that anyone can do for you :-)

        use JAPH;
        print JAPH::asString();

Re^3: non-scalar hash key
by GrandFather (Sage) on Jun 17, 2009 at 11:16 UTC

    Does it help to have a HoHoHoA? That is, use the word (or the token for the word) as a top level key, the initial tag as the second level key, and the correct tag as the third level key, with the value either the location (if there is just one) or an array of locations (if there may be many).
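A rough sketch of such a HoHoHoA, assuming each token arrives as word, correct tag, initial tag plus a location string. The variable names and sample values are hypothetical.

```perl
use strict;
use warnings;

# Each entry: [word, correct_tag, initial_tag, location]; values invented.
my @tokens = (
    [ 42314, 18, 1, '0,6' ],
    [ 42314, 18, 1, '3,2' ],
    [ 7,     5,  8, '0,4' ],
);

# $index{word}{initial_tag}{correct_tag} = [ locations ]
my %index;
for my $tok (@tokens) {
    my ( $word, $correct, $initial, $loc ) = @$tok;
    push @{ $index{$word}{$initial}{$correct} }, $loc;    # autovivifies each level
}

print "@{ $index{42314}{1}{18} }\n";
```

Because every level is keyed by a single integer, nothing has to be joined or split to reach the list of locations.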

    Of course, some sample code and data would be even better than a description, don't you think?

    True laziness is hard work
      Here's what the current hash looks like (fragment). The 's' value points to a hash of locations.
      $VAR1 = {
          '0 0 20 0 13 0 0 0 5' => { 's' => { '0,7' => 1 } },
          '0 0 18 1 0'          => { 's' => { '0,6' => 1 } },
          '2 0 8 0 0 0 2'       => { 's' => { '0,4' => 1 } },
          '0 42314 18 1 0'      => { 's' => { '0,6' => 1 } },
          ...
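For reference, walking a hash of this shape might look like the sketch below. It uses two of the entries shown above and assumes that a location such as '0,6' means sentence index, then token index. The fragment does not confirm that reading.

```perl
use strict;
use warnings;

# Two entries copied from the fragment above.
my $data = {
    '0 0 18 1 0'     => { 's' => { '0,6' => 1 } },
    '0 42314 18 1 0' => { 's' => { '0,6' => 1 } },
};

my @rows;
for my $key ( sort keys %$data ) {
    my @predicate = split ' ', $key;    # the split the original post wants to avoid
    for my $loc ( sort keys %{ $data->{$key}{s} } ) {
        # Assumption: location is "sentence,token".
        my ( $sentence, $token ) = split /,/, $loc;
        push @rows, [ scalar(@predicate), $sentence, $token ];
        print "predicate [@predicate] at sentence $sentence, token $token\n";
    }
}
```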
