Re^2: non-scalar hash key

Possibly yes. I'll try to explain it, though I'm not sure how to do that without flooding you with details. But I'll take my chances.

I have a text corpus. The corpus consists of sentences, and each sentence consists of tokens. Each token has three components: a word, a correct tag, and an initial tag. All token components are encoded as integers.
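To make the layout concrete, here is a minimal Perl sketch of one way such a corpus could be held in memory. The array-of-arrays layout, the `WORD`/`CORRECT`/`INITIAL` index constants and the sample integers are assumptions for illustration, not the poster's actual data.

```perl
use strict;
use warnings;

# Hypothetical in-memory layout (an assumption, not the poster's code):
#   corpus   = array of sentences
#   sentence = array of tokens
#   token    = [ word_id, correct_tag_id, initial_tag_id ], all integers
use constant { WORD => 0, CORRECT => 1, INITIAL => 2 };

# Made-up sample integers.
my @corpus = (
    [ [ 12, 3, 3 ], [ 40, 7, 5 ], [ 9, 2, 2 ] ],    # sentence 0
    [ [ 40, 5, 5 ], [ 17, 8, 1 ] ],                 # sentence 1
);

# Count tokens whose initial tag differs from the correct tag;
# these are the errors a Brill-style rule would try to fix.
my $errors = 0;
for my $sentence (@corpus) {
    for my $token (@$sentence) {
        $errors++ if $token->[INITIAL] != $token->[CORRECT];
    }
}
print "tokens needing correction: $errors\n";
```

With the sample data this reports 2 tokens needing correction.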

The purpose is to build rules for the whole corpus that change the initial tag into the correct tag. See http://en.wikipedia.org/wiki/Brill_tagger.

The hash I want to build has predicates as keys and corpus locations as values.

A predicate is a sequence of integers that can be applied to a particular location in the corpus. That is, if the predicate's sequence of integers matches the sequence of integers at a particular location in the corpus, I create a hash entry with the predicate as the key and that location as the value.
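A minimal sketch of building such a hash, assuming (hypothetically) that a predicate is a list of initial-tag ids that must occur on consecutive tokens starting at a location, and that a location is written as "sentence,token". The real predicates in this thread are richer than that; `matches_at` and `%locations_of` are illustrative names only.

```perl
use strict;
use warnings;

# Hypothetical predicate semantics, for illustration only:
# a predicate is a list of initial-tag ids that must occur on
# consecutive tokens starting at a given location.
use constant INITIAL => 2;    # token = [ word, correct, initial ]

my @corpus = (
    [ [ 12, 3, 3 ], [ 40, 7, 5 ], [ 9, 2, 2 ] ],
    [ [ 40, 5, 3 ], [ 17, 8, 5 ] ],
);

# True if @$pred matches the initial tags starting at token $t of sentence $s.
sub matches_at {
    my ( $pred, $s, $t ) = @_;
    my $sentence = $corpus[$s];
    return 0 if $t + @$pred > @$sentence;    # predicate runs past sentence end
    for my $i ( 0 .. $#$pred ) {
        return 0 if $sentence->[ $t + $i ][INITIAL] != $pred->[$i];
    }
    return 1;
}

my @predicates = ( [ 3, 5 ], [2] );
my %locations_of;    # predicate key => list of "sentence,token" locations

for my $pred (@predicates) {
    my $key = join ' ', @$pred;    # the stringified key described above
    for my $s ( 0 .. $#corpus ) {
        for my $t ( 0 .. $#{ $corpus[$s] } ) {
            push @{ $locations_of{$key} }, "$s,$t" if matches_at( $pred, $s, $t );
        }
    }
}

for my $key ( sort keys %locations_of ) {
    print "$key => @{ $locations_of{$key} }\n";
}
```

With the sample data this prints `2 => 0,2` and `3 5 => 0,0 1,0`.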

Currently the predicate is turned into a string (join ' ', @$predicate), and that string is used as the hash key. The problem is that later in the program I need to split this key back into its elements to check whether it matches elsewhere in the corpus.
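The sketch below shows why a plain array reference cannot serve as the key, the join/split round trip described above, and, as an alternative not proposed in the thread, packing the integers with `pack`'s BER-compressed `w` template. Whether `unpack` is actually faster than `split` for this workload is an assumption worth benchmarking; the sample predicate is taken from the fragment posted further down.

```perl
use strict;
use warnings;

my @pred = ( 0, 42314, 18, 1, 0 );

# 1. A reference cannot serve as a key: Perl stringifies it to something
#    like "ARRAY(0x55d...)", and the list cannot be recovered from that.
my %by_ref = ( \@pred => 1 );
my ($ref_key) = keys %by_ref;
print "ref key: $ref_key\n";

# 2. The current approach: join to build the key, split to get it back.
my $joined     = join ' ', @pred;
my @from_split = split / /, $joined;

# 3. Alternative (not from the thread): BER-compressed integers via 'w'.
#    Each value below 128 takes one byte, and unpack restores the list.
#    'w' only accepts non-negative integers.
my $packed      = pack 'w*', @pred;
my @from_unpack = unpack 'w*', $packed;

printf "joined: %d bytes, packed: %d bytes\n", length $joined, length $packed;
print "split:  @from_split\n";
print "unpack: @from_unpack\n";
```

For this predicate the joined key is 14 bytes and the packed key 7. Perl hash keys are arbitrary byte strings, so packed keys work directly in a hash; they just aren't human-readable when dumped.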

The process can be very time- and memory-consuming, so I'm trying to speed it up a little.

Re^3: non-scalar hash key by wol (Hermit) on Jun 17, 2009 at 11:13 UTC
Maybe the values in your hash could include the predicates in their original, un-stringified form. That way you keep strings as keys, while the lists are still available when you need them later in your program. So your data structure might look like: `%allData = ( "1 2 3" => { location => "behind the sofa", predicates => [1, 2, 3], }, );` with more tokens added as further key/value pairs.
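A minimal sketch of that suggestion: a string key for lookup, with the integer list stored next to the locations so later code reads it directly instead of splitting the key. The field names `locations` and `predicate` and the helper `add_match` are hypothetical.

```perl
use strict;
use warnings;

# String key for lookups; the original integer list is kept in the value.
# Field names and the add_match helper are illustrative only.
my %all_data;

sub add_match {
    my ( $pred, $location ) = @_;
    my $key = join ' ', @$pred;
    # ||= only builds (and copies the list) the first time a key is seen.
    my $entry = $all_data{$key} ||= { locations => {}, predicate => [@$pred] };
    $entry->{locations}{$location} = 1;
    return;
}

add_match( [ 0, 0, 18, 1, 0 ],       '0,6' );
add_match( [ 0, 0, 18, 1, 0 ],       '3,2' );
add_match( [ 2, 0, 8, 0, 0, 0, 2 ], '0,4' );

for my $key ( sort keys %all_data ) {
    my $entry = $all_data{$key};
    my @pred  = @{ $entry->{predicate} };    # no split needed
    my @locs  = sort keys %{ $entry->{locations} };
    print scalar(@pred), " integers at: @locs\n";
}
```

The cost is one extra array per distinct predicate, which is exactly the memory concern raised in the next reply.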
Re^4: non-scalar hash key by kdejonghe (Novice) on Jun 17, 2009 at 11:33 UTC
I'm afraid I do not have the luxury of adding another element to the hash values. We're dealing with a massive amount of data.
Re^5: non-scalar hash key by wol (Hermit) on Jun 17, 2009 at 12:32 UTC
I'm not suggesting that you add more data to your input files, if that's what you thought I meant. I'm suggesting that you extend the way the data is held in perl hashes/arrays/etc in memory when your program is running. And if you're not permitted to change how your program represents its data internally, then I'm not sure there's anything that anyone can do for you :-)
Re^6: non-scalar hash key by kdejonghe (Novice) on Jun 17, 2009 at 13:00 UTC
Re^3: non-scalar hash key by GrandFather (Saint) on Jun 17, 2009 at 11:16 UTC
Does it help to have a HoHoHoA? That is, use the word (or the token for the word) as the top-level key, the initial tag as the second-level key, and the correct tag as the third-level key, with the value being either the location (if there is just one) or an array of locations (if there may be many). Of course, some sample code and data would be even better than a description, don't you think?
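A minimal sketch of the HoHoHoA layout described above (word, then initial tag, then correct tag). It assumes tokens are `[word, correct, initial]` integer triples and writes locations as "sentence,token" strings; for simplicity it always stores an array of locations rather than switching between a scalar and an array.

```perl
use strict;
use warnings;

# word => initial tag => correct tag => [ locations ]
# Token layout and "sentence,token" location strings are assumptions.
my @corpus = (
    [ [ 12, 3, 3 ], [ 40, 7, 5 ], [ 9, 2, 2 ] ],
    [ [ 40, 7, 5 ], [ 17, 8, 1 ] ],
);

my %index;
for my $s ( 0 .. $#corpus ) {
    for my $t ( 0 .. $#{ $corpus[$s] } ) {
        my ( $word, $correct, $initial ) = @{ $corpus[$s][$t] };
        push @{ $index{$word}{$initial}{$correct} }, "$s,$t";
    }
}

# Where was word 40 initially tagged 5 but should have been 7?
my $locs = $index{40}{5}{7} || [];
print "word 40, 5 -> 7 at: @$locs\n";
```

This prints `word 40, 5 -> 7 at: 0,1 1,0`. Note that nested lookups on a missing word autovivify the intermediate levels, so check each level with `exists` if the index must stay untouched.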
Re^4: non-scalar hash key by kdejonghe (Novice) on Jun 17, 2009 at 11:54 UTC
Here's what the current hash looks like (fragment). The 's' value points to a hash of locations. `$VAR1 = { '0 0 20 0 13 0 0 0 5' => { 's' => { '0,7' => 1 } }, '0 0 18 1 0' => { 's' => { '0,6' => 1 } }, '2 0 8 0 0 0 2' => { 's' => { '0,4' => 1 } }, '0 42314 18 1 0' => { 's' => { '0,6' => 1 } }, ...`
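For readers less used to Data::Dumper output, this sketch rebuilds part of that fragment and walks it, decoding each key with `split` (the step the original post wants to avoid). Reading location keys such as '0,7' as "sentence,token" is an assumption, and the extra location '5,1' is made up.

```perl
use strict;
use warnings;

# Rebuild part of the fragment: predicate string => { 's' => { location => 1 } }
my %hash;
$hash{ join ' ', 0, 0, 18, 1, 0 }{'s'}{'0,6'}       = 1;
$hash{ join ' ', 2, 0, 8, 0, 0, 0, 2 }{'s'}{'0,4'} = 1;
$hash{ join ' ', 0, 0, 18, 1, 0 }{'s'}{'5,1'}       = 1;    # made-up extra location

for my $key ( sort keys %hash ) {
    my @pred = split ' ', $key;    # the split the original post wants to avoid
    for my $loc ( sort keys %{ $hash{$key}{'s'} } ) {
        # Assumed meaning of the location key: "sentence,token".
        my ( $sentence, $token ) = split /,/, $loc;
        print "predicate (@pred) at sentence $sentence, token $token\n";
    }
}
```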

	PerlMonks