PerlMonks
Re: extract phrases of n-words length
by ack (Deacon) on Jun 24, 2009 at 17:13 UTC ( [id://774457]=note )
Much like suaveantk and the others, I did not try to optimize but looked at how to easily handle any number of words in the phrase. I use a Max Phrase Length ($maxPhraseLen) but didn't think of what Polyglot did, which was to also allow a Min Phrase Length to be specified. I think it would be an easy mod to my approach to add that flexibility. Note that my code allows phrase lengths of 1, which could be used (with some approach for editing out non-interesting words like 'a', 'an', 'the', etc.) to generate keywords for searching, for example, the full test writeup that the abstract might refer to.

My approach is to use an array of hashes, where the array index indicates the number of words in the phrase and the hashes are just the same hashes that the OP used. The code, then, is as follows:
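A minimal sketch of the array-of-hashes idea described above, not the original listing. Only $maxPhraseLen, $abstract and the die message come from the post; the sub name count_phrases, the other variable names, the sample abstract and the plain whitespace split (no punctuation or case handling) are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text; the OP's abstract is not reproduced here.
my $abstract     = "the quick brown fox jumps over the lazy dog the quick brown fox";
my $maxPhraseLen = 3;    # longest phrase to extract, in words

# Returns a reference to an array of hashes:
#   $phrases->[$n]{$phrase} = number of times the $n-word phrase occurs.
# Index 0 is left unused so the index equals the phrase length.
sub count_phrases {
    my ($text, $maxLen) = @_;

    # Assumption: words are whatever is separated by whitespace.
    my @words = split ' ', $text;

    die "Max Phrase Length exceeds number of words in abstract: aborting\n"
        if $maxLen > @words;

    my @phrases;
    for my $n (1 .. $maxLen) {
        # Slide a window of $n words across the text.
        for my $start (0 .. @words - $n) {
            my $phrase = join ' ', @words[$start .. $start + $n - 1];
            $phrases[$n]{$phrase}++;
        }
    }
    return \@phrases;
}

my $phrases = count_phrases($abstract, $maxPhraseLen);

# Report the counts, grouped by phrase length.
for my $n (1 .. $maxPhraseLen) {
    print "Phrases of $n word(s):\n";
    for my $phrase (sort keys %{ $phrases->[$n] }) {
        print "  $phrases->[$n]{$phrase}  $phrase\n";
    }
}
```

Because the sample text here is made up, the printed counts will not be the OP's. Starting the outer loop at a minimum length instead of 1 would be one way to add the Min Phrase Length option mentioned above.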
This yields output that matches the OP's. You can set $maxPhraseLen to any value you want, up to the number of words in the abstract ($abstract). If you try to set the max phrase length to more than the number of words in the abstract, the script dies with the message "Max Phrase Length exceeds number of words in abstract: aborting". That is my suggestion for a way to handle the OP's objective.
ack
Albuquerque, NM