extract phrases of n-words length
by arun_kom (Monk) on Jun 24, 2009 at 14:54 UTC
arun_kom has asked for the
wisdom of the Perl Monks concerning the following question:
I would like to extract, from an abstract, all unique phrases of two, three and four words in length, in separate groups. The order of the words within a phrase must be preserved, but the order of the phrases in the output need not be. Using a hash achieves this and also removes any duplicates. Please see below for my solution, which achieves what I want.
However, I would like to know if there is a better way, as this approach will be cumbersome to scale up to phrases of n words as n gets larger.
Thanks in advance.
Sample output of all four-word phrases:
general-purpose, interpreted, dynamic programming
a high-level, general-purpose, interpreted,
Perl is a high-level,
interpreted, dynamic programming language.
is a high-level, general-purpose,
high-level, general-purpose, interpreted, dynamic
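One possible way to generalise to any n is to slide a window of n words over the text and use each joined slice as a hash key. The following is only a minimal sketch, not the poster's own solution: the sub name `ngrams` is hypothetical, the input sentence is reconstructed from the sample output above, and words are assumed to be separated by whitespace, so punctuation stays attached to the word it follows.

```perl
use strict;
use warnings;

# Hypothetical helper: return the unique n-word phrases found in $text.
sub ngrams {
    my ($text, $n) = @_;
    my @words = split ' ', $text;    # whitespace split; punctuation stays attached
    my %seen;                        # hash keys remove duplicate phrases
    # Slide a window of $n words; the range is empty if the text is too short.
    for my $i (0 .. @words - $n) {
        $seen{ join ' ', @words[$i .. $i + $n - 1] } = 1;
    }
    return keys %seen;               # phrase order is not preserved
}

# Assumed input, reconstructed from the sample output in the question.
my $abstract = 'Perl is a high-level, general-purpose, interpreted, '
             . 'dynamic programming language.';

for my $n (2 .. 4) {
    print "$n-word phrases:\n";
    print "  $_\n" for ngrams($abstract, $n);
}
```

Because the phrase length is just a parameter, covering longer phrases only means widening the `2 .. 4` range; no per-length code is needed.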