match sequences of words based on number of characters
by nicemank (Novice)
on Feb 17, 2013 at 18:10 UTC
nicemank has asked for the
wisdom of the Perl Monks concerning the following question:
I want to extract sequences of words according to the number of characters in each word.
Characters are defined here as letters of the alphabet (not punctuation, numbers, or white space).
For instance: I want sequences of 2-, 4- and 3-character words, in that order only (but it could be any numbers of characters, in any order I choose).
Say my text is: "xxxx yy zzzzz xxxx qqq"
I should extract the sequence: "yy xxxx qqq"
and keep on doing it. So from "xxxx yy zzzzz xxxx qqq xxxx yy zzzzz xxxx qqq"
I should extract
"yy xxxx qqq yy xxxx qqq"
I have also tried running adaptations of remiah's code (http://www.perlmonks.org/?node_id=996670), but without success: the task there differs from mine, and I have been unable to adapt the code to it. Thanks in advance!
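
To make the two examples above concrete, here is a minimal sketch of one possible reading of the task, not a definitive solution. It assumes that words which do not fit the next expected length are simply skipped (which is why "zzzzz" may sit between "yy" and "xxxx"), that only ASCII letters count as characters, and that a partly matched pattern at the end of the text is dropped. The name extract_by_lengths is a hypothetical helper invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post).
# Walks the words of $text left to right and keeps each word whose letter
# count equals the next expected length in @$lengths; other words are
# skipped. Once the whole pattern has been matched, matching starts over.
sub extract_by_lengths {
    my ($text, $lengths) = @_;
    return '' unless @$lengths;    # nothing to look for

    my @found;      # words from completed sequences, in order
    my @partial;    # words collected for the sequence in progress
    my $i = 0;      # index of the length currently being looked for

    for my $word (split ' ', $text) {
        # Count letters only (assumption: ASCII a-z and A-Z);
        # tr with an empty replacement list counts without modifying.
        my $letters = ($word =~ tr/A-Za-z//);
        next unless $letters == $lengths->[$i];

        push @partial, $word;
        if (++$i == @$lengths) {   # the whole pattern has been matched
            push @found, @partial;
            @partial = ();
            $i = 0;
        }
    }

    # Words from an unfinished trailing sequence are dropped (assumption).
    return join ' ', @found;
}

print extract_by_lengths('xxxx yy zzzzz xxxx qqq', [2, 4, 3]), "\n";
```

Under these assumptions the call on the last line prints "yy xxxx qqq". Matched words are returned as they appear in the text, so any attached punctuation is kept. If the words must instead be adjacent in the text, a different approach (for example a single regular expression built from the lengths) would be needed.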