<?xml version="1.0" encoding="windows-1252"?>
<node id="1019179" title="match sequences of words based on number of characters" created="2013-02-17 13:10:22" updated="2013-02-17 13:10:22">
<type id="115">
perlquestion</type>
<author id="996037">
nicemank</author>
<data>
<field name="doctext">
&lt;p&gt;I want to extract sequences of words according to the number of characters in each word.&lt;/p&gt;

&lt;p&gt;Here "characters" means letters of the alphabet only, not punctuation, digits, or white space.&lt;/p&gt;

&lt;p&gt;For instance, I want sequences of 2-, 4- and 3-character words, in that order only (though it could be any character counts, in any order I choose).&lt;/p&gt;

&lt;p&gt;Say my text is:  "xxxx yy zzzzz xxxx qqq" &lt;/p&gt;

&lt;p&gt;I should extract the sequence:  "yy xxxx qqq"&lt;/p&gt;

&lt;p&gt;and it should keep doing so through the rest of the text. So from "xxxx yy zzzzz xxxx qqq xxxx yy zzzzz xxxx qqq"&lt;/p&gt;

&lt;p&gt; I should extract&lt;/p&gt;
 
&lt;p&gt;"yy xxxx qqq  yy xxxx qqq" &lt;/p&gt;



&lt;code&gt; 
my $string = "xxxx yy zzzzz xxxx qqq";  

my @array = ( $string =~  /(\b..?\b) (\b....?\b)  (\b...?\b)/sg );

print @array;


# Produces nothing.

# I have also tried rewriting the regex without success: it may
# produce results, but not the right ones (not the exact
# sequence).

# If the string were longer, it should produce the sequence
# repeatedly:
# "xxxx yy zzzzz xxxx qqq xxxx yy zzzzz xxxx qqq"
# should produce "yy xxxx qqq  yy xxxx qqq", and so on,
# until we run out of text.&lt;/code&gt;
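
&lt;p&gt;One possible direction, offered only as a minimal sketch rather than a definitive solution: split the text into words, count the letters in each word, and step through the wanted lengths in order, starting over once a full sequence has been collected. The helper name extract_by_lengths is made up for illustration. The sketch assumes that words are separated by white space, that only ASCII letters count as characters, and that words of other lengths between matches are simply skipped.&lt;/p&gt;

&lt;code&gt;
use strict;
use warnings;

# Hypothetical helper: walk the words in order and collect those whose
# letter counts match @lengths in sequence, skipping words in between.
sub extract_by_lengths {
    my ($text, @lengths) = @_;
    my (@found, @partial);
    for my $word (split ' ', $text) {
        # Count ASCII letters only; punctuation and digits are ignored.
        my $letters = ($word =~ tr/A-Za-z//);
        # The next wanted length is indexed by how many words we hold so far.
        next unless $letters == $lengths[ scalar @partial ];
        push @partial, $word;
        if (@partial == @lengths) {
            # A full sequence is complete: keep it and start over.
            push @found, @partial;
            @partial = ();
        }
    }
    return @found;    # an incomplete trailing sequence is dropped
}

my $text = "xxxx yy zzzzz xxxx qqq xxxx yy zzzzz xxxx qqq";
print join(' ', extract_by_lengths($text, 2, 4, 3)), "\n";
# prints: yy xxxx qqq yy xxxx qqq
&lt;/code&gt;

&lt;p&gt;Walking the word list avoids having to encode the skipped words in a single regex, at the cost of treating every run of non-space characters as one word.&lt;/p&gt;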

&lt;p&gt;
I have also tried adapting remiah's code, but without success: http://www.perlmonks.org/?node_id=996670.
The task there differs from mine, and I have not been able to adapt the code to it.
&lt;/p&gt;

&lt;p&gt;nicemank thanks you in advance!&lt;/p&gt;

</field>
</data>
</node>
