PerlMonks  

Re: match sequences of words based on number of characters

by AnomalousMonk (Abbot)
on Feb 17, 2013 at 19:56 UTC (#1019201)


in reply to match sequences of words based on number of characters

Another way:

>perl -wMstrict -le
"my $s = 'aaaa bb ccccc ddd eeeeeee ffff gg hhhhh iii jjjjjjj';
 ;;
 for my $ar ([2, 5, 3], [3, 7, 4], [4, 2],) {
     my $rx = rxg(@$ar);
     print $rx;
     my @groups = $s =~ m{ ($rx) }xmsg;
     print qq{'$_' } for @groups;
     }
 ;;
 sub rxg {
     my ($rx) =
         map qr{ \b $_ \b }xms,
         join ' \s+ ',
         map qq{\\w{$_}},
         @_
         ;
     ;;
     return $rx;
     }
"
(?^msx: \b \w{2} \s+ \w{5} \s+ \w{3} \b )
'bb ccccc ddd'
'gg hhhhh iii'
(?^msx: \b \w{3} \s+ \w{7} \s+ \w{4} \b )
'ddd eeeeeee ffff'
(?^msx: \b \w{4} \s+ \w{2} \b )
'aaaa bb'
'ffff gg'


Re^2: match sequences of words based on number of characters
by nicemank (Novice) on Feb 17, 2013 at 21:17 UTC
    Thanks for your kind efforts here. I tried running this, but it produces an error: "Undefined subroutine &main::rxg called at whatnot.pl line 9". You might be assuming something; have I missed it? nicemank.
      ... "Undefined subroutine &main::rxg called at whatnot.pl line 9".

      I don't know what's in whatnot.pl, but somewhere in there must be the subroutine definition for  rxg() that I included in my post above; please take another look.
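For anyone running this from a file instead of the command line, here is a minimal sketch of the first example as a standalone script. The file layout is an assumption (the contents of whatnot.pl were never posted), and rxg() is written with sprintf rather than the nested map of the one-liner, which should build the same pattern. The point is only that the sub definition has to live in the same file (or in a module the file loads); its position relative to the call does not matter.

```perl
use strict;
use warnings;

my $s = 'aaaa bb ccccc ddd eeeeeee ffff gg hhhhh iii jjjjjjj';

for my $ar ([2, 5, 3], [3, 7, 4], [4, 2]) {
    # Calling rxg() before its definition is fine: named subs are
    # compiled before the script starts running.
    my $rx     = rxg(@$ar);
    my @groups = $s =~ m{ ($rx) }xmsg;
    print "'$_'\n" for @groups;
}

# If this definition is missing from the file, Perl dies at the call
# with "Undefined subroutine &main::rxg called at ...".
sub rxg {
    # One \w{N} per requested length, separated by whitespace.
    my $body = join ' \s+ ', map { sprintf '\w{%d}', $_ } @_;
    return qr{ \b $body \b }xms;
}
```

Run as a script, this prints the same five groups as the one-liner, one per line.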

Re^2: match sequences of words based on number of characters
by frozenwithjoy (Curate) on Feb 17, 2013 at 23:56 UTC
    Based on the examples, I don't believe that nicemank is requiring captured words to be adjacent. Maybe change \s+ to some non-greedy length of characters.
      Based on the examples, I don't believe that nicemank is requiring captured words to be adjacent.

      Hmmm... After taking another look at the OP, I think you may be right. In which case:

      >perl -wMstrict -le
      "my $s = 'xxxx yy zzzzz xxxx qqq xxxx yy zzzzz xxxx qqq';
       ;;
       for my $ar ([2, 4, 3], [5, 3]) {
           my $rx = rxg(@$ar);
           print $rx;
           my @groups = $s =~ m{ $rx }xmsg;
           print qq{'$_'} for @groups;
           }
       ;;
       sub rxg {
           my ($rx) =
               map qr{ \b $_ \b }xms,
               join ' \b .+? \b ',
               map qq{\\w{$_}},
               @_
               ;
           ;;
           return $rx;
           }
      "
      (?^msx: \b \w{2} \b .+? \b \w{4} \b .+? \b \w{3} \b )
      'yy zzzzz xxxx qqq'
      'yy zzzzz xxxx qqq'
      (?^msx: \b \w{5} \b .+? \b \w{3} \b )
      'zzzzz xxxx qqq'
      'zzzzz xxxx qqq'

      Update: No, darn it, that's still not right! nicemank seems to want  'yy xxxx qqq' from  'yy zzzzz xxxx qqq'. Oh, well...
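One possible way to get that result is to capture each fixed-length word separately and then join only the captures, so the skipped text never reaches the output. This is a minimal sketch under that reading of the OP; rx_nonadjacent() and pick_words() are hypothetical names that do not appear anywhere in this thread, and this is not the sub mentioned in the reply that follows.

```perl
use strict;
use warnings;

# Hypothetical helper: like rxg(), but each \w{N} is wrapped in a
# capture group and arbitrary text is allowed between the words.
sub rx_nonadjacent {
    my @lengths = @_;
    my $body = join ' \b .+? \b ', map { sprintf '(\w{%d})', $_ } @lengths;
    return qr{ \b $body \b }xms;
}

# Hypothetical helper: return one string per match, built from the
# captured words only.
sub pick_words {
    my ($string, @lengths) = @_;
    return unless @lengths;

    my $rx   = rx_nonadjacent(@lengths);
    my @flat = $string =~ m{$rx}g;   # captures of every match, in order

    my @groups;
    while (@flat) {
        # Each match contributes exactly scalar(@lengths) captures.
        push @groups, join ' ', splice(@flat, 0, scalar @lengths);
    }
    return @groups;
}

my $s = 'xxxx yy zzzzz xxxx qqq xxxx yy zzzzz xxxx qqq';
print "'$_'\n" for pick_words($s, 2, 4, 3);
print "'$_'\n" for pick_words($s, 5, 3);
```

With the test string above, the (2, 4, 3) call yields 'yy xxxx qqq' twice and the (5, 3) call yields 'zzzzz qqq' twice. Whether the second result is what nicemank wants is a guess; only the first is stated in the update above.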

        I think that the sub I wrote below does the trick, but I only did limited testing on it.
