http://www.perlmonks.org?node_id=996624

nicemank has asked for the wisdom of the Perl Monks concerning the following question:

I want to find patterns dispersed within texts. The search pattern can be any word, and the text can be anything.

So (here goes):

I split a word into character pairs. Say the name is 'helen' (case irrelevant). That's got 5 letters; so it is two pairs and a single letter: 'he', 'le' and 'n'.
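A minimal sketch of that splitting step, assuming the word is lower-cased first because case is irrelevant. The helper name `split_into_parts` is only illustrative; the regex is the same `/..?/sg` used further down, and it works for a word of any length.

```perl
use strict;
use warnings;

# Split a word into consecutive two-character parts. An odd-length word
# leaves one single-character part at the end.
# (split_into_parts is a hypothetical name, not from any module.)
sub split_into_parts {
    my ($word) = @_;
    return lc($word) =~ /..?/sg;    # list context: every match is returned
}

my @parts = split_into_parts('helen');
print join(', ', @parts), "\n";    # prints: he, le, n
```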

I want to get the parts in sequence. Sometimes they are conveniently in the correct order in the text, such as:

1. xxxhexxxxxxxx xle xxxxxx nxxx

From this I want: xxxhexxxxxxxx xle nxxx

But they may not be in quite the right order. There may be repetitions and/or parts in the wrong order:

2. xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx

I'd like to get:

xxxhexxxxxxxx xle xxnxle xnxxhexx xlexxxxxx nxnx

'xxnxle' appears because it contains the final 'n'. The fact that it also contains an additional 'le' does not matter.

But actually I get:

xxxhexxxxxxxx xle xxnxle xxnxxx xnxxhexx xlexxxxxx nxnx


In other words, it should always get the input sequence in the correct order if it is there. It should get it repeatedly if it is there more than once. It should discard anything extraneous where it can.

Taking an input (for instance, $words = 'xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx'):


my @other_stuff = split (' ', $words);
my @pairs = $string =~ /..?/sg;
my @stuff = grep /\B$pairs[0]|\B$pairs[1]|\B$pairs[2]/, @other_stuff;
# this assumes I know the word length.
# but actually I would not know.
# so this is not ok and needs to be fixed!
for (@stuff) {
    print $_ . " ";
    print OUTPUT $_ . " ";
}
close OUTPUT;
exit;
This prints 'xxxhexxxxxxxx xle xxnxle xxnxxx xnxxhexx xlexxxxxx nxnx' as above, which is wrong: it includes the extraneous 'xxnxxx'.
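The grep tests each word on its own, so it has no memory of which part is wanted next; that is why 'xxnxxx' slips through. One possible direction, offered only as a sketch: walk the words in order while tracking the index of the next expected part, wrapping around after the last part. The name `pick_in_sequence` is hypothetical. The sketch assumes plain substring matching (no `\B`, so that 'nxxx' in example 1 still counts) and it keeps a trailing incomplete sequence rather than dropping it.

```perl
use strict;
use warnings;

# Keep only the words that contain the next expected part, cycling back
# to the first part once the last one has been found.
# (pick_in_sequence is a hypothetical helper, not from any module.)
sub pick_in_sequence {
    my ($word, $text) = @_;
    my @parts = lc($word) =~ /..?/sg;    # any word length, no fixed indices
    return unless @parts;

    my @kept;
    my $next = 0;                        # index of the part wanted next
    for my $token (split ' ', $text) {
        # index() does a literal substring search, so no regex escaping is needed
        if (index(lc($token), $parts[$next]) >= 0) {
            push @kept, $token;
            $next = ($next + 1) % @parts;
        }
    }
    return @kept;
}

my $words = 'xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx';
print join(' ', pick_in_sequence('helen', $words)), "\n";
# prints: xxxhexxxxxxxx xle xxnxle xnxxhexx xlexxxxxx nxnx
```

On example 1 the same sub returns 'xxxhexxxxxxxx xle nxxx'. Whether a word that matches several parts at once should advance the sequence by more than one step is left open here; this sketch advances by exactly one part per word.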

I realise this is a complicated question, but any help would be gratefully received.