PerlMonks  

pattern sequence dispersed within text

by nicemank (Novice)
on Oct 01, 2012 at 11:20 UTC (#996624=perlquestion)
nicemank has asked for the wisdom of the Perl Monks concerning the following question:

I want to find patterns dispersed within texts. Any word as the search pattern. Any text.

So (here goes):

I split a word into character pairs. Say the name is 'helen' (case irrelevant). That's got 5 letters; so it is two pairs and a single letter: 'he', 'le' and 'n'.
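A minimal sketch of the splitting step just described, assuming the pieces are taken left to right and a trailing single letter is kept as its own piece. The helper name `split_pairs` is made up for illustration; lowercasing the word is one way to make case irrelevant.

```perl
use strict;
use warnings;

# Hypothetical helper: split a word into two-character pieces,
# keeping a trailing single character when the length is odd.
sub split_pairs {
    my ($word) = @_;
    my @parts = lc($word) =~ /..?/sg;
    return @parts;
}

print join(', ', split_pairs('helen')), "\n";   # prints: he, le, n
```

Because `/..?/` takes two characters when it can and one otherwise, the same line works for a word of any length.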

I want to get the parts in sequence. This is the case whether they are conveniently in the correct order in the text, such as:

1. xxxhexxxxxxxx xle xxxxxx nxxx

From this I want: xxxhexxxxxxxx xle nxxx

But they may not be in quite the right order. There may be repetitions and/or parts in the wrong order:

2. xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx

I'd like to get:

xxxhexxxxxxxx xle xxnxle xnxxhexx xlexxxxxx nxnx

'xxnxle ' appears because it contains the final 'n'. The fact that it contains an additional 'le' just does not matter.

But actually I get:

xxxhexxxxxxxx xle xxnxle xxnxxx xnxxhexx xlexxxxxx nxnx


In other words, it should always get the input sequence in the correct order if it is there, and it should get it repeatedly if it recurs. It should discard anything extraneous if it can.
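One way to read these requirements is as a single pass over the words that always looks for the piece wanted next. The sketch below is only that reading, not a definitive solution: `find_sequences` is a hypothetical name, it assumes each word can satisfy at most one piece, and it assumes an unfinished sequence at the end of the text should be dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words of $text that spell out the pieces
# of $word in order, repeating for as long as complete sequences are found.
sub find_sequences {
    my ($word, $text) = @_;
    my @parts = lc($word) =~ /..?/sg;   # 'helen' -> he, le, n
    return unless @parts;

    my (@found, @pending);
    my $idx = 0;                        # index of the piece wanted next
    for my $candidate (split ' ', $text) {
        # Skip words that do not contain the piece currently wanted.
        next if index(lc($candidate), $parts[$idx]) < 0;
        push @pending, $candidate;
        if (++$idx == @parts) {         # a full sequence is complete
            push @found, @pending;
            @pending = ();
            $idx = 0;
        }
    }
    return @found;                      # an unfinished final sequence is dropped
}

my $text = 'xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx';
print join(' ', find_sequences('helen', $text)), "\n";
# prints: xxxhexxxxxxxx xle xxnxle xnxxhexx xlexxxxxx nxnx
```

Using `index` rather than a regex means the pieces are compared as literal substrings, so nothing in the word needs escaping.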

Taking an input ( for instance, $words = 'xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx')


    my @other_stuff = split (' ', $words);
    my @pairs = $string =~ /..?/sg;
    my @stuff = grep /\B$pairs[0]|\B$pairs[1]|\B$pairs[2]/, @other_stuff;
    # this assumes I know the word length.
    # but actually I would not know.
    # so this is not ok and needs to be fixed!
    for (@stuff) {
        print $_ . " ";
        print OUTPUT $_ . " ";
    }
    close OUTPUT;
    exit;
Prints: 'xxxhexxxxxxxx xle xxnxle xxnxxx xnxxhexx xlexxxxxx nxnx' as above, which is wrong.
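To illustrate why the filter above keeps 'xxnxxx': grep tests each word on its own, so it accepts any word containing any piece, whatever piece the sequence is waiting for. The sketch below builds the same alternation with `join`, which removes the dependence on the word length but does nothing about ordering. The variable name `$any_piece` is made up for illustration.

```perl
use strict;
use warnings;

my $string = 'helen';
my $words  = 'xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx';

my @pairs = $string =~ /..?/sg;

# Build one alternation from however many pieces there are.
my $any_piece = join '|', map { quotemeta $_ } @pairs;

# Same test as in the question: a word passes if any piece occurs in it.
# For pieces made of word characters, \B also requires that the piece
# does not sit at the very start of the word.
my @stuff = grep { /\B(?:$any_piece)/ } split ' ', $words;

print "@stuff\n";
# prints: xxxhexxxxxxxx xle xxnxle xxnxxx xnxxhexx xlexxxxxx nxnx
```

The result is the same seven words as before, which suggests the problem lies in the order-blind test and not in the hard-coded indices alone.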

I realise this is a complicated question. But any help gratefully received.

Re: pattern sequence dispersed within text
by choroba (Abbot) on Oct 01, 2012 at 11:41 UTC
    This should do what you want:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature 'say';

    my $word  = 'helen';
    my $input = 'xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx';

    my $regex = '([^ \b]*' . join('[^ \b]*).*?([^ \b]*', $word =~ /(..?)/g) . '[^ \b]*)';

    my @matches = $input =~ /$regex/g;
    say for @matches;
    The regex is constructed from the $word dynamically, so you do not have to know the length of it in advance.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Just in case the word contains regex metacharacters, apply quotemeta: insert  map { "\Q$_\E" }  in front of  $word =~ /(..?)/g
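As a sketch of how the two suggestions above fit together, the pattern can be built in a small helper that escapes each piece before joining. The name `build_regex` is hypothetical. Note that inside a bracketed character class `\b` means backspace, so `[^ \b]*` reads as "anything but a space or a backspace"; `\S*` may be closer to the intent, but the original class is kept here.

```perl
use strict;
use warnings;

# Hypothetical helper: build the pattern from the reply above for any word,
# escaping each piece so metacharacters in the word are matched literally.
sub build_regex {
    my ($word) = @_;
    my @parts = map { quotemeta $_ } $word =~ /(..?)/g;
    return '([^ \b]*' . join('[^ \b]*).*?([^ \b]*', @parts) . '[^ \b]*)';
}

print build_regex('helen'), "\n";
# prints: ([^ \b]*he[^ \b]*).*?([^ \b]*le[^ \b]*).*?([^ \b]*n[^ \b]*)

my $input = 'xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx';
my $regex = build_regex('helen');
my @matches = $input =~ /$regex/g;
print "@matches\n";
# prints: xxxhexxxxxxxx xle xxnxle xnxxhexx xlexxxxxx nxnx
```

Since one capture group is generated per piece, the number of groups follows the word length automatically, and only complete sequences produce matches.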
Re: pattern sequence dispersed within text
by remiah (Hermit) on Oct 01, 2012 at 14:42 UTC

    Hello. How about something like this?

    use warnings;
    use strict;

    my $words = 'xxxhexxxxxxxx xle xxnxle nxxx xxnxxx xnxxhexx nxxxxx xlexxxxxx nxnx xxxx';
    my $string='helen';
    print "string=$string\n";
    print "words=$words\n";

    my @other_stuff = split (/\s+/, $words);
    my @pairs = $string =~ /..?/sg;

    my $target=$pairs[0];
    my $idx=0;
    while ($words =~ /(\b\w*?$target\w*?\b)/g ){
        print "target=$target,matched=$1\n";
        $idx++;
        $idx=0 if $idx > $#pairs;
        $target = $pairs[$idx];
    }

Re: pattern sequence dispersed within text
by nicemank (Novice) on Oct 02, 2012 at 08:06 UTC
    Remiah and Choroba both have the answer. True masters of their art. Nicemank's thanks.
