PerlMonks  

How can I count characters between two same substrings in a string where the substring is repeated more than 5 times?

by supriyoch_2008 (Scribe)
on Dec 09, 2011 at 07:42 UTC (#942582)
supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Sir,

I am a beginner in Perl programming. I would appreciate your help in solving a basic problem.

I need a Perl program that counts the number of characters between consecutive occurrences of a substring that is repeated several times in a long string. For example, the substring "cat" occurs 5 times in the string $a="batcatratdeercatdogmousepalmcocoacatdoll monkeycatsusandollypollycatpomkelly". I need to count the number of characters between the 1st and 2nd "cat", between the 2nd and 3rd "cat", and so on.

I searched the internet at length for a program that counts the characters between two identical substrings in a string, but could not find one. If anyone has a solution, please write to me by email: supriyoch_2008@rediffmail.com.

Please suggest which topics in Perl would help me solve such problems.

I look forward to hearing from you soon.

S. Chakraborty

Re: How can I count characters between two same substrings in a string where the substring is repeated more than 5 times?
by BrowserUk (Pope) on Dec 09, 2011 at 07:48 UTC

    # The "\n" puts each count on its own line (running under perl -l would do the same).
    $a = "batcatratdeercatdogmousepalmcocoacatdoll monkeycatsusandollypollycatpomkelly";
    print $-[2] - $+[1], "\n" while $a =~ m[(cat)(?=.+?(cat))]g;

    Output:

    7
    17
    11
    15
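The one-liner above leans on the @- and @+ arrays, which hold the start and end offsets of each capture group after a successful match. A minimal unpacked sketch of the same idea follows; the sub name gaps_between is hypothetical, and unlike the original it uses .*? rather than .+? so that two adjacent occurrences are reported as a gap of 0.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): lengths of the text between
# consecutive occurrences of $delim in $str.
sub gaps_between {
    my ($str, $delim) = @_;
    my $d = quotemeta $delim;    # treat the delimiter as literal text
    my @gaps;
    # Group 1 is the current occurrence; group 2, inside the lookahead,
    # is the next one. The lookahead consumes nothing, so the next
    # iteration resumes right after the current occurrence.
    while ($str =~ /($d)(?=.*?($d))/gs) {
        # $+[1] is the offset just past group 1; $-[2] is where group 2 starts.
        push @gaps, $-[2] - $+[1];
    }
    return @gaps;
}

my $text = "batcatratdeercatdogmousepalmcocoacatdoll monkeycatsusandollypollycatpomkelly";
print join(" ", gaps_between($text, "cat")), "\n";    # prints: 7 17 11 15
```

Text before the first occurrence and after the last one never appears in the result, because a gap is only recorded when the lookahead finds a following occurrence.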

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

Re: How can I count characters between two same substrings in a string where the substring is repeated more than 5 times?
by tobyink (Abbot) on Dec 09, 2011 at 08:15 UTC

    Easiest way would be to split the string, and use length on each portion:

    use 5.010;
    my $string = "abcFOOdefghFOOiFOOjklmFOOnopqrFOOstuvFOOwxyz";
    say "Lengths are:";
    say $_ for map { sprintf("%d ('%s')", length, $_) } split /FOO/, $string;

      Elegant, but deviates from OP's spec, in that the character-count before the first "FOO" (or "cat") and after the last are reported in the output.

      At risk of being bitten again by my blindspots, I see no way to force split alone (or even with a LIMIT) to do the job by itself. OTOH, adding a simple regex to remove the chars before the first delimiter and after the last would work nicely.
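One way the "trim first, then split" idea might look is sketched below. This is only an illustration: the sub name gap_lengths_trimmed is hypothetical, and it assumes the delimiter is a literal string rather than a pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything before the first delimiter and
# after the last one, then let split do the rest.
sub gap_lengths_trimmed {
    my ($str, $delim) = @_;
    my $d = quotemeta $delim;    # treat the delimiter as literal text

    # The greedy (.*) captures everything between the first and the last
    # delimiter; no match means fewer than two occurrences.
    return () unless $str =~ /$d(.*)$d/s;
    my $inner = $1;

    # split returns no fields for an empty string, so two adjacent
    # delimiters with nothing else between them are handled explicitly.
    return (0) if $inner eq '';

    # A limit of -1 keeps trailing empty fields.
    return map { length } split /$d/, $inner, -1;
}

print join(" ", gap_lengths_trimmed("abcFOOdefghFOOiFOOjklmFOOnopqrFOOstuvFOOwxyz", "FOO")), "\n";
# prints: 5 1 4 5 4
```

The -1 limit matters because split otherwise discards trailing empty fields, which would silently drop zero-length gaps at the end.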

        I'd use split. One of the points of the OP's spec is "repeated more than 5 times". That is a lot easier to check with split than to cram it all into a regexp.

        Untested:

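        my @chunks = split /PAT/, $str, -1;   # -1 keeps trailing empty fields
        shift @chunks;                        # drop the text before the first PAT
        pop @chunks;                          # drop the text after the last PAT
        if (@chunks > 4) {                    # more than 5 occurrences leave at least 5 gaps
            say "Lengths: @{[map { length } @chunks]}";
        }
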
        Yes, I deviated deliberately for simplicity. If you want to ignore the leading and trailing sections, just shift the first element off the list you get back from split, then pop the last element off as well.
Re: How can I count characters between two same substrings in a string where the substring is repeated more than 5 times?
by sundialsvc4 (Abbot) on Dec 09, 2011 at 14:34 UTC

    The somewhat obscurely documented /g modifier (along with /c and the pos function), which lets you match a regular expression repeatedly in a loop, comes in handy here. You can thereby do what split does, but with much more control.

    So, you can first determine that the desired substring does occur more than 5 times, then reset the search position to start over (pos($str) = undef;), and then start grabbing the strings that occur “in front of” each delimiter.

    The only thing that really bit me the last time I did this was that I failed to read perldoc -f pos closely enough. I overlooked the highlighted part of this, and my code was ignoring the first character:

    Returns the offset of where the last "m//g" search left off for the variable in question ($_ is used when the variable is not specified).   Note that 0 is a valid match offset.   "undef" indicates that the search position is reset.   [...]

    The technique works ... it works very well indeed ... but pay close attention to the edge-cases when testing it.
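A minimal sketch of the m//g and pos approach described above, under a few assumptions: the sub name gaps_with_pos and its $min_count parameter are hypothetical, and the delimiter is treated as a literal string. It tests the previous end offset with defined rather than for truth, since 0 is a valid offset.

```perl
use strict;
use warnings;

# Hypothetical helper: report the gap lengths only when $delim occurs
# more than $min_count times in $str.
sub gaps_with_pos {
    my ($str, $delim, $min_count) = @_;
    my $d = quotemeta $delim;    # treat the delimiter as literal text

    # First pass: count the occurrences (m//g in list context).
    my $count = () = $str =~ /$d/g;
    return () unless $count > $min_count;

    # Reset the search position so the scalar-context loop starts at
    # the beginning of the string.
    pos($str) = undef;

    my (@gaps, $prev_end);
    while ($str =~ /$d/g) {
        my $end   = pos($str);               # offset just past this match
        my $start = $end - length $delim;    # offset where this match began
        push @gaps, $start - $prev_end if defined $prev_end;
        $prev_end = $end;
    }
    return @gaps;
}

my $text = "batcatratdeercatdogmousepalmcocoacatdoll monkeycatsusandollypollycatpomkelly";
print join(" ", gaps_with_pos($text, "cat", 4)), "\n";    # prints: 7 17 11 15
```

With a threshold of 5 the same call returns an empty list, because "cat" occurs exactly 5 times in this string, which is not more than 5.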

Re: How can I count characters between two same substrings in a string where the substring is repeated more than 5 times?
by AnomalousMonk (Abbot) on Dec 09, 2011 at 17:25 UTC
Re: How can I count characters between two same substrings in a string where the substring is repeated more than 5 times?
by davido (Archbishop) on Dec 12, 2011 at 17:04 UTC

    People won't email responses to you. They will post them here in the forum for all to see and benefit from. If you want a response to your questions, I suggest you check back here in the forum.

    Don't ask for email follow-ups here.


    Dave
