Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Longest possible run of a single character

by srdst13 (Pilgrim)
on May 22, 2006 at 20:55 UTC ( #551038=perlquestion: print w/ replies, xml ) Need Help??
srdst13 has asked for the wisdom of the Perl Monks concerning the following question:

This is a very simple problem to which there appears to be more than one answer, potentially. I am looking for the longest substring composed of a single repeating character. I can find all substrings having a repeated character longer than size "n" using a regex like:

my $re = '(.)\1' . "{$n,})"

but I can't seem to figure out how to find the longest stretch. My brute-force solution would be to loop over reasonable values of "n" and check them all. This is probably adequate for my needs where 1<n<20 or so. I'm just curious what other solutions folks might have, given that I am looking for repeats of a single character.

Thanks,
Sean

Comment on Longest possible run of a single character
Download Code
Re: Longest possible run of a single character
by Zaxo (Archbishop) on May 22, 2006 at 21:26 UTC

    It's simpler than that, just use regex greediness. You were very close.

    my $re = qr/((.)\2+)/;
    Here's an example,
    $ perl -e'my $re = qr/((.)\2+)/; $_="aabccccdddeffff"; while (m/$re/g) + { printf "\"%s\" x %d\n", $2, length($1) }' "a" x 2 "c" x 4 "d" x 3 "f" x 4 $
    That skips capturing lone characters as a sequence of one - change the '+' quantifier to '*' to get them, too. There is no practical limit on the length of the match.

    I didn't address picking out the maximum length substring captured. There are lots of ways to do that.

    Update: Ok, here's an easy way to get the max length as the search is done, using the (?{}) construct.

    use re 'eval'; my $re = qr/((.)\2+)/; my ($maxlen, $maxchr, $maxloc); $_="aabccccdddeffff"; 1 while m/$re(?{ $maxlen = length($^N), $maxchr = substr($^N,0,1), $maxloc = pos() - $maxlen if length($^N) > $maxlen; })/g; print $maxchr, ' x ', $maxlen, ' at ', $maxloc, $/; __END__ c x 4 at 3
    Access to the original matching chunk of the string is given by substr($_, $maxloc, $maxlen).

    After Compline,
    Zaxo

      PERFECT! I knew there was a better mousetrap!
      A slight improvement. It is better to set the pattern to $re = qr/((.)(?:\2)*)/ . This will allow matching strings with single chars, strings like $_ = 'abc' .
Re: Longest possible run of a single character
by GrandFather (Sage) on May 22, 2006 at 21:28 UTC

    Are you looking for the longest run of a specific single character (longeset run of the letter 'a' for example), or the longest run of any character?


    DWIM is Perl's answer to Gödel
Re: Longest possible run of a single character
by BrowserUk (Pope) on May 22, 2006 at 21:41 UTC

    Sorting isn't a particularly efficient way of finding the maximum length, but unless your strings are huge, it probably won't matter too much.

    print +( sort{ length $b <=> length $a } $s =~ m[((.)\2+)]g )[ 0 ];;

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Longest possible run of a single character
by GrandFather (Sage) on May 22, 2006 at 22:47 UTC

    If you are dealing with large strings then the following may be a little quicker than the regex based techniques posted so far:

    use warnings; use strict; my $str; my $len = 0; while ($len < 1000000) { my $runLen = int rand (50); $str .= chr (ord ('a') + int rand (26)) x $runLen; $len += $runLen; } my $sstr = substr ($str, 0, 1) . (substr ($str, 1) ^ $str); my @bestRuns; my $match = "\0"; my $bestRunLen = 2; my $scan = 0; while (-1 != (my $start = index $sstr, $match, $scan)) { my $runLen = length ((substr ($sstr, $start) =~ /(\0+)/)[0]) + 1; if ($runLen > $bestRunLen) { # new best match @bestRuns = $start - 1; $bestRunLen = $runLen; $match = "\0" x ($bestRunLen - 1); } else { # another best match push @bestRuns, $start - 1; } $scan = $start + $bestRunLen - 1; } for (@bestRuns) { print "Run of " . substr ($str, $_, 1) . " from $_ for $bestRunLen +\n"; }

    Prints:

    Run of r from 766269 for 144

    Note that this finds all the matches and their start indexes.


    DWIM is Perl's answer to Gödel
Re: Longest possible run of a single character
by TedPride (Priest) on May 23, 2006 at 05:31 UTC
    A linear solution, which will work fast for any size string:
    use strict; use warnings; my ($c, $maxn, $n, $maxc, $str) = ('', 0); $str = join '', <DATA>; for (0..(length($str)-1)) { $_ = substr($str, $_, 1); if ($_ ne $c) { $n = 1; $c = $_; } else { $n++; if ($n > $maxn) { $maxn = $n; $maxc = $c; } } } print $maxc x $maxn; __DATA__ ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC

      Never under estimate the performance of the regex engine.

      #! perl -slw use strict; use List::Util qw[ reduce ]; use Benchmark qw[ cmpthese ]; our $str; sub TedP { my ($c, $maxn, $n, $maxc) = ('', 0); for (0..(length($str)-1)) { $_ = substr($str, $_, 1); if ($_ ne $c) { $n = 1; $c = $_; } else { $n++; if ($n > $maxn) { $maxn = $n; $maxc = $c; } } } return $maxc x $maxn; } sub regex { return reduce{ length $a > length $b ? $a : $b } $str =~ m[((.)\2+)]g; } for my $n ( 1 .. 6 ) { $str = join'', map{chr(65+rand(26)) x int(rand 20) } 1 .. 10**$n; print "\nString length ", length $str; # print regex(); # print TedP(); cmpthese -1, { TedP=> \&TedP, Regex=> \&regex }; } __END__ C:\test>551038 String length 89 Rate TedP Regex TedP 12429/s -- -50% Regex 24837/s 100% -- String length 939 Rate TedP Regex TedP 1265/s -- -51% Regex 2598/s 105% -- String length 9741 Rate TedP Regex TedP 126/s -- -48% Regex 242/s 92% -- String length 94791 Rate TedP Regex TedP 12.6/s -- -43% Regex 22.3/s 77% -- String length 949396 (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) Rate TedP Regex TedP 1.29/s -- -40% Regex 2.16/s 67% -- String length 9496562 (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) s/iter TedP Regex TedP 7.72 -- -39% Regex 4.74 63% --

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Longest possible run of a single character
by GrandFather (Sage) on May 23, 2006 at 10:36 UTC

    Time to haul out cmpthese:

    Results (using various values for the run length generator):

    Index: Run from 701117 for 30 Linear: Run from 701117 for 30 RegexSort: Run from -1 for 30 (warning: too few iterations for a reliable count) s/iter RegexSort Linear Index RegexSort 2.21 -- -54% -97% Linear 1.03 116% -- -94% Index 6.16e-002 3494% 1564% -- Index: Run from 670331 for 125 Linear: Run from 670331 for 125 RegexSort: Run from -1 for 125 Rate Linear RegexSort Index Linear 1.06/s -- -51% -90% RegexSort 2.14/s 102% -- -79% Index 10.2/s 865% 377% -- Index: Run from 749633 for 459 Linear: Run from 749633 for 459 RegexSort: Run from -1 for 459 Rate Linear RegexSort Index Linear 1.05/s -- -77% -82% RegexSort 4.56/s 334% -- -21% Index 5.77/s 450% 27% --

    Note that the first three lines of each group are the check results. RegexSort doesn't generate a start index for the match so -1 is shown. However the same length is generated in each case so it is presumed that the same longest match is being found.


    DWIM is Perl's answer to Gödel

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://551038]
Approved by Joost
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (9)
As of 2015-07-03 18:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (55 votes), past polls