Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Longest possible run of a single character

by srdst13 (Pilgrim)
on May 22, 2006 at 20:55 UTC ( #551038=perlquestion: print w/ replies, xml ) Need Help??
srdst13 has asked for the wisdom of the Perl Monks concerning the following question:

This is a very simple problem to which there appears to be more than one answer, potentially. I am looking for the longest substring composed of a single repeating character. I can find all substrings having a repeated character longer than size "n" using a regex like:

my $re = '(.)\1' . "{$n,})"

but I can't seem to figure out how to find the longest stretch. My brute-force solution would be to loop over reasonable values of "n" and check them all. This is probably adequate for my needs where 1<n<20 or so. I'm just curious what other solutions folks might have, given that I am looking for repeats of a single character.

Thanks,
Sean

Comment on Longest possible run of a single character
Download Code
Re: Longest possible run of a single character
by Zaxo (Archbishop) on May 22, 2006 at 21:26 UTC

    It's simpler than that, just use regex greediness. You were very close.

    my $re = qr/((.)\2+)/;
    Here's an example,
    $ perl -e'my $re = qr/((.)\2+)/; $_="aabccccdddeffff"; while (m/$re/g) + { printf "\"%s\" x %d\n", $2, length($1) }' "a" x 2 "c" x 4 "d" x 3 "f" x 4 $
    That skips capturing lone characters as a sequence of one - change the '+' quantifier to '*' to get them, too. There is no practical limit on the length of the match.

    I didn't address picking out the maximum length substring captured. There are lots of ways to do that.

    Update: Ok, here's an easy way to get the max length as the search is done, using the (?{}) construct.

    use re 'eval'; my $re = qr/((.)\2+)/; my ($maxlen, $maxchr, $maxloc); $_="aabccccdddeffff"; 1 while m/$re(?{ $maxlen = length($^N), $maxchr = substr($^N,0,1), $maxloc = pos() - $maxlen if length($^N) > $maxlen; })/g; print $maxchr, ' x ', $maxlen, ' at ', $maxloc, $/; __END__ c x 4 at 3
    Access to the original matching chunk of the string is given by substr($_, $maxloc, $maxlen).

    After Compline,
    Zaxo

      PERFECT! I knew there was a better mousetrap!
      A slight improvement. It is better to set the pattern to $re = qr/((.)(?:\2)*)/ . This will allow matching strings with single chars, strings like $_ = 'abc' .
Re: Longest possible run of a single character
by GrandFather (Cardinal) on May 22, 2006 at 21:28 UTC

    Are you looking for the longest run of a specific single character (longeset run of the letter 'a' for example), or the longest run of any character?


    DWIM is Perl's answer to Gödel
Re: Longest possible run of a single character
by BrowserUk (Pope) on May 22, 2006 at 21:41 UTC

    Sorting isn't a particularly efficient way of finding the maximum length, but unless your strings are huge, it probably won't matter too much.

    print +( sort{ length $b <=> length $a } $s =~ m[((.)\2+)]g )[ 0 ];;

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Longest possible run of a single character
by GrandFather (Cardinal) on May 22, 2006 at 22:47 UTC

    If you are dealing with large strings then the following may be a little quicker than the regex based techniques posted so far:

    use warnings; use strict; my $str; my $len = 0; while ($len < 1000000) { my $runLen = int rand (50); $str .= chr (ord ('a') + int rand (26)) x $runLen; $len += $runLen; } my $sstr = substr ($str, 0, 1) . (substr ($str, 1) ^ $str); my @bestRuns; my $match = "\0"; my $bestRunLen = 2; my $scan = 0; while (-1 != (my $start = index $sstr, $match, $scan)) { my $runLen = length ((substr ($sstr, $start) =~ /(\0+)/)[0]) + 1; if ($runLen > $bestRunLen) { # new best match @bestRuns = $start - 1; $bestRunLen = $runLen; $match = "\0" x ($bestRunLen - 1); } else { # another best match push @bestRuns, $start - 1; } $scan = $start + $bestRunLen - 1; } for (@bestRuns) { print "Run of " . substr ($str, $_, 1) . " from $_ for $bestRunLen +\n"; }

    Prints:

    Run of r from 766269 for 144

    Note that this finds all the matches and their start indexes.


    DWIM is Perl's answer to Gödel
Re: Longest possible run of a single character
by TedPride (Priest) on May 23, 2006 at 05:31 UTC
    A linear solution, which will work fast for any size string:
    use strict; use warnings; my ($c, $maxn, $n, $maxc, $str) = ('', 0); $str = join '', <DATA>; for (0..(length($str)-1)) { $_ = substr($str, $_, 1); if ($_ ne $c) { $n = 1; $c = $_; } else { $n++; if ($n > $maxn) { $maxn = $n; $maxc = $c; } } } print $maxc x $maxn; __DATA__ ABDBACCBBBCBDDBCDCBCCDBABCBABBBBBADACDABACC

      Never under estimate the performance of the regex engine.

      #! perl -slw use strict; use List::Util qw[ reduce ]; use Benchmark qw[ cmpthese ]; our $str; sub TedP { my ($c, $maxn, $n, $maxc) = ('', 0); for (0..(length($str)-1)) { $_ = substr($str, $_, 1); if ($_ ne $c) { $n = 1; $c = $_; } else { $n++; if ($n > $maxn) { $maxn = $n; $maxc = $c; } } } return $maxc x $maxn; } sub regex { return reduce{ length $a > length $b ? $a : $b } $str =~ m[((.)\2+)]g; } for my $n ( 1 .. 6 ) { $str = join'', map{chr(65+rand(26)) x int(rand 20) } 1 .. 10**$n; print "\nString length ", length $str; # print regex(); # print TedP(); cmpthese -1, { TedP=> \&TedP, Regex=> \&regex }; } __END__ C:\test>551038 String length 89 Rate TedP Regex TedP 12429/s -- -50% Regex 24837/s 100% -- String length 939 Rate TedP Regex TedP 1265/s -- -51% Regex 2598/s 105% -- String length 9741 Rate TedP Regex TedP 126/s -- -48% Regex 242/s 92% -- String length 94791 Rate TedP Regex TedP 12.6/s -- -43% Regex 22.3/s 77% -- String length 949396 (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) Rate TedP Regex TedP 1.29/s -- -40% Regex 2.16/s 67% -- String length 9496562 (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) s/iter TedP Regex TedP 7.72 -- -39% Regex 4.74 63% --

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Longest possible run of a single character
by GrandFather (Cardinal) on May 23, 2006 at 10:36 UTC

    Time to haul out cmpthese:

    Results (using various values for the run length generator):

    Index: Run from 701117 for 30 Linear: Run from 701117 for 30 RegexSort: Run from -1 for 30 (warning: too few iterations for a reliable count) s/iter RegexSort Linear Index RegexSort 2.21 -- -54% -97% Linear 1.03 116% -- -94% Index 6.16e-002 3494% 1564% -- Index: Run from 670331 for 125 Linear: Run from 670331 for 125 RegexSort: Run from -1 for 125 Rate Linear RegexSort Index Linear 1.06/s -- -51% -90% RegexSort 2.14/s 102% -- -79% Index 10.2/s 865% 377% -- Index: Run from 749633 for 459 Linear: Run from 749633 for 459 RegexSort: Run from -1 for 459 Rate Linear RegexSort Index Linear 1.05/s -- -77% -82% RegexSort 4.56/s 334% -- -21% Index 5.77/s 450% 27% --

    Note that the first three lines of each group are the check results. RegexSort doesn't generate a start index for the match so -1 is shown. However the same length is generated in each case so it is presumed that the same longest match is being found.


    DWIM is Perl's answer to Gödel

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://551038]
Approved by Joost
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (11)
As of 2014-07-13 17:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (251 votes), past polls