Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re: how to count the number of repeats in a string (really!)

by oha (Friar)
on Nov 14, 2007 at 16:34 UTC ( #650792=note: print w/replies, xml ) Need Help??


in reply to how to count the number of repeats in a string (really!)

First of all, i will find the longest matching sequences possibile, which are in the following string xx, abc and ecd.
(I use the zero-width lookahead to avoid to reset pos)
Then I'll break those substrings in parts, if abc is repeated i suspect also bc is repeated, isn't it?
Then I'll count the repetitions of only those substrings:
$s = 'xxaabcdabcabcecdecdxx'; $min_len = 2; $min_rep = 2; while($s=~/(.{$min_len,})(?=.*?\1)/g) { for my $x (0..length $1) { @saw{ map {substr $1, $x, $_} $x+2..length $1 } = (); } pos($s) = pos($s) +1 -length $1; # fix } for (keys %saw) { my $saw{$_} =()= $s=~/\Q$_/g; delete $saw{$_} if $saw{$_}<$min_rep; } ____ a 4 ab 3 abc 3 bc 3 c 5 cd 3 d 3 e 2 ec 2 ecd 2 x 4 xx 2
The first loop find the repetitions, the second count them. if you want to get only 2 or more char substring, change $x+1 to $x+2.

Oha

Update: added regex quoting to the last re

Update: shorter and print in order of findings:

while($s=~/(\w\w+)(?=.*?\1)/g) { foreach $x (0..length $1) { map { $y = substr $1, $x, $_; $saw{$y}++ || do { local pos $s = 0; my $c=()= $s=~/\Q$y/g; print "$y => $c\n"; } } ($x+1..length $1) } pos($s) = pos($s) +1 -length $1; # fix }

Update: fix a bug in the above code, added a pos() relocation (see #fix)

Replies are listed 'Best First'.
Re^2: how to count the number of repeats in a string (really!)
by blazar (Canon) on Nov 14, 2007 at 21:25 UTC
    Then I'll break those substrings in parts, if abc is repeated i suspect also bc is repeated, isn't it?

    I personally believe that if it were only a suspect it wouldn't be enough. The nice part is that it's obviously certain it is! Anyway, I like your approach very much. For completeness I'm recasting your code in a sub with a similar behaviour to the ones in previous code:

    sub oha { my $s=shift; my %saw; while($s =~ /(..+)(?=.*?\1)/g) { for my $x (0..length $1) { @saw{ map {substr $1, $x, $_} $x+2..length $1 } = (); } } $saw{$_} =()= $s =~ /\Q$_/g for keys %saw; \%saw; }

    Update: I see that you chaged your nodes content and that the original code is not there anymore. I recommend you to only post updates instead, and if you feel that something is wrong and needs to be "deleted", then possibly use <strike> tags. To keep the visual size of your node limited for those that do not want to read all of its details, you can also adopt <readmore> tags.

      i added a line marked with # fix as i explained, there is a bug in the code you used to rearrange: you must add the fix pos($s)=pos($s)+1-length $1; at the end of the while:;
      sub oha { my $s=shift; my %saw; while($s =~ /(..+)(?=.*?\1)/g) { for my $x (0..length $1) { @saw{ map {substr $1, $x, $_} $x+2..length $1 } = (); } pos($s)=pos($s)+1-length $1; # fix } $saw{$_} =()= $s =~ /\Q$_/g for keys %saw; \%saw; }

      i apologize for the bug

      Oha

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://650792]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (6)
As of 2022-01-22 12:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    In 2022, my preferred method to securely store passwords is:












    Results (62 votes). Check out past polls.

    Notices?