XP is just a number PerlMonks

Re: how to count the number of repeats in a string (really!)

by oha (Friar)
 on Nov 14, 2007 at 16:34 UTC ( #650792=note: print w/replies, xml ) Need Help??

First of all, i will find the longest matching sequences possibile, which are in the following string xx, abc and ecd.
(I use the zero-width lookahead to avoid to reset pos)
Then I'll break those substrings in parts, if abc is repeated i suspect also bc is repeated, isn't it?
Then I'll count the repetitions of only those substrings:
```\$s = 'xxaabcdabcabcecdecdxx';

\$min_len = 2;
\$min_rep = 2;
while(\$s=~/(.{\$min_len,})(?=.*?\1)/g)
{
for my \$x (0..length \$1) {
@saw{ map {substr \$1, \$x, \$_} \$x+2..length \$1 } = ();
}
pos(\$s) = pos(\$s) +1 -length \$1; # fix
}
for (keys %saw)
{
my \$saw{\$_} =()= \$s=~/\Q\$_/g;
delete \$saw{\$_} if \$saw{\$_}<\$min_rep;
}
____
a 4
ab 3
abc 3
bc 3
c 5
cd 3
d 3
e 2
ec 2
ecd 2
x 4
xx 2
The first loop find the repetitions, the second count them. if you want to get only 2 or more char substring, change \$x+1 to \$x+2.

Oha

Update: added regex quoting to the last re

Update: shorter and print in order of findings:

```while(\$s=~/(\w\w+)(?=.*?\1)/g)
{
foreach \$x (0..length \$1) {
map {
\$y = substr \$1, \$x, \$_;
\$saw{\$y}++ || do {
local pos \$s = 0;
my \$c=()= \$s=~/\Q\$y/g;
print "\$y => \$c\n";
}
} (\$x+1..length \$1)
}
pos(\$s) = pos(\$s) +1 -length \$1; # fix
}

Update: fix a bug in the above code, added a pos() relocation (see #fix)

Replies are listed 'Best First'.
Re^2: how to count the number of repeats in a string (really!)
by blazar (Canon) on Nov 14, 2007 at 21:25 UTC
Then I'll break those substrings in parts, if abc is repeated i suspect also bc is repeated, isn't it?

I personally believe that if it were only a suspect it wouldn't be enough. The nice part is that it's obviously certain it is! Anyway, I like your approach very much. For completeness I'm recasting your code in a sub with a similar behaviour to the ones in previous code:

```sub oha {
my \$s=shift;
my %saw;

while(\$s =~ /(..+)(?=.*?\1)/g) {
for my \$x (0..length \$1) {
@saw{ map {substr \$1, \$x, \$_} \$x+2..length \$1 } = ();
}
}
\$saw{\$_} =()= \$s =~ /\Q\$_/g for keys %saw;
\%saw;
}

Update: I see that you chaged your nodes content and that the original code is not there anymore. I recommend you to only post updates instead, and if you feel that something is wrong and needs to be "deleted", then possibly use <strike> tags. To keep the visual size of your node limited for those that do not want to read all of its details, you can also adopt <readmore> tags.

i added a line marked with # fix as i explained, there is a bug in the code you used to rearrange: you must add the fix pos(\$s)=pos(\$s)+1-length \$1; at the end of the while:;
```sub oha {
my \$s=shift;
my %saw;

while(\$s =~ /(..+)(?=.*?\1)/g) {
for my \$x (0..length \$1) {
@saw{ map {substr \$1, \$x, \$_} \$x+2..length \$1 } = ();
}
pos(\$s)=pos(\$s)+1-length \$1; # fix
}
\$saw{\$_} =()= \$s =~ /\Q\$_/g for keys %saw;
\%saw;
}

i apologize for the bug

Oha

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://650792]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (6)
As of 2022-01-22 12:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
In 2022, my preferred method to securely store passwords is:

Results (62 votes). Check out past polls.

Notices?