Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Re: Most common substring

by dga (Hermit)
on Jul 28, 2003 at 19:10 UTC ( #278565=note: print w/replies, xml ) Need Help??


in reply to Most common substring

This could be done in one pass through the string.

Conceptually it would work like so.

use strict; my @nums=split('', $number); # the number to work on my(@most_common, %once, %common); my $mcc=2; # if there are no common substrings don't store while( @nums > 4) { my $key = join('', @nums[0..4]); $common{$key}++; if($common{$key} > $mcc) { $mcc=$common{$key}; @most_common=($key); #new max set entire array to $key } elsif($common{$key} == $mcc) { push(@most_common, $key); # tack $key onto largest } if($common{$key} == 1) { $once{$key}=1; } elsif($once{$key}) { delete $once{$key}; } shift @nums; #slide down 1 digit } print "Most ($mcc): ", join(', ', @most_common), "\n"; print "Once: ", join(', ', (keys %once)), "\n";

I have added a bit of code to track the most common values and save them in an array. I would expect that least common will most likely be the set of substrings only used once. Unless you define common to be lowest but more than once. Then much more bookkeeping code would be needed for the low end. On the high end if the current case is larger than the previous largest we simply forget about the list and start a new list with the current substring as it's only member. I also started $mcc at 2 to avoid a lot of needless bookkeeping for substrings seen only once.

I ran the code on sample data of 2^9999 and got

Most (2): 96655, 84403, 66114, 11748, 17484, 40380, 74169, 41696, 41844, 47194, 71162, 92065, 54736, 28703, 84689, 22165, 92292, 47369, 41891, 87379, 37954, 04224, 42244, 08257, 35778, 23461, 29741, 19795, 79549, 78117, 56688, 58090, 43252, 32528, 42018, 98726, 03714, 41492, 24440, 01363, 40657, 90170, 41347, 48935, 89357

Once: every other substring which is a long list.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://278565]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (6)
As of 2021-06-23 18:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)












    Results (121 votes). Check out past polls.

    Notices?