I'd build a trie but have it track how many strings so far have contained each substring. Then I'd track the length of the longest substring so far found in enough strings. When a substring's find count reaches the threshold, if it is the same length as the longest, then I'd add it to the "found" list. If it is longer, then I'd update the max length and replace the "found" list with just that one substring.
Since a single substring could appear multiple times in each string, you also need to track the substrings that you have, so far, entered into the trie for the current string (which I'd just do in a hash).
That method is efficient enough that I ran it against all 172,819 words in my copy of the "enable1" word list, and just had it simultaneously compute the answers for all counts. Producing, in part:
Found 1 longest substrings in at least 120964 different strings:
e
Found 1 more (2 total) in at least 103913 different strings:
s
Found 1 more (3 total) in at least 102028 different strings:
i
Found 1 more (4 total) in at least 93844 different strings:
a
Found 1 more (5 total) in at least 90701 different strings:
r
Found 1 more (6 total) in at least 84198 different strings:
n
Found 1 more (7 total) in at least 83358 different strings:
t
Found 1 more (8 total) in at least 79367 different strings:
o
...
Found 1 more (15 total) in at least 38116 different strings:
g
Found 1 longest substrings in at least 33923 different strings:
es
Found 1 more (2 total) in at least 31096 different strings:
in
Found 1 more (3 total) in at least 30960 different strings:
er
Found 1 more (4 total) in at least 22962 different strings:
ti
...
Found 1 more (14 total) in at least 15277 different strings:
is
Found 1 longest substrings in at least 15241 different strings:
ing
Found 1 more (2 total) in at least 7831 different strings:
ess
Found 1 more (3 total) in at least 7800 different strings:
ion
...
Found 1 more (8 total) in at least 6567 different strings:
tio
Found 1 longest substrings in at least 6330 different strings:
tion
Found 1 more (2 total) in at least 5771 different strings:
ness
Found 1 more (3 total) in at least 4474 different strings:
atio
Found 1 longest substrings in at least 4441 different strings:
ation
Found 1 more (2 total) in at least 3065 different strings:
esses
Found 1 more (3 total) in at least 2883 different strings:
nesse
Found 1 longest substrings in at least 2871 different strings:
nesses
Found 1 more (2 total) in at least 1968 different strings:
ations
Found 1 more (3 total) in at least 1030 different strings:
ically
Found 1 more (4 total) in at least 873 different strings:
zation
Found 1 longest substrings in at least 869 different strings:
ization
Found 1 more (2 total) in at least 724 different strings:
ousness
Found 1 more (3 total) in at least 722 different strings:
enesses
Found 1 more (4 total) in at least 592 different strings:
inesses
Found 1 more (5 total) in at least 570 different strings:
ilities
Found 1 longest substrings in at least 523 different strings:
bilities
Found 1 more (2 total) in at least 431 different strings:
izations
Found 1 longest substrings in at least 393 different strings:
abilities
Found 1 more (2 total) in at least 362 different strings:
ousnesses
Found 1 more (3 total) in at least 270 different strings:
alization
Found 1 more (4 total) in at least 258 different strings:
ification
Found 1 more (5 total) in at least 219 different strings:
ivenesses
Found 1 more (6 total) in at least 214 different strings:
blenesses
Found 1 more (7 total) in at least 181 different strings:
lizations
Found 1 longest substrings in at least 169 different strings:
ablenesses
Found 1 more (2 total) in at least 162 different strings:
iousnesses
Found 1 more (3 total) in at least 139 different strings:
lessnesses
Found 1 more (4 total) in at least 135 different strings:
alizations
Found 1 more (5 total) in at least 130 different strings:
tivenesses
Found 1 more (6 total) in at least 126 different strings:
ifications
Found 1 more (7 total) in at least 86 different strings:
tabilities
Found 2 more (9 total) in at least 82 different strings:
sivenesses
ologically
Found 1 more (10 total) in at least 80 different strings:
ographical
Found 1 more (11 total) in at least 67 different strings:
tification
Found 2 more (13 total) in at least 60 different strings:
nstitution
nalization
Found 1 more (14 total) in at least 55 different strings:
rabilities
Found 3 more (17 total) in at least 54 different strings:
stitutiona
rousnesses
raphically
Found 2 longest substrings in at least 53 different strings:
stitutional
graphically
Found 1 longest substrings in at least 51 different strings:
nstitutional
Found 1 more (2 total) in at least 48 different strings:
ographically
Found 1 longest substrings in at least 41 different strings:
nstitutionali
Found 1 more (2 total) in at least 31 different strings:
institutional
Found 1 longest substrings in at least 27 different strings:
institutionali
Found 1 more (2 total) in at least 25 different strings:
nstitutionaliz
Found 1 more (3 total) in at least 20 different strings:
constitutional
Found 1 longest substrings in at least 19 different strings:
institutionaliz
Found 1 more (2 total) in at least 14 different strings:
constitutionali
Found 1 more (3 total) in at least 13 different strings:
nstitutionalize
Found 1 longest substrings in at least 12 different strings:
einstitutionaliz
Found 1 longest substrings in at least 10 different strings:
electroencephalogra
Found 1 longest substrings in at least 8 different strings:
electroencephalograph
Found 1 more (2 total) in at least 4 different strings:
einstitutionalization
Found 1 longest substrings in at least 3 different strings:
electroencephalographi
Found 1 longest substrings in at least 2 different strings:
ethylenediaminetetraacetate
Note that I was lazy and had it just track (eventually) all 120,964 "answers" and whittled it down to the 91 different answers as the last step. It'd be significantly even more efficient (in both space and time) if I had only tracked the unique answers as it went along. But that probably would have taken more of my time in programming than just waiting a few minutes for the above results took. The script as-is took very little time to write.
No, I didn't include the code. This looks like a homework problem so I won't be posting the code for at least a couple of weeks.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.
|