more useful options PerlMonks

### How many words does it take?

by Limbic~Region (Chancellor)
 on Oct 23, 2006 at 16:27 UTC ( #580093=perlmeditation: print w/ replies, xml ) Need Help??

All,
This meditation is about not giving up on a problem just because it is NP complete. Common responses from seasoned programmers to NP complete questions include:
• Don't bother, it is NP complete
• Find a heurstic approximation and hope it is good enough
• Brute force is the only way to go
Response 2 is actually pretty good advice when it applies. Response 3 may be true but not all brute force approaches are created equal. The fact is that real world problems need to be solved regardless of CS theory. Just because something is NP complete, doesn't mean there isn't a smart(er|ish)? way of solving it.

I decided to work with challenging the dictionary. I was familiar with it because I had already provided a fairly fast approximation algorithm. The problem stated is simple: Given a dictionary file, find the minimal number of words that contain a given set of letters.

I set out a list of constraints which turned the general problem into a specific one:
• The dictionary file = 2of12 from the Official 12Dicts Package
• Letters of both question and answer = a-z
• Answer need only be fewest words, not fewest letters
• Question may not contain duplicate letters - abc ok, abbc not ok

Originally, I wanted to precalculate enough information to allow each question to be solveable in polynomial time. While I still believe this approach would work with some problems, I ended up precalculating all solutions and functionally turned it into a lookup table. To do this, I needed to reduce the problem space enough to make a brute force solution feasible. Here are the guiding optimization principals I employed:

#### Not all solutions need to be found

There are a finite number of questions that can be asked because the letter set has been restricted to a-z. Using the combinatorial formula C = N! / K!(N - K)! when N = 26 and K = 1..26, there are 67_108_863 different possible questions. Since the solution that applies to 'abc' also applies to 'ab', 'ac', 'bc', 'a', 'b', and 'c' - even fewer questions actually have to be answered.

#### Not all words need be considered

Because of the constraints, we can reduce a word to the unique list of letters in contains. The word 'screwdriver' becomes 'cdeirsvw' and the word 'disc' becomes 'cdis'. Because there are no letters in 'disc' that are not contained in 'screwdriver', 'disc' need not be considered. Applying this principal further, we end up with only words of the longest string of unique characters.

Instead of solving questions at random, a BFS will provide the most efficient solution path. Start by finding all questions that can be solved using a single word from our reduced dictionary, then 2 at a time, then 3 at a time, etc until the longest possible question (a-z) can be answered.

#### Not all combinations of words need be considered

Since our goal is to get to a-z, when combining words we need only consider words that contain letters we don't already have. Further, we can use the reduction technique to only include the longest list of "new" characters. As an example if we start with 'hello' and are considering combining it with 'windowpane' and 'weapon', we are really considering 'adinpw' and 'anpw' and we can safely ignore 'weapon'.

#### Not all derived answers need be considered

Since we know that the answer for 'abc' also answers 'ab', 'ac', 'bc', 'a', 'b', and 'c' - we only need solve for 'abc' and we get the powerset for free. We can take advantage of this when solving new questions because we can skip the portions of the powerset that have been solved by some previous question. For instance, if we are looking at 'abcd', 'abce', and 'abcfg' - we know we can skip 'abc' and their descendants when solving the second and third question.

Even with these guiding principals in mind before I started, I still followed a number of dead-ends before finding the right approach. I will attempt to highlight the mistakes I made but will only be showing code for the actual solution.

#### Phase 1: Reduce the dictionary

Number of lines in input: 61_406
Number of lines in output: 3_477
Execution time: 1 min 15 seconds

```#!/usr/bin/perl
use strict;
use warnings;
use constant  WORD => 0;
use constant   LEN => 1;
use constant NFORM => 2;
use Inline C =>;

my \$file = \$ARGV[0] || 'dictionary.txt';
open(my \$fh, '<', \$file) or die "Unable to open '\$file' for reading: \$
+!";

my @word;
while (<\$fh>) {
\$_ = lc;
tr/a-z//cd;
my %uniq = map {\$_ => undef} split //;
my \$len = keys %uniq;
push @word, [\$_, \$len, join '', sort keys %uniq];
}

@word = sort {\$b->[LEN] <=> \$a->[LEN] } @word;

for my \$i (0 .. \$#word - 1) {
next if ! defined \$word[\$i];
for my \$j (\$i + 1 .. \$#word) {
next if ! defined \$word[\$j];
\$word[\$j] = undef if ! distinct(\$word[\$i][NFORM], \$word[\$j][NF
+ORM]);
}
}
for (grep defined, @word) {
print join "\t", \$_->[NFORM], \$_->[WORD];
print "\n";
}
__END__
__C__

int distinct(unsigned char *str1, unsigned char *str2) {
/* Actual code has 256 0s - truncated for post */
char exists[256] = {};

/* Turn array into a hash */
while (*str1) {
exists[*str1++] = 1;
}

/* Determine if str2 contains any chars str1 does not */
while (*str2) {
if (! exists[*str2++]) return 1;
}
return 0;
}

#### Phase 2: Two at a time

Number of lines in input: 3_477
Number of lines in output: 636_186
Execution time: 9 min 35 seconds

```#!/usr/bin/perl
use strict;
use warnings;
use constant NEW_LET => 1;
use constant     LEN => 2;

use Inline C =>;

my \$file = \$ARGV[0] || 'phase1.data';
open(my \$fh, '<', \$file) or die "Unable to open '\$file' for reading: \$
+!";

my @word;
while (<\$fh>) {
chomp;
push @word, [split /\t/];
}

my %seen_nform;
for my \$i (0 .. \$#word - 1) {
my (\$nform1, \$str1) = @{\$word[\$i]};
my (@new_word, %seen);
for my \$j (\$i + 1 .. \$#word) {
my (\$nform2, \$str2) = @{\$word[\$j]};
my \$new_let = diff(\$nform2, \$nform1);
next if ! \$new_let || \$seen{\$new_let}++;
push @new_word, [\$str2, \$new_let, length(\$new_let)];
}
@new_word = sort { \$b->[LEN] <=> \$a->[LEN] } @new_word;
for my \$i2 (0 .. \$#new_word - 1) {
next if ! defined \$new_word[\$i2];
for my \$j2 (\$i2 + 1 .. \$#new_word) {
next if ! defined \$new_word[\$j2];
\$new_word[\$j2] = undef if ! distinct(\$new_word[\$i2][NEW_LE
+T], \$new_word[\$j2][NEW_LET]);
}
}
for (grep defined, @new_word) {
my \$str2 = \$_->[0];
my %uniq = map {\$_ => undef} split //, \$nform1 . \$str2;
my \$new_nform = join '', sort keys %uniq;
next if \$seen_nform{\$new_nform}++;
print join "\t", \$new_nform, \$str1, \$str2;
print "\n";
}
}
__END__
__C__

SV* diff ( char *str1, char *str2 ) {
SV *sv= newSVpvn( "", 0 );
int result_index= 0;
char *result= SvGROW( sv, 257);

/* identify all chars present in str2 */
while ( *str1 && *str2 ) {
if ( *str1 < *str2)
result[ result_index++ ]= *str1++;
while ( *str1 && *str1 == *str2) {
str1++; str2++;
}
if ( *str1 > *str2 )
str2++;
}
while (*str1)
result[ result_index++ ]= *str1++;

result[ result_index ]= 0;
SvCUR_set( sv, result_index );

return sv;
}

int distinct(unsigned char *str1, unsigned char *str2) {
/* Actual code has 256 0s - truncated for post */
char exists[256] = {};
/* Turn array into a hash */
while (*str1) {
exists[*str1++] = 1;
}

/* Determine if str2 contains any chars str1 does not */
while (*str2) {
if (! exists[*str2++]) return 1;
}
return 0;
}

#### Phase 3: Three at a time

Number of lines in input: 636_186
Number of lines in output: 8_809_183
Execution time: 401 min 10 seconds

Yes, that's java code you see. I originally wrote it in Perl which looked very much like phase 2 but it took between 6 1/3 hours to 11 1/2 hours depending on which machine and what variation of the algorithm I used. If anyone is interested in the perl counterpart - please let me know.

```import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Scanner;

public class Phase3 {
public static final String regex = "[a-z]+";

public static char alphabet[] = {
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm
+',
'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z
+',
};

public static BitSet[] seen_nform = {
new BitSet(),        new BitSet(26),       new BitSet(325),
+  new BitSet(2600),
new BitSet(14950),   new BitSet(65780),    new BitSet(230230),
+  new BitSet(657800),
new BitSet(1562275), new BitSet(3124550),  new BitSet(5311735)
+, new BitSet(7726160),
new BitSet(9657700), new BitSet(10400600), new BitSet(9657700)
+, new BitSet(7726160),
new BitSet(5311735), new BitSet(3124550),  new BitSet(1562275)
+, new BitSet(657800),
new BitSet(230230),  new BitSet(65780),    new BitSet(14950),
+  new BitSet(2600),
new BitSet(325),     new BitSet(26),       new BitSet(1)
};

public static HashMap<Character, Integer> lookup = new HashMap<Cha
+racter, Integer>(26);

public static void main(String[] args) {
initlookup();
Scanner scr1, scr2;
String  line;
+ String>(3477);

try {
scr1 = new Scanner(new File(args[0]));
while (scr1.hasNextLine()) {
line  = scr1.nextLine();
scr2  = new Scanner(line);
word.put(scr2.next(regex), scr2.next(regex));
}
}
catch (IOException ioe) {
ioe.printStackTrace();
}

// Process phase2.data
String nform, str1, str2;
try {
scr1 = new Scanner(new File(args[1]));
while (scr1.hasNextLine()) {
line  = scr1.nextLine();
scr2  = new Scanner(line);
nform = scr2.next(regex);
str1  = scr2.next(regex);
str2  = scr2.next(regex);
HashSet<String> seen = new HashSet<String>();
ArrayList<String[]> new_word = new ArrayList<String[]>
+();

for (Map.Entry<String, String> entry : word.entrySet()
+) {
String new_nform = uniq(nform, entry.getValue());
int len = new_nform.length();
int bit = getBit(len, new_nform);
if (seen_nform[len].get(bit)) { continue; }
String new_let = diff(entry.getKey(), nform);
if (new_let.length() == 0 || seen.contains(new_let
+)) { continue; }
String[] e = {entry.getValue(), new_let, new_nform
+, Integer.toString(bit)};
}
String[] new1, new2;
int idx;
int end = new_word.size();
for (int i = 0; i < end - 1; ++i) {
new1 = new_word.get(i);
if (new1 == null) { continue; }
for (int j = i + 1; j < end; ++j) {
new2 = new_word.get(j);
if (new2 == null) { continue; }
idx = j;
if (new2[1].length() > new1[1].length()) {
String[] temp = new1;
new1 = new2;
new2 = temp;
idx  = i;
}
int res = distinct(new1[1], new2[1]);
if (res == 0) {
new_word.set(idx, null);
}
}
}
for (String[] e : new_word) {
if (e == null) { continue; }
String str3 = e[0];
String new_nform = e[2];
int len = new_nform.length();
int bit = Integer.decode(e[3]);
/* Probably not needed */
if (seen_nform[len].get(bit)) { continue; }
seen_nform[len].set(bit);
System.out.println(new_nform + "\t" + str1 + "\t"
++ str2 + "\t" + str3);
}
}
}
catch (IOException ioe) {
ioe.printStackTrace();
}
}

private static void initlookup () {
// Actual code goes a-z, reduced for post
lookup.put('a', new Integer("1"));
lookup.put('b', new Integer("2"));
}

// Determine Chars in str1 not present in str2
private static String diff(String str1, String str2) {
int len = str2.length();
HashSet<Character> have = new HashSet<Character>(len);
for (Character c : str2.toCharArray()) {
}
StringBuilder new_let = new StringBuilder(len);
for (Character c : str1.toCharArray()) {
if (have.contains(c)) {
continue;
}
new_let.append(c);
}
return new_let.toString();
}

// Determine if str2 (shorter) contains any chars not in str1 (lon
+ger)
private static int distinct(String str1, String str2) {
HashSet<Character> have = new HashSet<Character>(str1.length()
+);
for (Character c : str1.toCharArray()) {
}
for (Character c : str2.toCharArray()) {
if (have.contains(c)) {
continue;
}
return 1;
}
return 0;
}

// List of unique chars in alphabetic order in str1 & str2
private static String uniq(String str1, String str2) {
HashSet<Character> have = new HashSet<Character>();
for (Character c : str1.toCharArray()) {
}
for (Character c : str2.toCharArray()) {
}
StringBuilder unique = new StringBuilder(26);
for (char let : alphabet) {
if (have.contains(let)) {
unique.append(let);
}
}
return unique.toString();
}

private static int binomial(int n, int k) {
int c = 1;
for (int i = 0; i < k; ++i) {
c *= n - i;
c /= i + 1;
}
return c;
}

private static int getBit(int len, String str) {
int    sum = 0;
for (int i = 0; i < len; ++i) {
sum += binomial(lookup.get(str.charAt(i)) - 1, i + 1);
}
return sum;
}
}

#### Phase 4: Four at a time (new approach)

Number of lines in input: 8_809_183
Number of lines in output: 1
Execution time: 0 min 7 seconds

The last phase brought the solution close enough just to scan for a word that contained all the missing letters to reach a-z.

```#!/usr/bin/perl
use strict;
use warnings;
use Inline C =>;

my \$file = \$ARGV[0] || 'phase1.data';
open(my \$fh, '<', \$file) or die "Unable to open '\$file' for reading: \$
+!";

my @word;
while (<\$fh>) {
chomp;
my (\$nform, \$word) = split /\t/;
push @word, \$word;
}

\$file = \$ARGV[1] || 'phase3.data';
open(\$fh, '<', \$file) or die "Unable to open '\$file' for reading: \$!";
while (<\$fh>) {
my (\$nform, \$str1, \$str2, \$str3) = split /\t/;
for (@word) {
if (! diff('abcdefghijklmnopqrstuvwxyz', \$_ . \$nform)) {
print "abcdefghijklmnopqrstuvwxyz\t\$_\t\$str1\t\$str2\t\$str3
+";
exit;
}
}
}
print "No Cigar\n";

__END__
__C__

SV *diff(char *str1, char *str2) {
/* Actual code has 256 0s - truncated for post */
char exists[256] = {};
SV *ret_sv = newSVpvn("",0);

/* identify all chars present in str2 */
while (*str2) {
exists[(U8)*str2++] = 1;
}

/* Determine chars in str1 not in str2 */
for ( ; *str1 ; str1++ )
if (! exists[(U8)*str1])
sv_catpvn(ret_sv,str1,1);

return ret_sv;
}

#### Phase 5: Generate powersets

Update: Using a Perl + Inline::C version, this phase takes 3 min 16 seconds (details).

Number of lines in input: 9_448_847
Number of lines in output: 67_108_863
Execution time: 66 min 20 seconds

Yep, more Java. I also have a Perl version if anyone is interested but I wanted the final solution to take less than 8 hours to run so that someone wanting to reproduce my results could.

```import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Scanner;

public class Phase5 {
public static final String regex = "[a-z]+";

public static BitSet[] seen = {
new BitSet(),        new BitSet(26),       new BitSet(325),
+  new BitSet(2600),
new BitSet(14950),   new BitSet(65780),    new BitSet(230230),
+  new BitSet(657800),
new BitSet(1562275), new BitSet(3124550),  new BitSet(5311735)
+, new BitSet(7726160),
new BitSet(9657700), new BitSet(10400600), new BitSet(9657700)
+, new BitSet(7726160),
new BitSet(5311735), new BitSet(3124550),  new BitSet(1562275)
+, new BitSet(657800),
new BitSet(230230),  new BitSet(65780),    new BitSet(14950),
+  new BitSet(2600),
new BitSet(325),     new BitSet(26),       new BitSet(1)
};

public static HashMap<Character, Integer> lookup = new HashMap<Cha
+racter, Integer>(26);

public static void main(String[] args) {
initlookup();
String  line = null;
Scanner s2   = null;
for (String filename : args) {
try {
Scanner scr = new Scanner(new File(filename));
while (scr.hasNextLine()) {
line = scr.nextLine();
s2 = new Scanner(line);
if (! s2.hasNext(regex)) {
continue;
}
getSets(s2.next(regex), line);
}
}
catch (IOException ioe) {
ioe.printStackTrace();
}
}
}

private static void initlookup () {
// Actual code goes a-z, reduced for post
lookup.put('a', new Integer("1"));
lookup.put('b', new Integer("2"));
}

private static void getSets(String str, String line) {
int len = str.length();
int bit = getBit(len, str);
if (! seen[len].get(bit)) {
seen[len].set(bit);
System.out.println(str + "\t" + line);
for (StringBuilder set : subsets(str)) {
getSets(set.toString(), line);
}
}
}

private static ArrayList<StringBuilder> subsets(String str) {
ArrayList<StringBuilder> subs = new ArrayList<StringBuilder>()
+;
if (str.length() == 1) {
return subs;
}
for (int i = 0; i < str.length(); ++i) {
StringBuilder set = new StringBuilder(str);
set.deleteCharAt(i);
}
return subs;
}

private static int binomial(int n, int k) {
int c = 1;
for (int i = 0; i < k; ++i) {
c *= n - i;
c /= i + 1;
}
return c;
}

private static int getBit(int len, String str) {
int    sum = 0;
String key;
for (int i = 0; i < len; ++i) {
sum += binomial(lookup.get(str.charAt(i)) - 1, i + 1);
}
return sum;
}
}

#### Creating a working final product

I decided to use a DBD::SQlite database with a very simple web front end to demo it. The split into 26 tables was left over from an earlier disk space optimization idea that I didn't pursue so the following code could be simplified further:

Split the data by length and convert solution words to indices (took 12 minutes)
```#!/usr/bin/perl
use strict;
use warnings;
use Storable;

my %fwd;
my \$file = \$ARGV[0] || 'phase1.data';
open(my \$fh, '<', \$file) or die "Unable to open '\$file' for reading: \$
+!";

my \$n;
while ( <\$fh> ) {
chomp;
my (\$nform, \$word) = split /\t/;
\$fwd{\$word} = ++\$n;
}

my %sol_fh;
for (1 .. 26) {
my \$file_name = sprintf("%.2d", \$_) . ".data";
open(\$sol_fh{\$_}, '>', \$file_name) or die \$!;
}

\$file = \$ARGV[1] || 'phase5.data';
open(\$fh, '<', \$file) or die "Unable to open '\$file' for reading: \$!";

while ( <\$fh> ) {
chomp;
my (\$nform, undef, @words) = split /\t/;
\$_ = \$fwd{\$_} for @words;
print { \$sol_fh{length(\$nform)} } \$nform, "\t", (join "-", @words)
+, "\n";
}

my %rev = reverse %fwd;
store \%rev, 'sol.rev';
Build the database and use the sqlite3 shell interface to import the data
```#!/usr/bin/perl
use strict;
use warnings;
use DBD::SQLite;

my \$dbh = DBI->connect("dbi:SQLite:dbname=solution.db","","") or die \$
+DBI::errstr;

for my \$size (1 .. 26) {
my \$sql = "CREATE TABLE solution\$size (nform TEXT, solution TEXT)"
+;
\$dbh->do(\$sql) or die \$dbh->errstr;
}
\$dbh->disconnect or die \$dbh->errstr;
Create a simple web front end
```#!/usr/bin/perl -T
use strict;
use warnings;
use CGI::Simple;
use DBD::SQLite;
use HTML::Template;
use Storable;

\$CGI::Simple::POST_MAX        = 1024;
#\$CGI::Simple::DEBUG = 1;

my \$q = CGI::Simple->new();

# TODO: if unable to get exclusive lock on \$0 display_error()
# TODO: Add error handling - duh!

sub display_question {
my \$template = HTML::Template->new(filename => 'question.tmpl');
print \$template->output();
exit(0);
}

my \$template = HTML::Template->new(filename => 'answer.tmpl');
my \$input = get_input();
my \$len   = length(\$input);
my \$rev   = retrieve('sol.rev');
my \$dbh   = DBI->connect("dbi:SQLite:dbname=solution.db","","")
or die \$DBI::errstr;
my \$sth   = \$dbh->prepare("SELECT solution FROM solution\$len WHERE
+ nform=?")
or die \$dbh->errstr;
\$sth->execute(\$input)
+     or die \$dbh->errstr;
my @word;
while (my @row = \$sth->fetchrow_array() ) {
push @word, map { {WORD => \$rev->{\$_}} } split /-/, \$row[0];
}
\$template->param(USER_INPUT => \$input);
\$template->param(WORD_LIST  => \@word);
print \$template->output();
#\$sth->finish();
#\$dbh->disconnect;
exit(0);
}

sub get_input {
my \$input = \$q->param('question') || '';
\$input = lc(\$input);
\$input =~ tr/a-z//cd;
\$input ||= 'abcdefghijklmnopqrstuvwxyz';
my %uniq = map {\$_ => undef} split //, \$input;
return join '', sort keys %uniq;
}
Include a question template
```<html>
<body>
<center>
<h2>How many words does it take....</h2>
<p>
Please enter a unique list of lower case letters (a-z) below.
<br />
Be warned that some of the "words" may not be safe for work (NSFW).
</p>
<p>
<form name="display_question" action="demo.cgi">
<input type=text name="question">
<br />
<input type="submit" value="Submit">
</form>
</p>
</center>
</body>
</html>
```<html>
<body>
<center>
<h2>For <TMPL_VAR NAME=USER_INPUT> it takes....</h2>
<p>
<TMPL_LOOP NAME=WORD_LIST>
Word: <TMPL_VAR NAME=WORD>
<br />
</TMPL_LOOP>
</p>
</center>
</body>
</html>

# Conclusion

Answering all 67+ million questions using a dictionary of over 64K words can take less than 8 hours and the resulting lookup table (SQLite) is 2GB of disk space. If this were a real world problem, that is not a considerable investement.

#### Notes

I freely admit that I spent far more than 8 hours on this specific project. I also admit that each problem is unique and has no guarantee of being feasible. I came to the Monastery at least twice looking for help with optimizations:

I had to find the right balance of trading memory for speed since 1GB of RAM was not enough in many cases. I also had to find the right balance between reducing the output for the next phase with the amount of work required for that reduction.

The code as it appears can likely be optimized. I spent my time looking for algorithm related optimizations rather than micro-optimizations. While I used both Inline::C and Java in this project, I am not much more than a neophyte in either. In fact, this thread contains all the Java I have ever written in my life. I had help in those areas but the contributors were limited by my description of what I was trying to do so all deficiencies are mine rather than theirs.

I would like to thank everyone who helped me with this project. Rather than naming the few I can remember and offending those I can't, I will just say "you know who you are". You can try it out here. I welcome any questions or comments on the code and algorithms.

Cheers - L~R

Comment on How many words does it take?
Re: How many words does it take?
by educated_foo (Vicar) on Oct 23, 2006 at 17:19 UTC
One of my algorithms profs was fond of saying that if you find yourself faced with an NP complete problem, you're probably solving the wrong problem. In other words, usually you're dealing with a polynomial special case of what you think is the actual problem. Just FYI -- I haven't digested your enormous post yet...
Re: How many words does it take?
by kaif (Friar) on Nov 17, 2006 at 11:18 UTC

I'm surprised that essentially no one has commented on this mammoth of a post in almost a month. I'll give it a shot.

I decided to try to "precalculate enough information to allow each question to be solveable in polynomial time." My approach is heavily influenced by yours, particularly the part about "not all words need be considered" (e.g., "screwdriver" contains all of the letters "disc" does).

Since you already used a language other than Perl, I decided to use C++. At the end of this post, I have included two programs, one for precomputation and one for queries. The first takes in a dictionary file and produces a "digest" file -- a list of all useful singles, doubles, triples, and quadruples (at 18_341 lines and 394K, it's smaller than the dictionary). The other one answers questions based on it, assuming the digest file is "digest.txt". As with your Java code, please excuse idiosyncrasies in my C++ code; over time, I've developed a mostly consistent, but perhaps a tad strange, coding style. (In particular, I can get rid of the pointers in this code, but I just think it's easier to read this way.)

Of obvious interest is how fast my code runs. Here's the summary of running the precomputation code on my computer, which will hopefully also help explain the approach: (Incidentally, I believe it takes between a gig and two of RAM, slightly more than your requirements -- I could knock that down if I wanted, but I didn't feel the need.)

I also tried running all 67_108_863 = 2**26 - 1 different combinations through my query answerer. It took a total of 42 minutes, for an average rate of ~25,000 queries per second.

Overall summary is that coding took about two hours from scratch, precomputation took 16 minutes, and query answering is definitely fast enough. Enjoy!

kaif,
Thank you for replying. It will take me some time to digest it. Since having posted it I have already discovered better and faster ways of doing things. For instance, the generation of the powerset which took 66 minutes in Java only takes 3 minutes in C. I have lots of fun projects on the back burner that take precedence over this but perhaps I might post a "How I would do it today" sometime in the future.

Thanks again for the code and the description - I am sure I will learn a lot from it.

Cheers - L~R

Reaped: Re: How many words does it take?
by NodeReaper (Curate) on Dec 09, 2009 at 17:51 UTC

Create A New User
Node Status?
node history
Node Type: perlmeditation [id://580093]
Approved by friedo
Front-paged by liverpole
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (13)
As of 2013-12-12 00:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?