|go ahead... be a heretic|
Re^4: Powerset short-circuit optimizationby Limbic~Region (Chancellor)
|on Oct 26, 2006 at 13:28 UTC||Need Help??|
I am sad to tell you that despite providing an iterative version that need not be called more than necessary, it is terribly slow. I timed (not Benchmarked) 4 versions and unfortunately this wasn't even a contender:
The code to generate the 3_477 line data file and the recursive java version can be found at How many words does it take?. The two recursive perl versions are below:
They both share the following code:
The 13 second version has subsets() as
The 28 second version has subsets() as
I made minor modifications to your code to handle my dataset as well as produce comparable output:
Update 1: After your 3rd update, your code finished in a respectable 78 seconds. Unfortunately it is still producing about 40% duplicates. Additionally, it doesn't produce the correct output (missing missing 92_835 strings out of 508_062). For instance 'cdglnst' does not appear at all in your output.
Update 2: After your 4th update, your code narrowly makes 3rd place with 26 seconds and correct output! I included the entire perl script I am using above to ensure we are comparing apples to apples. Admittedly, yours does scale much better with both speed and memory. Unfortunately, it still isn't quite up to the task I needed. I will have to put this in my back pocket for later though.
Cheers - L~R