PerlMonks
Re^5: algorithm for 'best subsets'
by halley (Prior) on Mar 05, 2005 at 15:57 UTC ( [id://436911]=note )
Pseudocode for my version of the algorithm (without using G::UF or any graph concept at all), assuming I understand it correctly. By "kbits" I mean a keyword bit vector; by "parts" I mean partitions. Am I missing something?
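For readers following along, here is a minimal sketch of one way to read the kbits/parts idea, written in Python rather than Perl and not taken from the post itself. It assumes each item carries a list of keywords, that a partition is the set of items transitively linked by shared keywords, and that partitions are kept mutually disjoint so a single pass per item is enough. The names `partition_by_keywords`, `items`, and `keyword_bit` are made up for illustration.

```python
def partition_by_keywords(items):
    """Group items that share keywords, directly or through other items.

    items: dict mapping an item name to a list of keyword strings
           (a hypothetical input shape, not the author's data format).
    Returns a list of (kbits, members) pairs, one per partition.
    """
    keyword_bit = {}  # keyword -> bit position in the vector
    parts = []        # (kbits, members); kbits are mutually disjoint

    for name, keywords in items.items():
        # Build this item's keyword bit vector as a Python int.
        kbits = 0
        for kw in keywords:
            bit = keyword_bit.setdefault(kw, len(keyword_bit))
            kbits |= 1 << bit

        members = [name]
        kept = []
        for part_bits, part_members in parts:
            if part_bits & kbits:
                # Shared keyword: fold the whole partition into this one.
                kbits |= part_bits
                members = part_members + members
            else:
                kept.append((part_bits, part_members))

        # Existing partitions never overlap each other, so bits absorbed
        # from one partition cannot create an overlap with another that
        # was already checked. One pass over parts is therefore enough.
        kept.append((kbits, members))
        parts = kept

    return parts
```

With items such as `{"a": ["war", "treaty"], "b": ["harvest"], "c": ["treaty", "king"]}`, items a and c end up in one partition and b in another. An item with no keywords becomes a partition of its own in this sketch.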
It seems to work, and scans my whole current database of 5810 keywords in 6628 items in about three seconds. Unfortunately, it grows to about 5 partitions at most, and by the time it's done, it has merged everything back into one partition. I think that's the fault of my keyword pruning, though. Even though I filter out the 100 most boring prepositions and articles, I still need to find out which of the remaining words cause the most mergers...

Update: How depressing. Not only is 'war' the most common keyword in modern history, but it appears to be the common thread amongst all of the events as well; removing that one keyword broke the historical context into five separate partitions.
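One way to hunt for the words that cause the most mergers is to drop each frequent keyword in turn and count how many partitions result. The sketch below is in Python, not Perl, and is an assumption about how such a check could be done, not the method used in the post. The names `count_partitions` and `rank_merging_keywords` are hypothetical, and items left with no keywords after pruning are skipped rather than counted as partitions.

```python
from collections import Counter


def count_partitions(items, ignored=frozenset()):
    """Count keyword-linked partitions, leaving out the ignored keywords."""
    parts = []  # mutually disjoint sets of keywords, one per partition
    for keywords in items.values():
        merged = set(keywords) - ignored
        if not merged:
            continue  # assumption: an item with no keywords left is skipped
        kept = []
        for part in parts:
            if part & merged:
                merged |= part  # shared keyword: absorb that partition
            else:
                kept.append(part)
        kept.append(merged)
        parts = kept
    return len(parts)


def rank_merging_keywords(items, top=10):
    """Try removing each of the `top` most frequent keywords.

    Returns (keyword, partitions_without_it) pairs, with the keyword whose
    removal splits the data the most listed first.
    """
    freq = Counter(kw for kws in items.values() for kw in set(kws))
    results = [
        (kw, count_partitions(items, frozenset([kw])))
        for kw, _ in freq.most_common(top)
    ]
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

Each candidate costs one full rescan, so limiting the check to the few most frequent keywords keeps the total run short when a single scan already takes a few seconds.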