The post filtering would be (in words) something like: For category 1 (square, circle, whatever), list all attributes that occur greater than 75% of the time in the items listed in category 1, but less than 25% of the time in the items listed in category 2, 3, 4, and 5, and less than 5% of the time in the items listed under category 6. (Ultimately each category will be set up with it's own set of variables for custom tailored percentages for each comparison.)
The end output would be a list of attributes by category that are unique to that category.
EDIT: And I think my issues with speed aren't here yet. I am anticipating it though as this transitions from reading from a set file, to gathering a series of raw values from the database and calculating the binary for the attributes.