Yes, that is correct, and no, it isn't a homework problem. I have individual protein families, each containing a varying number of proteins. A protein is not unique to one family; it can belong to many. I'm trying to find the best way to cover every protein using the fewest protein families, and I'm using a greedy algorithm. Instead of storing all 5 million or so proteins (a rough guess) in an array and checking them off as I retrieve them, I would like to put them in one really long string to save some space. I haven't tried using a string yet, because I've been working with a hash-and-array combination.
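
The greedy approach described above could be sketched roughly as follows. This is only an illustration, not the code I'm actually running: it assumes each protein has already been mapped to an integer index from 0 to n-1, and the family names, the sample data, and the function name `greedy_cover` are all made up. A `bytearray` plays the role of the "really long string", with one byte per protein used as a checked-off flag.

```python
def greedy_cover(families, n_proteins):
    """Greedy set cover: repeatedly pick the family that covers
    the most proteins not yet covered.

    families   -- dict mapping a family name to a list of protein indices
    n_proteins -- total number of proteins (indices 0 .. n_proteins - 1)
    Returns (list of chosen family names, count of proteins left uncovered).
    """
    covered = bytearray(n_proteins)  # one byte per protein, 0 = not covered yet
    remaining = n_proteins
    chosen = []
    candidates = dict(families)      # shallow copy so the caller's dict is untouched

    while remaining > 0 and candidates:
        best_name, best_gain = None, 0
        for name, members in candidates.items():
            # Count how many new proteins this family would add.
            gain = sum(1 for p in set(members) if not covered[p])
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break  # no remaining family covers anything new

        # Check off every protein in the chosen family.
        for p in candidates.pop(best_name):
            if not covered[p]:
                covered[p] = 1
                remaining -= 1
        chosen.append(best_name)

    return chosen, remaining


# Hypothetical toy data: 6 proteins spread over 4 overlapping families.
families = {
    "famA": [0, 1, 2, 3],
    "famB": [2, 3, 4],
    "famC": [4, 5],
    "famD": [0, 5],
}
chosen, uncovered = greedy_cover(families, 6)
print(chosen, uncovered)
```

With about 5 million proteins the flag buffer takes about 5 MB at one byte each; packing the flags into individual bits would cut that by a factor of eight at the cost of some bit arithmetic. Note that greedy selection gives a good cover but is not guaranteed to find the smallest possible set of families.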