There is one point of misunderstanding, though: I am aiming for a sample of random sequences. Each sequence would be five to ten characters long, but the sample would consist of a few million such sequences. Thus, if my sample size is ten million strings, each string is ten characters, and there are roughly a million valid UTF-8 characters, then each character would appear in the sample an average of 100 times. It is a statistical approach: each item in the sample covers just a tiny portion of all possible values, but the whole sample includes all possible values multiple times. I tend to be a bit thorough when testing code I am not familiar with (my code for computing eigensystems of general matrices was tested on 100 million randomly generated matrices - with not one failure, BTW).
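To make the arithmetic concrete, here is a minimal sketch of the kind of sampling I have in mind. The names (`random_scalar`, `random_string`, `expected_occurrences`) are made up for illustration, and I am assuming "valid character" means any Unicode scalar value, i.e. any code point outside the surrogate range; many of those are unassigned, which may or may not matter for the code under test. The sample here is only 1000 strings so it runs quickly.

```python
import random

# Assumption: "valid character" = Unicode scalar value, i.e. any code point
# in U+0000..U+10FFFF except the surrogates U+D800..U+DFFF.
SURROGATE_START, SURROGATE_END = 0xD800, 0xDFFF
MAX_CODE_POINT = 0x10FFFF
NUM_SURROGATES = SURROGATE_END - SURROGATE_START + 1
NUM_SCALARS = MAX_CODE_POINT + 1 - NUM_SURROGATES  # 1,112,064


def random_scalar(rng):
    """Pick one scalar value uniformly, skipping over the surrogate gap."""
    n = rng.randrange(NUM_SCALARS)
    if n >= SURROGATE_START:
        n += NUM_SURROGATES
    return chr(n)


def random_string(rng, min_len=5, max_len=10):
    """Build a string of min_len..max_len uniformly random scalar values."""
    length = rng.randint(min_len, max_len)
    return "".join(random_scalar(rng) for _ in range(length))


def expected_occurrences(num_strings, chars_per_string, alphabet_size):
    """Average number of times each character appears in the whole sample."""
    return num_strings * chars_per_string / alphabet_size


rng = random.Random(12345)  # fixed seed so a failing case can be reproduced
sample = [random_string(rng) for _ in range(1000)]
encoded = [s.encode("utf-8") for s in sample]  # bytes to feed the code under test

# Ten million strings of ten characters over about a million characters:
print(expected_occurrences(10_000_000, 10, 1_000_000))  # 100.0
```

With the actual count of 1,112,064 scalar values instead of a round million, the average comes out to about 90 occurrences per character, which does not change the argument.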