- Develop a set of heuristics - (ex. \d+\.?\d+? or S+, etc.)
- Apply these to a random sampling of the data
- Establish a confidence level that the given data are X
- Proceed under that presumption unless proven wrong in which case modify definition of X to Y
I suppose there are hundreds of other ways to go about this. The reason I chose the above is that you could have millions of pieces of data to look at and exhaustively looking at each column would be a bit absurd. Besides, you would probably only need to 'catch' an error when trying to perform an activity with a subset like obtaining a standard deviation. In that case you would check each value anyway.
Celebrate Intellectual Diversity