PerlMonks  

Re: Best guess for data type

by InfiniteSilence (Curate)
on Apr 22, 2013 at 16:51 UTC (#1029924)


in reply to Best guess for data type

  • Develop a set of heuristics (e.g. \d+(\.\d+)? for numbers, \S+ for strings, etc.)
  • Apply these to a random sampling of the data
  • Establish a confidence level that the given data are X
  • Proceed under that presumption unless proven wrong, in which case revise the definition from X to Y
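The steps above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the names `HEURISTICS` and `guess_type`, the sample size of 100, and the 0.95 threshold are all assumptions made for the example. It is written in Python, but the patterns use syntax that carries over to Perl unchanged.

```python
import random
import re

# Hypothetical heuristics, most specific first. Each pattern must match
# the whole value for it to count as a hit.
HEURISTICS = [
    ("integer", re.compile(r"[+-]?\d+")),
    ("decimal", re.compile(r"[+-]?\d+\.\d+")),
    ("string", re.compile(r"\S+")),
]


def guess_type(values, sample_size=100, threshold=0.95, seed=None):
    """Return (type_name, confidence) for the first heuristic meeting the threshold.

    sample_size and threshold are arbitrary illustrative defaults.
    """
    rng = random.Random(seed)
    values = list(values)
    # Look at a random sample rather than every value.
    if len(values) <= sample_size:
        sample = values
    else:
        sample = rng.sample(values, sample_size)
    if not sample:
        return ("unknown", 0.0)
    for name, pattern in HEURISTICS:
        hits = sum(1 for v in sample if pattern.fullmatch(v))
        confidence = hits / len(sample)
        if confidence >= threshold:
            return (name, confidence)
    return ("unknown", 0.0)
```

Calling `guess_type(column)` on a list of strings gives a type name plus the fraction of sampled values that matched, which can serve as the confidence level. If a later value contradicts the guess, the column can be re-examined and the guess revised.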

I suppose there are hundreds of other ways to go about this. The reason I chose the above is that you could have millions of pieces of data to look at, and exhaustively examining every value in each column would be a bit absurd. Besides, you would probably only need to 'catch' an error when performing an operation on a subset, such as computing a standard deviation. In that case you would check each value anyway.
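As a rough illustration of catching the error at the point of use, the sketch below validates every value only when a standard deviation is actually requested, and revises the column's type if one fails to parse. The function name `std_dev_or_demote` and the fallback to "string" are assumptions made for this example.

```python
import math


def std_dev_or_demote(values):
    """Return ("number", sample_std_dev), or ("string", None) if a value is not numeric."""
    numbers = []
    for v in values:
        try:
            # Every value is checked here anyway, so the earlier guess
            # only needs to be right most of the time.
            numbers.append(float(v))
        except ValueError:
            # The presumption was wrong: revise the type from X to Y.
            return ("string", None)
    if len(numbers) < 2:
        return ("number", 0.0)
    mean = sum(numbers) / len(numbers)
    variance = sum((x - mean) ** 2 for x in numbers) / (len(numbers) - 1)
    return ("number", math.sqrt(variance))
```

Note that `float()` also accepts inputs such as "nan" and "inf", so a stricter check may be wanted depending on the data.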

Celebrate Intellectual Diversity

