PerlMonks 
Re: Estimating continuous functions
by Itatsumaki (Friar) on Mar 29, 2004 at 07:33 UTC ( #340529=note )
There are two general approaches to this kind of problem. You can either try to estimate the underlying function from the data that you are given, or you can simply try to generate a smooth curve between the datapoints. There are (naturally) advantages and disadvantages to each approach. :)

Predicting the function is a more complex business, and usually requires some simplifying assumptions about the type of relationship between the dependent variable (the output) and the various independent variables (inputs). For instance, you can assume that the relationship is linear, or quadratic, or logarithmic, or... just about anything else. However, that assumption generally has to be made up front. You then perform a statistical operation called a "regression" on the data, using the relationships that you assumed above. The regression attempts to estimate the "importance" of each independent variable (as measured by a set of constant coefficients) by minimizing the deviations between the predicted and actual values. Actually, you typically minimize the squared deviations. To get an estimate of the "goodness of fit" you would take a look at the "residuals", the differences between the actual and predicted values, which show how much of the actual data your model leaves unexplained.

This is really only a rough overview of regression. You have multiple independent variables, so you are going to want a "multiple regression". The biggest problem with regression is that it takes a lot of data to accurately estimate a sophisticated model. So, if you have n independent variables (dimensions), and you assume that:

i) the variables are completely orthogonal (i.e. the independent variables are totally unrelated) and
ii) there is a linear relationship between each independent variable and the dependent variable

then you will need at a minimum n independent observations (n + 1 if the model includes an intercept) to be able to estimate the model.
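To make the regression idea concrete, here is a minimal sketch in plain Python of a multiple linear regression fitted by least squares. Everything in it is an assumption for illustration: the function names (solve, fit_linear, predict, r_squared) are made up, the data is invented and noise-free, and the fit goes through the normal equations, which is fine for a toy problem but not what a numerically careful statistics package would do.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few independent observations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bn] for y = b0 + b1*x1 + ... + bn*xn."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X^T X) b = X^T y


def predict(coefs, row):
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


def r_squared(coefs, rows, ys):
    """Share of the variance in ys explained by the model (1.0 means zero residuals)."""
    mean = sum(ys) / len(ys)
    ss_res = sum((y - predict(coefs, r)) ** 2 for r, y in zip(rows, ys))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Invented, noise-free data generated from y = 1 + 2*x1 - 3*x2
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
ys = [1 + 2 * a - 3 * b for a, b in rows]
coefs = fit_linear(rows, ys)
```

With real, noisy data the recovered coefficients will only approximate the true ones, and the residuals (and hence r_squared) tell you how far off the assumed linear form is.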
Overall, regression thus works really well when you have a lot of data, and can make reasonable assumptions about the relationships between the dependent and independent variables. One nice thing about regression techniques is that they can (relatively) seamlessly handle categorical (non-numeric) data like {TRUE/FALSE} or {RED/GREEN/YELLOW}.

Your other option is to use an interpolating method. These methods do not attempt to directly model the interactions between dependent and independent variables. Instead, they simply attempt to find the best smooth-fitting curve for the data. The two most important methods in use today are cubic-spline-based interpolation and Fourier approximations.

Cubic splines essentially fit cubic functions to subsets of your data. That is, a separate cubic function is fitted (regressed, actually) between each pair of datapoints. This regression uses some (but not all!) of the surrounding data points. So if you have a continuous set of, say, 100 observations, you might estimate a cubic function between points 50 and 51 using only the datapoints from 45 to 55. It turns out that this local interpolation handles discontinuities and large abrupt changes much better than global interpolations. Of course, this can come at the price of much more computational overhead.

Fourier approximation attempts to estimate your data with a set of sinusoidal functions (usually just sine and cosine). You can essentially approximate or fit any function (even whacked-out engineering things like the Heaviside step function or weirdo physics stuff like the Dirac delta) with an infinite series of sinusoidal functions. The theory behind Fourier approximations is a bit complex, but at some level you can simply think of it as regressing a large set of sine/cosine functions.

So... how are you supposed to choose between these alternatives? The first two questions you should ask yourself are:
If you wish to understand or exploit the dependency structure amongst variables, then you will want to use a regression approach. On the other hand, if you are primarily interested in interpolation, you will almost certainly just need a curve-fitting method. Your choice gets a little trickier if you have no desire/ability to understand the relationships amongst variables, but need to extrapolate: both approaches can be used in that case, but with serious concerns about accuracy.

If you are using a curve-fitting approach, cubic splines are much easier to implement yourself and much more intuitive than Fourier approximations. If your data is inherently oscillatory, however, you might get a lot of value out of learning the details behind Fourier approximation.

If that sounds like a whole bunch of "maybes" and "possiblies", that's what it is. There are tons of numerical methods, and there are lots of application-specific tradeoffs that can be made on complexity, computational efficiency, generality, and so forth.

Finally, here are a bunch of links on this stuff you might find helpful. When I first learned numerical methods I used Chapra and Canale, which was a good general book that suited people w/o a lot of mathematical background. You might want to check that out.

Good luck!

Tats

Basic tutorial

Update: s/durve/curve/; thanks to PodMaster
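As a footnote to the "easier to implement yourself" remark about cubic splines, here is a rough sketch in plain Python. Note the swap: this is the textbook natural cubic spline, which passes exactly through every datapoint and ties the pieces together by solving one tridiagonal system, rather than the windowed local fit described in the post. The function name natural_cubic_spline and the sample data are made up for illustration, and the x values are assumed to be strictly increasing.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    Assumes xs is strictly increasing and has at least two points.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]  # interval widths

    # Right-hand side of the tridiagonal system for the second-derivative terms
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]

    # Forward sweep of the tridiagonal solve
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]

    # Back substitution; c[0] = c[n] = 0 is the "natural" end condition
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        # Pick the interval containing x; points outside the data reuse the end pieces
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t = x - xs[j]
        return ys[j] + t * (b[j] + t * (c[j] + t * d[j]))

    return evaluate


# Invented sample data with uneven spacing
xs = [0.0, 1.0, 2.5, 4.0]
ys = [1.0, 3.0, 2.0, 5.0]
spline = natural_cubic_spline(xs, ys)
```

Calling spline(x) for an x between the first and last datapoint gives the interpolated value. Outside that range it simply extends the end cubic, which runs straight into the extrapolation accuracy concerns mentioned above.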

