Just another Perl shrine PerlMonks

### Re: Estimating continuous functions

by Itatsumaki (Friar)
 on Mar 29, 2004 at 07:33 UTC ( #340529=note: print w/replies, xml ) Need Help??

in reply to Estimating continuous functions

There are two general approaches to this kind of problem. You can either try to estimate the underlying function from the data that you are given, or you can simply try to generate a smooth curve between the data-points. There are (naturally) advantages and disadvantages to each approach. :)

Predicting the function is a more complex business, and usually requires some simplifying assumptions about the type of relationship between the dependent variable (the output) and the various independent variables (inputs). For instance, you can assume that the relationship is linear, or quadratic, or logarithmic, or... just about anything else. However that assumption generally has to be made up front.

You then perform a statistical operation called a "regression" on the data, using the relationships that you defined above. The regression attempts to estimate the "importance" of each parameter (as measured by a set of constant coefficients) by minimizing the deviations between the predicted and actual values. Actually you typically minimize the squared-deviations. To get an estimate of the "goodness-of-fit" you would take a look at the "residuals": how much of the actual data does your model explain?

This is really only a rough overview of regression. You have multiple dependent variables, so you are going to want a "multiple regression". The biggest problem with regression is that it takes a lot of data to accurately estimate a sophisticated model. So, if you have n independent variables (dimensions), and you assume that: i) the variables are completely orthogonal (e.g. the independent variables are totally unrelated) and ii) there is a linear relationship between each independent variable and the dependent variable then you will need at a minimum n independent observations to be able to estimate the model.

Overall, regression thus works really well when you have a lot of data, and can make reasonable estimates about the relationships between the dependent and independent variables. One nice thing about regression techniques is that they can (relatively) seamlessly handle categorical (non-numeric) data like {TRUE/FALSE} or {RED/GREEN/YELLOW}.

Your other option is to use a interpolating method. These methods do not attempt to directly model the interactions between dependent and independent models. Instead, they simply attempt to find the best smooth-fitting curve for the data. The two most important methods in use today are cubic-spline-based interpolation and fourier-approximations.

Cubic splines essentially fit cubic functions to subsets of your data. That is, a separate cubic function is fitted (regressed, actually) between each pair of data-points. This regression uses some (but not all!) of the surrounding data points. So if you have a continuous set of say 100 observations, you might estimate a cubic function between points 50 and 51 using only the data-points from 45 to 55. It turns out that this local interpolation handles discontinuities and large abrupt changes much better than global interpolations. Of course this (can) come at the price of much more computational overhead.

Fourier approximation attempts to estimate your data with a set of sinuisoidal functions (usually just sine and cosine). You can essentially approximate or fit any function (even whacked out engineering things like the heavy-side or weirdo physics stuff like the Dirac delta) with an infinite series of sinuisoidal functions. The theory behind Fourier approximations is a bit complex, but at some level you can simply think of it as regressing a large set of sine/cosine functions.

So... how are you supposed to chooose between these alternatives? The first two questions you should ask yourself are:

1. Do you need to understand the dependency structure between your dependent and independent variables? Do you already have some understanding of it that you would like to incorporate into your estimation?
2. Do you simply need to interpolate amongst the data-points that you already have, or will you need to extrapolate to new and untested values of the dependent and/or independent variables?

If you wish to understand or exploit the dependency structure amongst variables, then you will want to use a regression approach. On the other hand, if you are primarily interested in interpolation, you will almost certainly just need a curve-fitting method. Your choice gets a little trickier if you have no desire/ability to understand the relationships amongst variables, but need to extrapolate: both approaches can be used in that case, but with serious concerns about accuracy.

If you are using a curve-fitting approach, cubic splines are much easier to implement yourself and much more intuitive than Fourier transforms. If you data is inherently oscillatory, however, you might get a lot of value out of learning the details behind Fourier approximation.

If that sounds like a whole bunch of "maybes" and "possiblies" that's what it is. There are tons of numerical methods, and there are lots of application-specific trade-offs that can be made on complexity, computational efficiency, generality, and so forth.

Finally, here are a bunch of links on this stuff you might find helpful. When I first learned numerical methods I used Chapra and Canale, which was a good general book that suited people w/o a lot of mathematical background. You might want to check that out.

Good luck!

-Tats

Update: s/durve/curve/; thanks to PodMaster

Create A New User
Node Status?
node history
Node Type: note [id://340529]
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (3)
As of 2018-02-22 19:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
When it is dark outside I am happiest to see ...

Results (298 votes). Check out past polls.

Notices?