http://www.perlmonks.org?node_id=340862


in reply to Re: Estimating continuous functions
in thread Estimating continuous functions

Cool.

I guess that's the essence of the two-dimensional example that I gave. The difference is that the program does not have to determine which points to take into account: it just uses all of the data points, and the weighting allows it to do so. That's also its weakness in some ways, correct? That it considers every data point to some extent, no matter how far removed it is .. or is it weak for some other reason?

The fact that it isn't linear isn't necessarily a bad thing, though, is it? That depends on the type of function, I guess. Another aspect that adds to its simplicity is that it doesn't require you to decide whether the function is linear, quadratic, logarithmic, etc., before approximating it.
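For concreteness, the estimator under discussion can be sketched in a few lines. This is Python rather than Perl, the data set is made up, and the exponent p is the weighting knob (the original suggestion weighted by 1/distance**2, i.e. p = 2):

```python
import math

def idw_estimate(points, query, p=2.0):
    """Inverse-distance-weighted estimate at `query`.

    points: list of (coords, value) pairs; weight = 1 / distance**p.
    Every data point contributes, no matter how far away it is.
    """
    num = den = 0.0
    for coords, value in points:
        d = math.dist(coords, query)
        if d == 0.0:
            return value            # exact hit: return the known value
        w = 1.0 / d ** p
        num += w * value
        den += w
    return num / den

# Made-up 2-D sample: value = x + y at the corners of the unit square.
data = [((0.0, 0.0), 0.0), ((1.0, 0.0), 1.0),
        ((0.0, 1.0), 1.0), ((1.0, 1.0), 2.0)]

print(idw_estimate(data, (0.5, 0.5)))   # → 1.0 (all four weights are equal)
```

Raising p makes the nearest point dominate more strongly; lowering it lets distant points pull harder on the estimate.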

When (if?) I get a chance, I'll generate some graphs using the data set (which I actually don't have yet :-), varying the weighting, to see how the estimation looks visually. Perhaps I'll post a sample of them up here later on.

Thanks!

Zenon Zabinski | zdog | zdog@perlmonk.org



Re: (zdog) Re: (2) Estimating continuous functions
by tilly (Archbishop) on Mar 30, 2004 at 05:33 UTC
    I'd say that its biggest weakness is probably the opposite: it tends to overweight the nearest available data point, producing flat sections with fairly sharp transitions where one point stops dominating and the next takes over. The result is functions that don't look very reasonable.

    Playing around with the weighting should let you achieve an acceptable trade-off between one point dominating and distant points having too much impact. The choice is going to be empirical, however; there isn't a "best answer" to this problem.

    Furthermore, if you know anything about what your underlying function looks like, you would probably do a lot better with a more traditional estimation technique. In particular, if you can get samples on some kind of useful grid, one of the usual curve-fitting algorithms will be easy to calculate and should give excellent results. Two standard, often-used families of curve-fitting algorithms are polynomial methods (eg cubic splines) and wavelets.
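For the curious, the one-dimensional form of the cubic-spline idea fits in a few dozen lines. This is an illustrative Python sketch with natural boundary conditions (second derivative zero at the endpoints), not production code:

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Interpolate the points (xs, ys) with a natural cubic spline.

    xs must be strictly increasing. Returns a function valid on
    [xs[0], xs[-1]].
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    M = [0.0] * (n + 1)          # second derivatives; M[0] = M[n] = 0
    if n >= 2:
        # Tridiagonal system for the interior M's (Thomas algorithm).
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i]
                    - (ys[i] - ys[i - 1]) / h[i - 1]) for i in range(1, n)]
        for j in range(1, n - 1):            # forward elimination
            m = h[j] / b[j - 1]
            b[j] -= m * h[j]
            d[j] -= m * d[j - 1]
        M[n - 1] = d[-1] / b[-1]             # back substitution
        for i in range(n - 2, 0, -1):
            M[i] = (d[i - 1] - h[i] * M[i + 1]) / b[i - 1]

    def s(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        t0, t1 = xs[i + 1] - x, x - xs[i]
        return ((M[i] * t0 ** 3 + M[i + 1] * t1 ** 3) / (6.0 * h[i])
                + (ys[i] - M[i] * h[i] ** 2 / 6.0) * t0 / h[i]
                + (ys[i + 1] - M[i + 1] * h[i] ** 2 / 6.0) * t1 / h[i])
    return s

# Sanity check: on points taken from a straight line the spline is that line.
s = natural_cubic_spline([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(s(1.5))   # → 4.0
```

As the reply below notes, most implementations like this one handle only (x, y) pairs; the multi-dimensional generalizations are considerably more involved.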

      I took (or tried to take) a look at cubic spline interpolation, and a lot of the implementations seem to be for sets of (x, y) pairs. Adding the additional dimensions scared me off a bit.

      Either way, I do have some idea of what the first function I have to do this for looks like. I'll describe it as best I can (perhaps someone can help me improve my terminology) .. but here goes:

      This particular function has 3 independent variables, so it would look something like this: F(x, y, z). (Later on, I will need something that can handle even more.) As x is varied and the other variables are held constant, the function is logarithmic. And as either y or z is varied and the others are held constant, the function takes on a form similar to e**(k/y) (or e**(k/z), respectively).

      Any ideas or pointers where to go from here?

      Zenon Zabinski | zdog | zdog@perlmonk.org


        If you know the rough form of the functional dependencies, try a multiple linear regression. You can even do that in Excel, and with only three independent variables you would only need a few parameters. Try regressing this:

        F(x,y,z) = a0 + a1*log(x) + a2*exp(a3/y) + a4*exp(a5/z)

        With any luck, that will give you a reasonably good approximation while fitting only six parameters (a0..a5). You didn't indicate how *much* data you have, or whether you need to interpolate or extrapolate; both are really important factors in selecting a method.
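As an illustrative sketch of that regression in Python (synthetic data, made-up coefficients): note that a3 and a5 sit inside the exponentials, so fitting them too would need a nonlinear least-squares routine or a scan over candidate values; if they are fixed (here at 1), the model is linear in the remaining parameters and plain least squares works:

```python
import math
import random

def ols_fit(rows, targets):
    """Ordinary least squares via the normal equations.

    Fine for a handful of parameters; use a proper library for more.
    """
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    v = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    for col in range(k):                  # Gaussian elimination with pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            m = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= m * a[col][c]
            v[r] -= m * v[col]
    coeffs = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = v[r] - sum(a[r][c] * coeffs[c] for c in range(r + 1, k))
        coeffs[r] = s / a[r][r]
    return coeffs

# Design row for F = a0 + a1*log(x) + a2*exp(1/y) + a4*exp(1/z)
# (a3 = a5 = 1 fixed, to keep the model linear in its parameters).
def row(x, y, z):
    return [1.0, math.log(x), math.exp(1.0 / y), math.exp(1.0 / z)]

random.seed(1)
true_coeffs = [2.0, 0.5, 1.5, -0.7]                      # made up
samples = [(random.uniform(1, 10), random.uniform(1, 10), random.uniform(1, 10))
           for _ in range(50)]
X = [row(*s) for s in samples]
F = [sum(a_i * v_i for a_i, v_i in zip(true_coeffs, r)) for r in X]
fitted = ols_fit(X, F)
print([round(c, 6) for c in fitted])   # recovers true_coeffs (no noise here)
```

With real, noisy data the recovered coefficients will of course only approximate the underlying ones, and the quality of the fit should be checked against held-out points.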

        Other options include finding a multi-dimensional spline library (Matlab has one, I think). Alternatively, tilly's suggestion reminded me of loess smoothers. Those work by considering a span of "nearby" data points to estimate the local shape of the curve. There is a multi-dimensional implementation built into the R programming language. The major problem with loess is that its memory usage is quadratic (O(n**2)) in the number of data points.

        -Tats
        First of all, for the general case, some back-of-the-envelope estimates suggest to me that 1/distance is a better weighting than my original 1/distance**2.

        As for your specific function, you may find it worthwhile to do some transformations first. For instance, if I understand your description, then log(F(x,y,z)) is roughly of the form K*log(log(x))/(y*z), so log(F(x,y,z))*y*z/log(log(x)) is roughly a constant.

        This is good because the estimator that I provided gives its best results when approximating functions that are roughly constant. (Cubic splines, etc, give very good results when approximating functions that locally look like low-degree polynomials.) Estimating this "rough constant" and then reversing the above calculation gives you back your original function F.
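A quick numerical illustration of that transformation (Python; K and the exact form of F here are made up to match the description above, so treat them as assumptions):

```python
import math

K = 2.5   # hypothetical constant

def F(x, y, z):
    # Toy function consistent with the described shape (assumed form).
    return math.exp(K * math.log(math.log(x)) / (y * z))

def to_constant(f_val, x, y, z):
    # log(F) * y * z / log(log(x)) -- roughly constant if the model is right.
    # (Needs x > 1 and x != e so that log(log(x)) is defined and nonzero.)
    return math.log(f_val) * y * z / math.log(math.log(x))

def from_constant(g, x, y, z):
    # Reverse the transformation to recover F.
    return math.exp(g * math.log(math.log(x)) / (y * z))

for x, y, z in [(3.0, 1.0, 2.0), (10.0, 4.0, 0.5), (5.0, 2.0, 2.0)]:
    g = to_constant(F(x, y, z), x, y, z)
    print(round(g, 6))   # → 2.5 each time: the transformed values are constant
```

So the estimator would be applied to the (roughly constant) transformed values, and from_constant would turn its output back into an estimate of F.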

        In general, a judicious application of general theory combined with specific knowledge about your situation is more effective than abstract theory by itself...