in reply to Estimating continuous functions
Several people have pointed out that this is a very hard problem to get right. There is no perfect answer, and tremendous amounts of energy has been spent on finding pretty good ones.
But nobody has given you a simple to implement "OK" answer. Not great. Just something that can readily be implemented which gives an answer that you can defend as somewhat reasonable. Depending on the application, this is often enough.
Here is an outline of one.
Let's say that the last variable a function of the rest, call it f. Make f(x_1, x_2, ... , x_(n1)) into the weighted average of the known values of f at the points that you have. A reasonable weighting is that a point is weighted with weight 1/(square of distance from that point to the spot you're estimating). Then you can do it something like this (with some error checking added, of course...):
sub make_estimator {
return sub {0} unless @_; # You might want to die?
my @points = @_;
return sub {
my $weight_total;
my $weighted_sum;
foreach my $point (@points) {
my $dist_squared = 0;
foreach my $coord (0..$#_) {
$dist_squared += ($point>[$coord]  $_[$coord])**2;
}
if (0 == $dist_squared) {
return $point>[1];
}
else {
my $weight = 1/$dist_squared;
$weight_total += $weight;
$weighted_sum += $weight * $point>[1];
}
}
return $weighted_sum/$weight_total;
};
}
my $estimator = make_estimator(
[3, 4],
[4, 5],
);
print $estimator>(3.5);
This works perfectly well. The resulting function is trivially smooth everywhere except at the estimating points, and I'm reasonably surepositive that it is smooth there as well. It isn't linear or particularly simple though.
Feel free to play with the weighting function.
Update: I verified it. The function is also smooth at the estimating points. Its derivative at all estimating points is 0 (ie each is a maxima, minima, or inflection point).
(zdog) Re: (2) Estimating continuous functions by zdog (Priest) on Mar 30, 2004 at 04:37 UTC 
Cool.
I guess that's the essence of that two dimensional example that I gave. The difference being that the program does not have to determine which points it takes into account. It just uses all of the data points, and the weighting allows it to do so. That's also its weakness in some ways .. Correct? .. that it considers all data points to an extent no matter how removed they are... or for some other reason?
The fact that it isn't linear isn't necessarily a bad thing, though, is it? Depends on the type of function, I guess. I suppose another aspect that adds to its simplicity is that it doesn't require you to select whether the function is linear, quadratic, logarithmic, etc, before approximating it.
When (if?) I get a chance, I'll generate some graphs using the data set (which I actually don't have yet :), varying the weighting, to see how the estimation looks visually. Perhaps I'll post a sample of them up here later on.
Thanks!
Zenon Zabinski  zdog  zdog@perlmonk.org
 [reply] 

I'd say that its biggest weakness is probably the opposite. It tends to overuse the nearest available data point resulting in flat sections and fairly sharp transitions from one point dominating to the next. This results in functions that don't look very reasonable.
Playing around with the weighting should let you achieve an acceptable tradeoff between one point dominating and distant points having too much of an impact. The choice is going to be empirical however, there isn't a "best answer" to this problem.
Furthermore if you know anything about what your underlying function looks like, you would probably do a lot better to use more traditional estimation techniques. In particular if you can get samples on some kind of useful grid, one of the usual curvefitting algorithms will be easy to calculate and should give excellent results. Two standard kinds of oftenused curvefitting algorithms are polynomial (eg cubic splines) and wavelets.
 [reply] 

I took (or tried to take) a look at cubic spline interpolation, and a lot of the implementations seem to be for sets of (x, y) pairs. Adding the additional dimensions scared me off a bit.
Either way, I do have some sort of idea of what the first function I have to do this for looks like. I'll describe it as best as I can (perhaps someone can help me improve my terminology to do so) .. but here it goes:
This particular function has 3 independent variables, so it would look something like this: F(x, y, z). (Later on, I will need something that can handle even more.) As x is varied and the other variables are kept constant, the function is logarithmic. And as either y or z are varied and the other variables are kept constant, the function takes on a form similar to e**(k/x).
Any ideas or pointers where to go from here?
Zenon Zabinski  zdog  zdog@perlmonk.org
 [reply] 


Re: Re: Estimating continuous functions by Itatsumaki (Friar) on Mar 30, 2004 at 05:20 UTC 
That's pretty cool. That seems almost like a loess smoother, isn't it?
 [reply] 
