You're right that the quadratic term will eventually dominate. However, over the range I considered (0 < XP < 100), adding that term gives only a slightly better fit.
Since the fit is nearly as good without it, dropping the quadratic term makes for a better model for interpretive purposes (as opposed to get-the-best-fit purposes):
Read 99 items
Read 99 items
Call:
lm(formula = log10(count) ~ xp)
Residuals:
     Min       1Q   Median       3Q      Max
-0.16823 -0.10095 -0.01733  0.07757  0.25553
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.194851   0.023379  179.43   <2e-16 ***
xp          -0.027268   0.000406  -67.17   <2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.1154 on 97 degrees of freedom
Multiple R-Squared: 0.979, Adjusted R-squared: 0.9787
F-statistic: 4512 on 1 and 97 DF, p-value: < 2.2e-16
With this model, our estimating function is as follows:
sub estimate_count_from_xp($) {
    my $xp = shift;
    # Coefficients rounded from the fit: log10(count) = 4.194851 - 0.027268 * xp
    return 10 ** ( 4.195 - 0.02727 * $xp );
}
From this, it's easy to see that we have classic exponential decay with respect to XP: each additional point of XP multiplies the expected count by the same constant factor.
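To put numbers on that decay, here is a minimal sketch that rewrites the fitted line as count = scale * factor ** xp and derives a "half-life" in XP. It uses the rounded coefficients from the lm() output; the variable names and the log_base10 helper are mine, purely for illustration.

```perl
use strict;
use warnings;

# Rounded coefficients from the lm() fit: log10(count) = $A + $B * xp
my ( $A, $B ) = ( 4.195, -0.02727 );

# Perl's built-in log() is the natural log, so convert to base 10.
sub log_base10 { return log( $_[0] ) / log(10) }

my $scale     = 10 ** $A;              # predicted count at xp = 0
my $per_xp    = 10 ** $B;              # multiplicative change per XP point
my $half_life = log_base10(2) / -$B;   # XP needed for the count to halve

printf "count ~ %.0f * %.4f ** xp\n", $scale, $per_xp;
printf "half-life ~ %.1f XP\n",       $half_life;
```

With these coefficients the predicted count drops by roughly 6% per XP point and halves about every 11 XP, at least within the fitted range (0 < XP < 100).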
Does this match your intuition?