PerlMonks  

Re: finding R-squared..please help

by Eliya (Vicar)
on Feb 19, 2012 at 03:16 UTC (node id 954830)


in reply to finding R-squared..please help

Don't reinvent the wheel: use one of the existing libraries/packages for this, for example Math::GSL, which provides Perl bindings to the GNU Scientific Library (GSL).

The R-squared for a linear least squares fit of two vectors @$x, @$y (with x predicting y) can be computed as follows:

#!/usr/bin/perl -w
use strict;
use Math::GSL::Fit        "gsl_fit_linear";
use Math::GSL::Statistics "gsl_stats_tss";

my $x = [1,2,3,4,5];
my $y = [5,7,8,12,13];
my $n = @$y;

# Fit y = c0 + c1*x; the 1s are the strides (use every element of @$x and @$y).
# The last return value is the residual sum of squares.
my ($status, $c0, $c1, $cov00, $cov01, $cov11, $ss_resid) =
    gsl_fit_linear($x, 1, $y, 1, $n);

# Total sum of squares of y about its mean.
my $ss_total = gsl_stats_tss($y, 1, $n);

print "SS residual = $ss_resid\n";
print "SS total = $ss_total\n";

my $R2 = 1 - $ss_resid / $ss_total;
print "R^2 = $R2\n";
SS residual = 1.9
SS total = 46
R^2 = 0.958695652173913
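Worked by hand, the fitted line for this example is y = 2.7 + 2.1x (i.e. $c0 = 2.7, $c1 = 2.1), and the printed quantities follow the standard definitions, with fitted values ŷ_i = c0 + c1·x_i and ȳ = 9 the mean of y:

```latex
\begin{aligned}
SS_\text{res} &= \sum_i (y_i - \hat y_i)^2 = 0.2^2 + 0.1^2 + 1.0^2 + 0.9^2 + 0.2^2 = 1.9 \\
SS_\text{tot} &= \sum_i (y_i - \bar y)^2 = 16 + 4 + 1 + 9 + 16 = 46 \\
R^2 &= 1 - \frac{SS_\text{res}}{SS_\text{tot}} = 1 - \frac{1.9}{46} \approx 0.9587
\end{aligned}
```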

If I'm understanding you correctly, you'd want to do this computation for every pair of lines.

(Note that for R^2 to be computable, the data to be predicted must have some variance, as is often the case in statistics. For example, $y = [2,2,2,2,2] wouldn't work: $ss_total would be 0, and $ss_resid / $ss_total would die with an "Illegal division by zero" error.)
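To run the computation over every pair of lines and skip the degenerate cases instead of dying, here is a minimal pure-Perl sketch. It computes the same least-squares fit and sums of squares as the two GSL calls above, so it mainly serves to make those quantities explicit; the helper name r_squared and the input format (one comma- or whitespace-separated vector per line) are hypothetical assumptions, so adapt the parsing to your actual file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# R^2 of the least-squares line fitted with x predicting y (pure Perl, no GSL).
# Returns undef when R^2 is undefined: no variance in x (no line can be fitted)
# or no variance in y ($ss_total would be 0).
sub r_squared {
    my ($x, $y) = @_;
    my $n = @$x;
    die "x and y must have the same length\n" unless $n == @$y;

    my $mean_x = sum(@$x) / $n;
    my $mean_y = sum(@$y) / $n;

    my ($sxx, $sxy, $ss_total) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my $dx = $x->[$i] - $mean_x;
        my $dy = $y->[$i] - $mean_y;
        $sxx      += $dx * $dx;
        $sxy      += $dx * $dy;
        $ss_total += $dy * $dy;
    }
    return undef if $sxx == 0 || $ss_total == 0;

    my $c1 = $sxy / $sxx;              # slope
    my $c0 = $mean_y - $c1 * $mean_x;  # intercept
    my $ss_resid = sum(map { ($y->[$_] - ($c0 + $c1 * $x->[$_]))**2 } 0 .. $n - 1);
    return 1 - $ss_resid / $ss_total;
}

# Hypothetical input: one vector per line, numbers separated by commas and/or
# whitespace. In a real script these would come from your file (chomp each
# line and strip leading whitespace before splitting).
my @lines   = ("1,2,3,4,5", "5,7,8,12,13", "2,2,2,2,2");
my @vectors = map { [ split /[\s,]+/ ] } @lines;

# R^2 of a simple linear fit equals r^2, which is symmetric in x and y,
# so each unordered pair only needs to be computed once.
for my $i (0 .. $#vectors - 1) {
    for my $j ($i + 1 .. $#vectors) {
        my $r2 = r_squared($vectors[$i], $vectors[$j]);
        printf "lines %d and %d: R^2 = %s\n", $i + 1, $j + 1,
            defined $r2 ? sprintf("%.6f", $r2) : "undefined";
    }
}
```

With the sample lines this prints R^2 = 0.958696 for lines 1 and 2 (the same value as the GSL version) and "undefined" for both pairs involving the constant third line.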

Re^2: finding R-squared..please help
by david_lyon (Sexton) on Feb 19, 2012 at 23:51 UTC
    Very nice code, thank you so much Eliya! A question in passing: do you know why this online calculator http://easycalculation.com/statistics/r-squared.php can find r^2 (but not r) for the values "2,2,2,2" and "2,2,2,1"? Is that a bug in their software? Thanks again for your help!

      Strictly speaking, neither r nor R^2 can be computed for those vectors (so yes, you could consider it a bug). No line can be fitted with finite values for $c0 and $c1, and without a fitted line there are no residuals, so SS residual (and hence R^2) is undefined.
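The closed-form least-squares estimates show where this breaks down: the slope divides by the spread of x around its mean.

```latex
c_1 = \frac{S_{xy}}{S_{xx}}, \qquad c_0 = \bar y - c_1 \bar x, \qquad
S_{xx} = \sum_i (x_i - \bar x)^2, \qquad S_{xy} = \sum_i (x_i - \bar x)(y_i - \bar y)
```

For x = 2,2,2,2 every x_i equals x̄ = 2, so S_xx = S_xy = 0 and c_1 = 0/0 has no value; c_0 depends on c_1, so it is undefined as well.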

      When you change the values slightly to

      2,2,2,2.0001
      2,2,2,1

      a line can be fitted (so you also get correct values on that site), but the fitted line parameters are already rather large:

      use feature 'say';
      my $c0 = 20001.9999999578;   # as computed by gsl_fit_linear()
      my $c1 = -9999.9999999789;
      say $c0 + $c1 * $_ for 2, 2.0001;
      __END__
      2
      1

      The smaller you make the deviation from 2 in the last value of vector x, the larger the fitted parameters become, eventually approaching delicately balanced +/- "infinities". And when all values are exactly 2, no fit can be computed at all. (Try 2.0000001 and 2.00000001 on the linked site: you'll already get nonsensical values, whereas the correct values are always r = -1 and r^2 = 1. This means the site's numerical precision is rather low.)
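The blow-up can be reproduced with a minimal pure-Perl sketch of the closed-form fit (not GSL; the helper name fit_line is hypothetical). For x = 2,2,2,2+eps and y = 2,2,2,1 the formulas give c1 = -1/eps and c0 = 2 + 2/eps exactly, so both parameters grow without bound as eps shrinks while c0 + 2*c1 stays at 2, and at eps = 0 the fit disappears.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Closed-form least-squares fit of y = c0 + c1*x (pure Perl, not GSL).
# Returns (c0, c1), or an empty list when x has no variance (no line fits).
sub fit_line {
    my ($x, $y) = @_;
    my $n      = @$x;
    my $mean_x = sum(@$x) / $n;
    my $mean_y = sum(@$y) / $n;
    my ($sxx, $sxy) = (0, 0);
    for my $i (0 .. $n - 1) {
        my $dx = $x->[$i] - $mean_x;
        $sxx += $dx * $dx;
        $sxy += $dx * ($y->[$i] - $mean_y);
    }
    return if $sxx == 0;
    my $c1 = $sxy / $sxx;
    return ($mean_y - $c1 * $mean_x, $c1);
}

# Shrink the deviation of the last x value from 2 and watch the parameters grow.
my @y = (2, 2, 2, 1);
for my $eps (1e-2, 1e-4, 1e-6, 0) {
    my @fit = fit_line([2, 2, 2, 2 + $eps], \@y);
    if (@fit) {
        printf "eps = %g: c0 = %.10g, c1 = %.10g\n", $eps, @fit;
    }
    else {
        printf "eps = %g: no fit, x has no variance\n", $eps;
    }
}
```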

        Thank you very much for your help! All the best!
