Greetings,

I've created a utility using Statistics::LineFit and another using Gnuplot and fed both the same sample data. The results differ, so I must have made a mistake, but I can't see where.

## Perl code

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Statistics::LineFit;
use Time::Local;
use Data::Dumper;

my @x_axis;
my @y_axes;

# Convert a YYYY-MM-DD date to epoch seconds at 23:59:59 local time.
sub date_to_epoch
{
    my $date = shift;
    my ( $y, $m, $d ) = split /-/, $date;
    # NOTE: timelocal() takes a 0-based month, so passing $m unchanged
    # yields May dates for this April data (see the epochs in the results).
    return timelocal( '59', '59', '23', $d, $m, $y );
    #return timelocal( '0', '0', '0', $d, $m, $y );
}

# Return the largest value in a list.
sub max_value
{
    my @array = @_;
    my $max = $array[0];
    for ( my $i = 0; $i <= $#array; $i++ )
    {
        $max = $array[$i] if ( $array[$i] > $max );
    }
    return $max;
}

my @epochs;
while (<DATA>)
{
    next if ( m/^#/ );
    chomp;
    if ( my @line = split /\s+/ )
    {
        my $epoch = date_to_epoch( $line[0] );
        # factor down epoch or slope is too shallow.
        push @x_axis, $epoch;
        shift @line;
        # Remaining columns are the y series: notkept, hosts.
        for ( my $y = 0; $y <= $#line; $y++ )
        {
            push @{$y_axes[$y]}, $line[$y];
        }
    }
}
print Dumper ( \@x_axis );
print Dumper ( \@y_axes );

my $lineFit = Statistics::LineFit->new( 0, 0 ); # TODO change 2nd to 1
$lineFit->setData( \@x_axis, \@{$y_axes[0]} ) or die "Invalid regression data\n";
my ( $intercept, $slope ) = $lineFit->coefficients();
print "Slope(m): $slope Y-intercept(b): $intercept\n";

# Two points on the fitted line: x = 0 and the x where y reaches the max sample.
my %fitline;
$fitline{y1} = $intercept;
$fitline{x1} = 0;
$fitline{y2} = max_value( @{$y_axes[0]} );
$fitline{x2} = ( $fitline{y2} - $fitline{y1} ) / $slope + $fitline{x1};
print Dumper ( \%fitline );
__DATA__
# date notkept hosts
2014-04-01 50 10
2014-04-02 63 11
2014-04-03 120 12
2014-04-04 55 20
2014-04-05 60 22
2014-04-06 63 25
2014-04-07 52 24
```

## Gnuplot

```gnuplot
#!/usr/bin/gnuplot
#set output "test.png"
set title "Promises not kept"
set xlabel "Date"
set ylabel "Count"
set rmargin 7
set border linewidth 2
set style line 1 linecolor rgb 'blue' linetype 1 linewidth 2
set style line 2 linecolor rgb 'black' linetype 1 linewidth 2
set style fill solid
set xdata time
set timefmt "%Y-%m-%d"
set format x "%Y-%m-%d"
set grid front
set grid
set autoscale
# 1e8 reduces the epoch seconds for a less flat line.
h(x) = m2 * x + b2
fit h(x) 'test.dat' using 1:3 via m2,b2
p(x) = m1 * x + b1
fit p(x) 'test.dat' using 1:2 via m1,b1
#set terminal png enhanced size 1024,768
plot 'test.dat' using 1:2 title 'Promises not kept' with boxes lc rgb "orange", \
     p(x) title 'Promise Trend' with lines linestyle 1, \
     h(x) title 'Host Trend' with lines linestyle 2
```

## test.dat

```
# date notkept hosts
2014-04-01 50 10
2014-04-02 63 11
2014-04-03 120 12
2014-04-04 55 20
2014-04-05 60 22
2014-04-06 63 25
2014-04-07 52 24
```

## Perl results

```
$VAR1 = [
          1399003199,
          1399089599,
          1399175999,
          1399262399,
          1399348799,
          1399435199,
          1399521599
        ];
$VAR1 = [
          [
            '50',
            '63',
            '120',
            '55',
            '60',
            '63',
            '52'
          ],
          [
            '10',
            '11',
            '12',
            '20',
            '22',
            '25',
            '24'
          ]
        ];
Slope(m): -2.23214285714286e-05 Y-intercept(b): 31299.6785491071
$VAR1 = {
          'y1' => '31299.6785491071',
          'x2' => 1396849599,
          'y2' => 120,
          'x1' => 0
        };
```

## Gnuplot results

```
Final set of parameters            Asymptotic Standard Error
=======================            ==========================

m1              = 1.44796e-07      +/- 5.823e-05    (4.022e+04%)
b1              = 1                +/- 2.62e+04     (2.62e+06%)

correlation matrix of the fit parameters:

               m1     b1
m1              1.000
b1             -1.000  1.000
```

Note that gnuplot's m1 (1.44796e-07) and b1 (1) do not match the slope (-2.23214285714286e-05) and Y-intercept (31299.6785491071) reported by Perl. Why?
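To judge which of the two results is closer to an ordinary least-squares fit, the slope can be cross-checked with the textbook closed-form formula. The following is only a minimal sketch: `least_squares` is a hypothetical helper written for this post, not part of Statistics::LineFit or gnuplot, and it assumes x is measured in whole days since the first sample so the numbers stay small.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Closed-form least squares for y = m*x + b.
# least_squares() is a hypothetical helper, not a Statistics::LineFit call.
sub least_squares
{
    my ( $x, $y ) = @_;
    my $n = scalar @$x;

    # Means of x and y.
    my ( $sum_x, $sum_y ) = ( 0, 0 );
    $sum_x += $_ for @$x;
    $sum_y += $_ for @$y;
    my ( $mean_x, $mean_y ) = ( $sum_x / $n, $sum_y / $n );

    # Sxy = sum of (x - mean_x)(y - mean_y), Sxx = sum of (x - mean_x)^2.
    my ( $sxy, $sxx ) = ( 0, 0 );
    for my $i ( 0 .. $n - 1 )
    {
        my $dx = $x->[$i] - $mean_x;
        $sxy += $dx * ( $y->[$i] - $mean_y );
        $sxx += $dx * $dx;
    }

    my $slope     = $sxy / $sxx;
    my $intercept = $mean_y - $slope * $mean_x;
    return ( $intercept, $slope );
}

# Assumption: x is days since 2014-04-01 rather than epoch seconds.
my @days    = ( 0 .. 6 );
my @notkept = ( 50, 63, 120, 55, 60, 63, 52 );

my ( $b_days, $m_days ) = least_squares( \@days, \@notkept );

# The samples are exactly one day (86400 s) apart, so dividing converts units.
printf "Slope per day: %.6f  Slope per second: %.6e  Intercept at day 0: %.4f\n",
    $m_days, $m_days / 86400, $b_days;
```

For this data the sums work out to a slope of -54/28, about -1.93 per day, or roughly -2.23e-05 per second, which is in line with the Statistics::LineFit slope in the Perl results. The intercept printed here is not comparable to either result above, because it is taken at day 0 rather than at epoch 0.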
