The following block of code calculates the formula for a trend line using linear regression (method of least squares). It takes as its input two columns of data (i.e. a table) representing the x and y axes of a chart. Its output could be a single row (three columns) holding the slope and y-intercept of the trend line, and r-squared (the coefficient of determination).
Is there a way in SQL 2K5 to create an in-line table-valued function or a stored procedure that can take a table of x and y coordinates as a parameter? Is there some other way to make this code reusable? Otherwise, whenever I want a trend line in the reporting client (SSRS 2008 R2), I'm going to have to copy/paste/hack this code into my work.
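For reference, these are the standard least-squares formulas that the code below appears to implement. The symbols are chosen to mirror its variable names (@xbar, @ybar, @Beta, @Alpha, @SS_tot, @SS_err); n, the number of input rows, and the index i are notation added here for illustration:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i

\beta = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
             {\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\alpha = \bar{y} - \beta\,\bar{x}

SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
SS_{err} = \sum_{i=1}^{n} \bigl(y_i - (\alpha + \beta x_i)\bigr)^2

r^2 = 1 - \frac{SS_{err}}{SS_{tot}}
```

The slope is undefined when every x value is the same (the denominator is zero), which is the case the code guards against with NULLIF.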
Supplementary questions:
Are there any gotchas in using the float data type here?
Often the x axis will be a datetime (e.g. week starting on mm/dd/yyyy). Are there any gotchas in converting a datetime to a numeric value?
--======= =================================================================================================
--======= Author: (originally) http://stackoverflow.com/users/92092/stephan See:
--======= http://stackoverflow.com/questions/2536895/are-there-any-linear-regression-function-in-sql-server
--======= Date: Mar 29 2010 09:58
--======= Purpose: Linear regression (method of least squares)
--======= Comments:
--======= Revisions: WHO WHEN WHAT
--======= GPO 20121216 Simplified query (well... kinda) to limit calcs to one
--======= regression equation per input dataset.
--======= =================================================================================================
SET NOCOUNT ON
--------- test data
IF OBJECT_ID('tempdb..#some_table') IS NOT NULL
DROP TABLE #some_table;
SELECT 0 as sourceID, 1.2 as x, 1.0 as y
INTO #some_table
UNION ALL SELECT 1, 1.6, 1
UNION ALL SELECT 2, 2.0, 1.5
UNION ALL SELECT 3, 2.0, 1.75
UNION ALL SELECT 4, 2.1, 1.85
UNION ALL SELECT 5, 2.1, 2
UNION ALL SELECT 6, 2.2, 3
UNION ALL SELECT 7, 2.2, 3
UNION ALL SELECT 8, 2.3, 3.5
UNION ALL SELECT 9, 2.4, 4
UNION ALL SELECT 10, 2.5, 4
UNION ALL SELECT 11, 3, 4.5;
--======= =================================================================================================
--======= linear regression query
--======= Get average x====================================================================================
DECLARE @xbar as float;
SET @xbar = ( SELECT avg(x)
FROM #some_table
);
--======= Get average y====================================================================================
DECLARE @ybar as float;
SET @ybar = ( SELECT avg(y)
FROM #some_table
);
--======= Get beta (slope) estimate========================================================================
DECLARE @Beta as float;
SET @Beta = ( SELECT SUM((x-@xbar)*(y-@ybar))
/ --nullif to avoid a divide-by-zero error
nullif(SUM((x-@xbar)*(x-@xbar)),0)
FROM #some_table pd
);
--======= Get alpha (constant) estimate====================================================================
DECLARE @Alpha as float;
SET @Alpha = @ybar - @xbar * @Beta;
--======= Get Total Sum of Squares=========================================================================
DECLARE @SS_tot as float;
SET @SS_tot = ( SELECT SUM((y-@ybar)*(y-@ybar))
FROM #some_table
);
--======= Get Sum of Squares due to Error (residual sum of squares)========================================
DECLARE @SS_err as float;
SET @SS_err = ( SELECT SUM((y-(@Alpha+@Beta*x)) * (y-(@Alpha+@Beta*x)))
FROM #some_table
);
--======= Get r-squared (the coefficient of determination)=================================================
DECLARE @r_squared as float;
SET @r_squared = CASE WHEN @SS_tot = 0
THEN 1.0
ELSE 1.0 - (@SS_err / @SS_tot)
END;
--======= Joining back to the source data allows the plotting of the trend line along with the usual
--======= plotting of x against y==========================================================================
SELECT sourceID
,x
,y
,@Beta * x + @Alpha as y_trend
--------- the final output from the sproc/iltvf could be a single row holding the following three values.
,@r_squared as r_squared
,@Alpha as Alpha
,@Beta as Beta
FROM #some_table;
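To illustrate the single-row output mentioned in the comment above, here is a minimal sketch that collapses the calculation into one statement. It assumes #some_table from the test data still exists; the aliases t, m and d and the column names Sxy, Sxx and Syy are made up for this example. Note that r_squared is computed here as Sxy² / (Sxx · Syy), which is algebraically equal to 1 - SS_err / SS_tot for a straight-line fit with an intercept. That is a swapped-in shortcut, not the method used in the script above.

```sql
-- Sketch only: assumes #some_table (sourceID, x, y) created by the test data above.
-- Sxy, Sxx, Syy and the aliases t, m, d are illustrative names, not from the original.
SELECT  d.Sxy / NULLIF(d.Sxx, 0)                       AS Beta,
        d.ybar - d.xbar * (d.Sxy / NULLIF(d.Sxx, 0))   AS Alpha,
        CASE WHEN d.Syy = 0
             THEN 1.0                                  -- same convention as the script above
             ELSE (d.Sxy * d.Sxy) / NULLIF(d.Sxx * d.Syy, 0)
        END                                            AS r_squared
FROM (
        SELECT  MAX(m.xbar)                            AS xbar,  -- constant across rows
                MAX(m.ybar)                            AS ybar,  -- constant across rows
                SUM((t.x - m.xbar) * (t.y - m.ybar))   AS Sxy,
                SUM((t.x - m.xbar) * (t.x - m.xbar))   AS Sxx,
                SUM((t.y - m.ybar) * (t.y - m.ybar))   AS Syy    -- equals SS_tot
        FROM    #some_table AS t
        CROSS JOIN (
                -- averages computed once, cast to float to match the variables above
                SELECT  AVG(CAST(x AS float)) AS xbar,
                        AVG(CAST(y AS float)) AS ybar
                FROM    #some_table
        ) AS m
) AS d;
```

Like the original, this returns NULL for Beta and Alpha when all x values are identical. It uses only derived tables and a CROSS JOIN, so it should not depend on anything newer than SQL 2K5, but it still reads from a hard-coded table name and so does not by itself solve the reusability problem.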