Channel: Questions in topic: "sql-server-2005"

Can this linear regression algorithm for SQL Server 2K5 be made reusable?

The following block of code calculates the formula for a trend line using linear regression (method of least squares). It takes as its input two columns of data (i.e. a table) representing the x and y axes of a chart. Its output could be a single row (3 cols) of data representing the slope and y-intercept of the trend line, and r-squared (the coefficient of determination).

Is there a way in SQL 2K5 to create an in-line table-valued function, or a stored procedure, that can take a table of x and y coordinates as a parameter? Is there some other way to make this code reusable? Otherwise, whenever I want a trend line in the reporting client (SSRS 2008 R2) I'm going to have to copy/paste/hack this code into my work.

Supplementary questions:
Are there any gotchas in using the float data type here?
Often the x axis will be datetime (e.g. week starting on mm/dd/yyyy). Are there any gotchas in converting a datetime to a numeric?

--======= ======================================================================
--======= Author:    (originally) http://stackoverflow.com/users/92092/stephan
--======= See:       http://stackoverflow.com/questions/2536895/are-there-any-linear-regression-function-in-sql-server
--======= Date:      Mar 29 2010 09:58
--======= Purpose:   Linear regression (method of least squares)
--======= Comments:
--======= Revisions: WHO WHEN     WHAT
--=======            GPO 20121216 Simplified query (well... kinda) to limit calcs to one
--=======                         regression equation per input dataset.
--======= ======================================================================
SET NOCOUNT ON;

--------- test data
IF OBJECT_ID('tempdb..#some_table') IS NOT NULL DROP TABLE #some_table;

SELECT 0 as sourceID, 1.2 as x, 1.0 as y INTO #some_table
UNION ALL SELECT 1, 1.6, 1
UNION ALL SELECT 2, 2.0, 1.5
UNION ALL SELECT 3, 2.0, 1.75
UNION ALL SELECT 4, 2.1, 1.85
UNION ALL SELECT 5, 2.1, 2
UNION ALL SELECT 6, 2.2, 3
UNION ALL SELECT 7, 2.2, 3
UNION ALL SELECT 8, 2.3, 3.5
UNION ALL SELECT 9, 2.4, 4
UNION ALL SELECT 10, 2.5, 4
UNION ALL SELECT 11, 3, 4.5;

--======= ======================================================================
--======= linear regression query
--======= Get average x ========================================================
DECLARE @xbar as float;
SET @xbar = ( SELECT avg(x) FROM #some_table );

--======= Get average y ========================================================
DECLARE @ybar as float;
SET @ybar = ( SELECT avg(y) FROM #some_table );

--======= Get beta (slope) estimate ============================================
DECLARE @Beta as float;
SET @Beta = (
    SELECT SUM((x-@xbar)*(y-@ybar)) /
           --nullif to stop divide by zero
           nullif(SUM((x-@xbar)*(x-@xbar)),0)
    FROM #some_table pd
);

--======= Get alpha (constant) estimate ========================================
DECLARE @Alpha as float;
SET @Alpha = @ybar - @xbar * @Beta;

--======= Get Total Sum of Squares =============================================
DECLARE @SS_tot as float;
SET @SS_tot = ( SELECT SUM((y-@ybar)*(y-@ybar)) FROM #some_table );

--======= Get Sum of Squares due to Error ======================================
DECLARE @SS_err as float;
SET @SS_err = (
    SELECT SUM((y-(@Alpha+@Beta*x)) * (y-(@Alpha+@Beta*x)))
    FROM #some_table
);

--======= Get r-squared (the coefficient of determination) =====================
DECLARE @r_squared as float;
SET @r_squared = CASE WHEN @SS_tot = 0 THEN 1.0
                      ELSE 1.0 - (@SS_err / @SS_tot)
                 END;

--======= Joining back to the source data allows the plotting of the trend line
--======= along with the usual plotting of x against y =========================
SELECT sourceID
      ,x
      ,y
      ,@Beta * x + @Alpha as y_trend
--------- the final output from the sproc/iltvf could be a single row holding
--------- the following three values.
      ,@r_squared as r_squared
      ,@Alpha as Alpha
      ,@Beta as Beta
FROM #some_table;
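
For reference, the quantities the script computes can be written out as follows. The symbols mirror the script's variable names (@xbar, @ybar, @Beta, @Alpha, @SS_tot, @SS_err, @r_squared); n is the number of rows in the input table, which the script does not name explicitly.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
\beta &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
\alpha &= \bar{y} - \beta\,\bar{x} \\
SS_{tot} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
SS_{err} &= \sum_{i=1}^{n} \bigl(y_i - (\alpha + \beta x_i)\bigr)^2 \\
r^2 &= 1 - \frac{SS_{err}}{SS_{tot}}
\end{align*}
```

The trend line plotted in the final SELECT is then y_trend = beta * x + alpha. The script special-cases SS_tot = 0 (all y values identical) by returning r-squared = 1.0, and the NULLIF in the slope calculation yields NULL rather than an error when all x values are identical.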
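
One possible direction, offered as a sketch rather than a definitive answer: table-valued parameters only arrived in SQL Server 2008, so on 2K5 a stored procedure cannot accept a table argument directly. A common workaround is for the caller to load the x/y pairs into a temp table with an agreed name, which a stored procedure called from the same session can then read. The names dbo.usp_linear_regression and #regression_input below are hypothetical, and the query folds the separate variable assignments into a single statement using CTEs (available in 2K5).

```sql
-- Sketch only: procedure and temp table names are hypothetical.
-- Assumption: the caller creates #regression_input (x float, y float)
-- in the same session before calling the procedure.
CREATE PROCEDURE dbo.usp_linear_regression
AS
BEGIN
    SET NOCOUNT ON;

    WITH means AS (
        -- averages of x and y over the whole input
        SELECT AVG(x) AS xbar, AVG(y) AS ybar
        FROM #regression_input
    ),
    fit AS (
        -- centred sums of products and squares
        SELECT m.xbar, m.ybar,
               SUM((d.x - m.xbar) * (d.y - m.ybar)) AS sxy,
               SUM((d.x - m.xbar) * (d.x - m.xbar)) AS sxx,
               SUM((d.y - m.ybar) * (d.y - m.ybar)) AS syy
        FROM #regression_input d
        CROSS JOIN means m
        GROUP BY m.xbar, m.ybar
    )
    SELECT sxy / NULLIF(sxx, 0)                   AS Beta,   -- slope
           ybar - xbar * (sxy / NULLIF(sxx, 0))   AS Alpha,  -- y-intercept
           CASE WHEN syy = 0 THEN 1.0
                -- for least squares with an intercept this equals
                -- 1 - SS_err / SS_tot
                ELSE (sxy * sxy) / (NULLIF(sxx, 0) * syy)
           END                                    AS r_squared
    FROM fit;
END
GO

-- Usage: load the pairs, then call the procedure from the same session.
CREATE TABLE #regression_input (x float, y float);
INSERT INTO #regression_input (x, y)
SELECT x, y FROM #some_table;

EXEC dbo.usp_linear_regression;
DROP TABLE #regression_input;
```

The procedure can be created before the temp table exists because SQL Server defers name resolution for missing tables until execution. For a datetime x axis, CAST(some_date AS float) gives days since 1900-01-01, so x values land around 40,000; subtracting the mean before multiplying, as above, keeps the sums small. Converting with DATEDIFF(day, some_fixed_start_date, some_date) instead makes the unit of the slope explicit (y per day).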
