I am trying to rewrite a linear regression script (that I found in a thread here) as a function, and I get the following errors when I run the CREATE FUNCTION script:
Msg 156, Level 15, State 1, Procedure fn_LinearRegression, Line 9
Incorrect syntax near the keyword 'WITH'.
Msg 319, Level 15, State 1, Procedure fn_LinearRegression, Line 9
Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.
Msg 156, Level 15, State 1, Procedure fn_LinearRegression, Line 12
Incorrect syntax near the keyword 'AS'.
Msg 102, Level 15, State 1, Procedure fn_LinearRegression, Line 18
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Procedure fn_LinearRegression, Line 28
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Procedure fn_LinearRegression, Line 36
Incorrect syntax near ','.
Here is the function:
CREATE Function dbo.fn_LinearRegression
(@groupID varchar(50), @x int, @y float)
RETURNS @regtable TABLE(a FLOAT, b FLOAT)
AS
--
WITH some_table as (
select @groupID, @x, @y from TABLENAME -- replace table),
/*WITH*/ mean_estimates AS
( SELECT GroupID
,AVG(x) AS xmean
,AVG(y) AS ymean
FROM some_table pd
GROUP BY GroupID
),
stdev_estimates AS
( SELECT pd.GroupID
-- T-SQL STDEV() implementation is not numerically stable
,CASE SUM(SQUARE(x - xmean)) WHEN 0 THEN 1
ELSE SQRT(SUM(SQUARE(x - xmean)) / (COUNT(*) - 1)) END AS xstdev
, SQRT(SUM(SQUARE(y - ymean)) / (COUNT(*) - 1)) AS ystdev
FROM some_table pd
INNER JOIN mean_estimates pm ON pm.GroupID = pd.GroupID
GROUP BY pd.GroupID, pm.xmean, pm.ymean
),
standardized_data AS -- increases numerical stability
( SELECT pd.GroupID
,(x - xmean) / xstdev AS xstd
,CASE ystdev WHEN 0 THEN 0 ELSE (y - ymean) / ystdev END AS ystd
FROM some_table pd
INNER JOIN stdev_estimates ps ON ps.GroupID = pd.GroupID
INNER JOIN mean_estimates pm ON pm.GroupID = pd.GroupID
),
standardized_beta_estimates AS
( SELECT GroupID
,CASE WHEN SUM(xstd * xstd) = 0 THEN 0
ELSE SUM(xstd * ystd) / (COUNT(*) - 1) END AS betastd
FROM standardized_data
GROUP BY GroupID
)
SELECT pb.GroupID
,ymean - xmean * betastd * ystdev / xstdev AS Alpha
,betastd * ystdev / xstdev AS Beta
,CASE ystdev WHEN 0 THEN 1 ELSE betastd * betastd END AS R2
,betastd AS Correl
,betastd * xstdev * ystdev AS Covar
into TT_Auto_Temp_LM -- REPLACE TABLE
FROM standardized_beta_estimates pb
INNER JOIN stdev_estimates ps ON ps.GroupID = pb.GroupID
INNER JOIN mean_estimates pm ON pm.GroupID = pb.GroupID;
--
Insert into @regtable ([A],[B]) VALUES (Alpha, Beta)
RETURN
The return table has only two columns because I only need Alpha and Beta.
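For reference, the CTE chain computes the ordinary least-squares line y ≈ α + βx, standardizing x and y first (that is what the "increases numerical stability" comment refers to). Ignoring the CASE guards for zero variance and assuming at least n ≥ 2 rows per group, the steps reduce to the following, with the matching CTE column names in parentheses:

```latex
\begin{aligned}
\bar{x} &= \tfrac{1}{n}\textstyle\sum_i x_i, \qquad \bar{y} = \tfrac{1}{n}\textstyle\sum_i y_i
  && \text{(xmean, ymean)}\\
s_x &= \sqrt{\tfrac{\sum_i (x_i-\bar{x})^2}{n-1}}, \qquad s_y = \sqrt{\tfrac{\sum_i (y_i-\bar{y})^2}{n-1}}
  && \text{(xstdev, ystdev)}\\
\beta_{\mathrm{std}} &= \frac{1}{n-1}\sum_i \frac{x_i-\bar{x}}{s_x}\cdot\frac{y_i-\bar{y}}{s_y} = r_{xy}
  && \text{(betastd)}\\
\beta &= \beta_{\mathrm{std}}\,\frac{s_y}{s_x} = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}
  && \text{(Beta)}\\
\alpha &= \bar{y} - \beta\,\bar{x}
  && \text{(Alpha)}
\end{aligned}
```

So Beta is the usual slope (covariance of x and y divided by the variance of x), and Alpha is the intercept.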
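The ")," that should close the first CTE is missing: it sits after "-- replace table", so it is commented out. Also, as far as I know, a SQL Server function cannot modify database tables (INSERT/UPDATE/DELETE or SELECT ... INTO), so "into TT_Auto_Temp_LM" is not allowed inside the function. The only table a function may write to is its own return table variable, here @regtable.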
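Putting those fixes together, a corrected version might look like the sketch below. It is not tested against your schema and rests on assumptions: dbo.RegressionInput with columns GroupID, x and y is a hypothetical stand-in for TABLENAME, and because a function cannot run dynamic SQL, column names cannot be passed in as parameters, so the scalar @x and @y parameters are dropped and @groupID is used only as a filter. Two further problems in the original are also addressed: a multi-statement table-valued function needs its body wrapped in BEGIN ... END (a likely cause of the "near the keyword 'WITH'" error), and the final VALUES (Alpha, Beta) cannot see the CTE's column aliases, so the result is written with INSERT ... SELECT instead.

```sql
CREATE FUNCTION dbo.fn_LinearRegression (@groupID varchar(50))
RETURNS @regtable TABLE (a FLOAT, b FLOAT)
AS
BEGIN   -- a multi-statement table-valued function needs BEGIN ... END
    WITH some_table AS
    (   -- dbo.RegressionInput is a hypothetical table: replace it with yours.
        -- CAST to FLOAT so AVG() does not truncate integer x values.
        SELECT GroupID
              ,CAST(x AS FLOAT) AS x
              ,CAST(y AS FLOAT) AS y
        FROM dbo.RegressionInput
        WHERE GroupID = @groupID
    ),  -- this "),", must not be inside a "--" comment
    mean_estimates AS
    (   SELECT GroupID, AVG(x) AS xmean, AVG(y) AS ymean
        FROM some_table
        GROUP BY GroupID
    ),
    stdev_estimates AS
    (   -- assumes at least two rows, otherwise COUNT(*) - 1 is zero
        -- and the ystdev division raises a divide-by-zero error
        SELECT pd.GroupID
              ,CASE SUM(SQUARE(x - xmean)) WHEN 0 THEN 1
                    ELSE SQRT(SUM(SQUARE(x - xmean)) / (COUNT(*) - 1)) END AS xstdev
              ,SQRT(SUM(SQUARE(y - ymean)) / (COUNT(*) - 1)) AS ystdev
        FROM some_table pd
        INNER JOIN mean_estimates pm ON pm.GroupID = pd.GroupID
        GROUP BY pd.GroupID, pm.xmean, pm.ymean
    ),
    standardized_data AS
    (   SELECT pd.GroupID
              ,(x - xmean) / xstdev AS xstd
              ,CASE ystdev WHEN 0 THEN 0 ELSE (y - ymean) / ystdev END AS ystd
        FROM some_table pd
        INNER JOIN stdev_estimates ps ON ps.GroupID = pd.GroupID
        INNER JOIN mean_estimates pm ON pm.GroupID = pd.GroupID
    ),
    standardized_beta_estimates AS
    (   SELECT GroupID
              ,CASE WHEN SUM(xstd * xstd) = 0 THEN 0
                    ELSE SUM(xstd * ystd) / (COUNT(*) - 1) END AS betastd
        FROM standardized_data
        GROUP BY GroupID
    )
    -- No SELECT ... INTO a permanent table inside a function;
    -- insert into the function's own return table variable instead.
    INSERT INTO @regtable (a, b)
    SELECT ymean - xmean * betastd * ystdev / xstdev  -- Alpha (intercept)
          ,betastd * ystdev / xstdev                  -- Beta (slope)
    FROM standardized_beta_estimates pb
    INNER JOIN stdev_estimates ps ON ps.GroupID = pb.GroupID
    INNER JOIN mean_estimates pm ON pm.GroupID = pb.GroupID;

    RETURN;
END;
```

It would then be called like a table, for example SELECT a AS Alpha, b AS Beta FROM dbo.fn_LinearRegression('G1'); where 'G1' is a made-up group ID. If you want every group in one call, an inline table-valued function (RETURNS TABLE AS RETURN followed by the WITH ... SELECT) without the @groupID filter, returning GroupID next to the two coefficients, is another option.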