I am trying to rewrite a linear regression script (that I found in a thread here) as a function, and I get the following errors when I run the script:

Msg 156, Level 15, State 1, Procedure fn_LinearRegression, Line 9
Incorrect syntax near the keyword 'WITH'.
Msg 319, Level 15, State 1, Procedure fn_LinearRegression, Line 9
Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.
Msg 156, Level 15, State 1, Procedure fn_LinearRegression, Line 12
Incorrect syntax near the keyword 'AS'.
Msg 102, Level 15, State 1, Procedure fn_LinearRegression, Line 18
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Procedure fn_LinearRegression, Line 28
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Procedure fn_LinearRegression, Line 36
Incorrect syntax near ','.

Here is the function:

CREATE FUNCTION dbo.fn_LinearRegression
(@groupID varchar(50), @x int, @y float)
RETURNS @regtable TABLE(a FLOAT, b FLOAT)
AS 
--
WITH some_table as (
select @groupID, @x, @y from TABLENAME -- replace table),

/*WITH*/ mean_estimates AS
(   SELECT GroupID
          ,AVG(x)                                                  AS xmean
          ,AVG(y)                                                  AS ymean
    FROM some_table pd
    GROUP BY GroupID
),
stdev_estimates AS
(   SELECT pd.GroupID
          -- T-SQL STDEV() implementation is not numerically stable
          ,CASE      SUM(SQUARE(x - xmean)) WHEN 0 THEN 1 
           ELSE SQRT(SUM(SQUARE(x - xmean)) / (COUNT(*) - 1)) END AS xstdev
          ,     SQRT(SUM(SQUARE(y - ymean)) / (COUNT(*) - 1))     AS ystdev
    FROM some_table pd
    INNER JOIN mean_estimates  pm ON pm.GroupID = pd.GroupID
    GROUP BY pd.GroupID, pm.xmean, pm.ymean
),
standardized_data AS                   -- increases numerical stability
(   SELECT pd.GroupID
          ,(x - xmean) / xstdev                                    AS xstd
          ,CASE ystdev WHEN 0 THEN 0 ELSE (y - ymean) / ystdev END AS ystd
    FROM some_table pd
    INNER JOIN stdev_estimates ps ON ps.GroupID = pd.GroupID
    INNER JOIN mean_estimates  pm ON pm.GroupID = pd.GroupID
),
standardized_beta_estimates AS
(   SELECT GroupID
          ,CASE WHEN SUM(xstd * xstd) = 0 THEN 0
                ELSE SUM(xstd * ystd) / (COUNT(*) - 1) END         AS betastd
    FROM standardized_data
    GROUP BY GroupID
)
SELECT pb.GroupID
      ,ymean - xmean * betastd * ystdev / xstdev                   AS Alpha
      ,betastd * ystdev / xstdev                                   AS Beta
      ,CASE ystdev WHEN 0 THEN 1 ELSE betastd * betastd END        AS R2
      ,betastd                                                     AS Correl
      ,betastd * xstdev * ystdev                                   AS Covar

into TT_Auto_Temp_LM -- REPLACE TABLE
FROM standardized_beta_estimates pb
INNER JOIN stdev_estimates ps ON ps.GroupID = pb.GroupID
INNER JOIN mean_estimates  pm ON pm.GroupID = pb.GroupID;

--
Insert into @regtable ([A],[B]) VALUES (Alpha, Beta)

RETURN

I only have two outputs, as I only need Alpha and Beta.

  • The ), is missing after the first CTE, and as far as I know DML statements are not allowed in a function if you are using SQL Server. Commented Sep 22, 2016 at 8:24
  • I don't see a missing ), and I have a function in use that has an INSERT INTO statement. Commented Sep 22, 2016 at 8:38

1 Answer

First and foremost, you have syntax errors because the -- comment on the following line also comments out the closing bracket and comma. They need to be on a new line:

select @groupID, @x, @y from TABLENAME -- replace table),
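As a minimal sketch of that fix, the closing bracket and comma can simply move below the comment so they are parsed again. The column names GroupID, x and y are assumptions here (the later CTEs in the question refer to them, but the question never shows the table definition), and TABLENAME is the placeholder from the question:

```sql
-- Sketch only: assumes TABLENAME has columns GroupID, x and y.
WITH some_table AS (
    SELECT GroupID, x, y
    FROM TABLENAME -- replace table
),                 -- the bracket and comma now sit outside the comment
mean_estimates AS (
    SELECT GroupID
          ,AVG(CAST(x AS FLOAT)) AS xmean  -- cast so an int column is not averaged as an int
          ,AVG(CAST(y AS FLOAT)) AS ymean
    FROM some_table
    GROUP BY GroupID
)
SELECT GroupID, xmean, ymean
FROM mean_estimates;
```

Note that selecting the variables @groupID, @x, @y, as the original line does, would produce unnamed columns, so the later references to GroupID, x and y would probably fail as well.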

More importantly, though, this needs to be a stored procedure: you are inserting into a table (the SELECT ... INTO) and then apparently trying to select data from it (this isn't actually clear from your code), which you can't do in a function.

Per the documentation: https://technet.microsoft.com/en-us/library/ms191320.aspx

User-defined functions cannot be used to perform actions that modify the database state.

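Essentially, in a function you can only read data; the only writes allowed are to local table variables, such as the table that a multi-statement function returns.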
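If the intermediate table is not really needed, one possible alternative is to keep this as a function and insert straight into the returned table variable, which does not modify database state. The following is only a sketch under assumptions: it assumes TABLENAME has columns GroupID, x and y, it takes just the group as a parameter, and it swaps the standardized-data approach from the question for the plain least-squares formulas (slope = sum of (x - xmean)(y - ymean) over sum of (x - xmean) squared, intercept = ymean - slope * xmean). A multi-statement table-valued function also needs BEGIN ... END around its body, which the code in the question lacks.

```sql
-- Sketch only: table and column names (TABLENAME, GroupID, x, y) are assumptions.
CREATE FUNCTION dbo.fn_LinearRegression (@groupID varchar(50))
RETURNS @regtable TABLE (a FLOAT, b FLOAT)
AS
BEGIN
    WITH points AS (
        -- Cast to FLOAT so integer columns are not averaged or divided as integers
        SELECT CAST(x AS FLOAT) AS x, CAST(y AS FLOAT) AS y
        FROM dbo.TABLENAME
        WHERE GroupID = @groupID
    ),
    means AS (
        SELECT AVG(x) AS xmean, AVG(y) AS ymean
        FROM points
    ),
    slope AS (
        -- NULLIF avoids a divide-by-zero error when all x values are equal
        SELECT SUM((p.x - m.xmean) * (p.y - m.ymean))
               / NULLIF(SUM(SQUARE(p.x - m.xmean)), 0) AS b
        FROM points p
        CROSS JOIN means m
    )
    -- Writing to the returned table variable is allowed inside a function
    INSERT INTO @regtable (a, b)
    SELECT m.ymean - s.b * m.xmean, s.b
    FROM means m
    CROSS JOIN slope s;

    RETURN;
END;
```

It could then be queried like a table, for example with SELECT a, b FROM dbo.fn_LinearRegression('some-group'); where 'some-group' is a made-up value. If the results really do have to land in a permanent table, a stored procedure is still the way to go.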

1 Comment

Thanks iamdave. You are right. I actually made the script work, but then I found out that it calculates the wrong values :(.