Context

We're trying to build a database application that helps first-year students prepare for their SQL exam by assessing whether a SQL statement is a solution to a given question. The basic version simply checks whether the submitted answer exists in the database as a known correct answer. If it doesn't, it is sent to the teacher, who then either adds it to the database or discards it. In the next iteration we would like the application to assess answers more like they are graded at exams, so that you can still get some points even if parts of your query/statement are wrong.

Goal

To get this done we need to be able to 'break up' a statement.

For example, the answer:

SELECT Movie.movie_id AS 'Movie ID', Movie.title AS 'Movie Title', COUNT(*) AS 'Nr of directors'  
FROM Movie  
INNER JOIN Movie_Director  
ON Movie.movie_id = Movie_Director.movie_id  
WHERE Movie.publication_year = 2003  
GROUP BY Movie.movie_id, Movie.title  
HAVING COUNT(*) > 1  
ORDER BY 3 DESC  

should be broken down into its separate clauses (SELECT / FROM / INNER JOIN / WHERE / HAVING / ...).

There doesn't seem to be an easy way to add a table to show the desired output and my attempt to make one only makes it less clear. And really the exact output format doesn't matter that much. I hope the intention is clear.

Besides this type of query, the application needs to be able to assess CREATE/ALTER TABLE statements (adding PK, FK or CHECK constraints) and DELETE/INSERT/UPDATE statements.

What I found so far

Searching online for SQL parsers gives plenty of results, but they are all written in other languages. Our application needs to run entirely inside SQL Server. SQL Server itself seems to parse every query/statement, but it's not clear to me whether the results of that parsing are accessible and/or useful to me.

Questions

  • Is there a reason why there doesn't seem to be a library for this purpose? Is my use case really that specific, or am I missing something?
  • Are the results of the parsing that SQL Server does accessible to me? And if so, would these results be useful for the stated goal?
  • Am I better off writing my own parser? Or am I missing an obvious option here?

Comments

  • Writing a SQL Server compatible parser is a huge task. Commented Dec 1, 2020 at 9:23
  • Can't you just run the SQL (carefully) and check that the result is correct (or that a hash of the result is correct)? You could store a sample correct answer with each question, get SQL Server to run the sample answer, and compare its result to that of the submitted answer. Make sure you switch to a limited-privilege user before running any code automatically: learn.microsoft.com/en-us/sql/t-sql/statements/… Commented Dec 1, 2020 at 9:28
  • @JeffUK, yes, sure. We check syntax and results by running it (in a safe way). I didn't include this because it's not really relevant to the question. The real issue is being able to break down the statements as explained above. Commented Dec 1, 2020 at 9:33
  • @jarlh, I probably don't mean a 'real' parser. Just a function that's able to break down the statement (a string) into parts as explained above. Or is this still likely to turn out to be a huge task? Commented Dec 1, 2020 at 9:35
  • It really depends what you mean by 'Parts', you could get some parts using basic string manipulation (Find everything between the first 'Select' and the last 'From' for instance). But it would be very naive. If you want to give them 'marks' even if they make badly formatted SQL statements, by definition it's impossible to parse that automatically. e.g. if they spell 'SELECT' wrong but everything else is correct, a human might give them a mark but a computer would fail immediately. You can't automate discretion! Commented Dec 1, 2020 at 9:39
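
To make the "basic string manipulation" idea from the comments concrete, here is a minimal T-SQL sketch. It is an illustration under stated assumptions, not a parser: it assumes a single top-level SELECT with no subqueries, no keywords inside string literals or identifiers, and a case-insensitive collation (CHARINDEX follows the collation of its input). The sample statement and the CTE names `keywords` and `positions` are made up for the example. LEAD requires SQL Server 2012 or later.

```sql
-- Naive clause splitter: cuts the string at the first occurrence of each keyword.
DECLARE @sql nvarchar(max) = N'SELECT Movie.movie_id, Movie.title, COUNT(*)
FROM Movie
INNER JOIN Movie_Director ON Movie.movie_id = Movie_Director.movie_id
WHERE Movie.publication_year = 2003
GROUP BY Movie.movie_id, Movie.title
HAVING COUNT(*) > 1
ORDER BY 3 DESC';

WITH keywords AS (
    -- Clause keywords to look for (hypothetical, deliberately short list).
    SELECT kw
    FROM (VALUES (N'SELECT'), (N'FROM'), (N'WHERE'),
                 (N'GROUP BY'), (N'HAVING'), (N'ORDER BY')) AS v(kw)
),
positions AS (
    -- First position of each keyword; keywords that do not occur are dropped.
    SELECT kw, CHARINDEX(kw, @sql) AS pos
    FROM keywords
    WHERE CHARINDEX(kw, @sql) > 0
)
SELECT kw AS clause,
       -- Each clause runs from its keyword up to the next keyword found,
       -- or to the end of the string for the last one.
       SUBSTRING(@sql, pos,
                 LEAD(pos, 1, LEN(@sql) + 1) OVER (ORDER BY pos) - pos) AS clause_text
FROM positions
ORDER BY pos;
```

In this sketch the JOIN stays part of the FROM clause. It breaks as soon as a keyword appears somewhere unexpected (a column called `from_date`, a subquery, a misspelled keyword), which is exactly the limitation the comment describes.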

2 Answers

Using a CLR parser function, dbo.parseSqlToXml() (a custom function that has to be deployed to the database; it is not built into SQL Server):

declare @sql Nvarchar(max)= N'
SELECT Movie.movie_id AS ''Movie ID'', Movie.title AS ''Movie Title'', COUNT(*) AS ''Nr of directors''  
FROM Movie  
INNER JOIN Movie_Director  
ON Movie.movie_id = Movie_Director.movie_id  
WHERE Movie.publication_year = 2003  
GROUP BY Movie.movie_id, Movie.title  
HAVING COUNT(*) > 1  
ORDER BY 3 DESC
';

-- Parse the statement once, then read each clause from the XML the function returns.
select 
-- parse errors reported in the XML, if any
t.thexml.query('data(SqlScript/Errors/Error)').value('.', 'Nvarchar(max)') as _errors,
-- the text of each element is read from its comment() node
s.sel.value('comment()[1]', 'varchar(max)') as _query,
s.sel.value('(SqlSelectClause/comment())[1]', 'Nvarchar(max)') as _select,
s.sel.value('(SqlFromClause/comment())[1]', 'Nvarchar(max)') as _from,
s.sel.value('(SqlWhereClause/comment())[1]', 'Nvarchar(max)') as _where,
s.sel.value('(SqlGroupByClause/comment())[1]', 'Nvarchar(max)') as _groupby,
s.sel.value('(SqlHavingClause/comment())[1]', 'Nvarchar(max)') as _having,
-- ORDER BY is looked up on the parent of the node that holds the SELECT clause
s.sel.value('(../SqlOrderByClause/comment())[1]', 'Nvarchar(max)') as _orderby
from
(
select cast(dbo.parseSqlToXml(@sql) as xml) as thexml
) as t
-- one row per node that has a SqlSelectClause child
cross apply t.thexml.nodes('//*[SqlSelectClause]') as s(sel);

...will return

+---------+----------------------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------+-------------------------------------+--------------------------------------+---------------------+-----------------+
| _errors |                                   _query                                   |                              _select                               |                   _from                    |               _where                |               _groupby               |       _having       |    _orderby     |
+---------+----------------------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------+-------------------------------------+--------------------------------------+---------------------+-----------------+
|         | SELECT Movie.movie_id AS 'Movie ID', Movie.title AS ...HAVING COUNT(*) > 1 | SELECT Movie.movie_id AS 'Movie ID', Movie... AS 'Nr of directors' | FROM Movie   INNER JOIN Movie_Director.... | WHERE Movie.publication_year = 2003 | GROUP BY Movie.movie_id, Movie.title | HAVING COUNT(*) > 1 | ORDER BY 3 DESC |
+---------+----------------------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------+-------------------------------------+--------------------------------------+---------------------+-----------------+
1 Comment

That's really cool, thanks. I'll look into it in detail when I have time. It seems to be what I'm looking for, will accept your answer if that's indeed the case.
For SQL Server you can use Microsoft.SqlServer.Management.SqlParser.Parser from the Microsoft.SqlServer.Management.SqlParser.dll assembly.

The result of the parser is an array of tokens: https://learn.microsoft.com/en-us/dotnet/api/microsoft.sqlserver.management.sqlparser.parser.tokens?redirectedfrom=MSDN&view=sql-smo-150

1 Comment

Thanks for the input, I came across this but didn't see how it would be useful in achieving the stated goal.
