Using STRING_SPLIT in SQL for delimited values

Question

@InStr = '0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0'

I have the string above in a variable @InStr and I want to use STRING_SPLIT to inserts values into a table.

As you can see its a double split.

SELECT Value FROM STRING_SPLIT(@InStr,'^')

Produces:

0|ABC|3033.9|3032.4444|0|0|0
1|DEF|3033.2577|3033.053|3032.0808|0|0
2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0
3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0

Which is good, now I need to take each row and insert into a table.

I'm not sure how to combine the 2 splits to do the insert. The table has 7 columns which it would populate.

Any help appreciated.

Gottfried Lesigang · Accepted Answer · 2020-06-10 07:01:25Z

First of all: You should avoid STRING_SPLIT() in almost any case. It does not guarantee to return the items in the expected sort order. This might work in all your tests and break in production with silly hardly to find errors.

There are various answers already, the best one should be the table type parameter. But (if you cannot follow this route), I'd like to suggest two type-safe approaches:

DECLARE @InStr NVARCHAR(MAX) = '0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0';

--xml approach (working for almost any version)
--We do the double split in one single action and return a nested XML with <x> and <y> elements
--We can fetch the values type-safe from their 1-based position:

SELECT x.value('y[1]','int')           AS [First]
      ,x.value('y[2]','varchar(100)')  AS [Second]
      ,x.value('y[3]','decimal(28,8)') AS Third
      ,x.value('y[4]','decimal(28,8)') AS Fourth
      ,x.value('y[5]','decimal(28,8)') AS Fifth
      ,x.value('y[6]','decimal(28,8)') AS Sixth
      ,x.value('y[7]','decimal(28,8)') AS Seventh
FROM (VALUES(CAST('<x><y>' + REPLACE(REPLACE(@Instr,'|','</y><y>'),'^','</y></x><x><y>') + '</y></x>' AS XML)))v(Casted)
CROSS APPLY Casted.nodes('/x') b(x);

--json approach (needs v2016+)
--faster than XML
--We transform your string to a JSON-array with one item per row and use another OPENJSON to retrieve the array's items.
--The WITH-clause brings in implicit pivoting to retrieve the items type-safe as columns:

SELECT b.*
FROM OPENJSON(CONCAT('[["',REPLACE(@Instr,'^','"],["'),'"]]')) a
CROSS APPLY OPENJSON(CONCAT('[',REPLACE(a.[value],'|','","'),']'))
WITH([First]   INT           '$[0]'
    ,[Second]  VARCHAR(100)  '$[1]'
    ,[Third]   DECIMAL(28,8) '$[2]'
    ,[Fourth]  DECIMAL(28,8) '$[3]'
    ,[Fifth]   DECIMAL(28,8) '$[4]'
    ,[Sixth]   DECIMAL(28,8) '$[5]'
    ,[Seventh] DECIMAL(28,8) '$[6]') b;

Both approaches return the same result:

+-------+--------+---------------+---------------+---------------+---------------+------------+
| First | Second | Third         | Fourth        | Fifth         | Sixth         | Seventh    |
+-------+--------+---------------+---------------+---------------+---------------+------------+
| 0     | ABC    | 3033.90000000 | 3032.44440000 | 0.00000000    | 0.00000000    | 0.00000000 |
+-------+--------+---------------+---------------+---------------+---------------+------------+
| 1     | DEF    | 3033.25770000 | 3033.05300000 | 3032.08080000 | 0.00000000    | 0.00000000 |
+-------+--------+---------------+---------------+---------------+---------------+------------+
| 2     | JHI    | 3032.83760000 | 3033.25960000 | 3033.22590000 | 3033.32200000 | 0.00000000 |
+-------+--------+---------------+---------------+---------------+---------------+------------+
| 3     | XYZ    | 3032.83760000 | 3032.83760000 | 3032.83760000 | 3032.83760000 | 0.00000000 |
+-------+--------+---------------+---------------+---------------+---------------+------------+

Nice, I still tend to forget that OPENJSON can retain order, but I think it's the replace and concat gymnastics that make me forget.

Aaron Bertrand · Accepted Answer · 2020-06-10 21:05:32Z

Instead of passing a string from .NET like 'a|b|c^d|e|f' and then having to parse it, leave it in its original structure (DataTable?) and create a table type in SQL Server. Then you can pass in your structure instead of this cobbled-together string.

In SQL Server:

CREATE TYPE dbo.MyTableType AS TABLE
(
  ColumnA int,
  ColumnB nvarchar(32), 
  ...
);
GO

CREATE PROCEDURE dbo.ShowArray
  @DataTable dbo.MyTableType
AS
BEGIN
  SET NOCOUNT ON;

  SELECT ColumnA, ColumnB, ...
    FROM @DataTable;
END

In C# (untested and incomplete):

DataTable dt = new DataTable();
dt.Columns.Add("ColumnA", typeof(Int32));
dt.Columns.Add("ColumnB", typeof(String));
...

DataRow dr = dt.NewRow();  
dr[0] = 1;  
dr[1] = "foo";
...

dt.Rows.Add(dr);
...

  SqlCommand cmd = new SqlCommand("dbo.ShowArray", connectionObject);
  cmd.CommandType = CommandType.StoredProcedure;
  SqlParameter tvp1 = c2.Parameters.AddWithValue("@DataTable", dt);
  tvp1.SqlDbType = SqlDbType.Structured;
  ...

More on this shift away from splitting strings here and, actually, in this answer as well:

https://stackoverflow.com/a/11105413/61305

Gordon Linoff · Accepted Answer · 2020-06-10 01:47:28Z

You can use a recursive CTE:

declare @instr varchar(max) = '0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0'
;
with cte as (
      select row_number() over (order by (select null)) as id, convert(varchar(max), null) as el, Value + '|' as rest, 0 as lev
      from string_split(@InStr, '^')
      union all
      select id, left(rest, charindex('|', rest) - 1),
             stuff(rest, 1, charindex('|', rest), ''),
             lev + 1
      from cte
      where rest <> ''
     )
select max(case when lev = 1 then el end),
       max(case when lev = 2 then el end),
       max(case when lev = 3 then el end),
       max(case when lev = 4 then el end),
       max(case when lev = 5 then el end),
       max(case when lev = 6 then el end),
       max(case when lev = 7 then el end)
from cte
group by id;

Here is a db<>fiddle.

Unfortunately, you can't safely use string_split() because it does not provide the offset for the values returned.

Dale, I guess there isn't an elegant way to pass a 2 dimensional array through a stored procedure and store it?
@pju Where is your "2 dimensional array" coming from? Why not use a table-valued parameter and pass in a data structure (like a DataTable) from your application? Then no parsing or splitting or array simulation... it's already a table.
@pju "Coming from .Net" should have been in the question. Since you're using .Net, Aaron Bertrand's answer is the best one here.
@pju . . . This question is about string parsing (which this question answers). If what you really want to do is pass a table to a stored procedure -- well, you didn't ask that question. But Aaron's comment should point you in the right direction.

Roger Wolf · Accepted Answer · 2020-06-10 04:30:57Z

For a subsequent splitting of pipe-separated substrings you can utilise openjson(), as demonstrated in the example below:

declare @InStr varchar(max) = '0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0';

select p.*
from (
    select ss.value as [RowId], oj.[key] as [ColumnId], oj.value as [ColumnValue]
    from string_split(@InStr,'^') ss
        cross apply openjson('["' + replace(ss.value, '|', '","') + '"]', '$') oj
) q
pivot (
    min(q.ColumnValue)
    for q.[ColumnId] in ([0], [1], [2], [3], [4], [5], [6])
) p;

There are many caveats with this approach, however. The most prominent are:

You need SQL Server 2016 or later, and the database compatibility level needs to be 130 or above;
If your data is of any size worth mentioning (1Mb+), this code might work unacceptably slow. String manipulation is not the strong point of SQL Server.

Personally, I would recommend parsing this string outside of SQL. If it's a flat file you are importing, SSIS dataflow will be much easier to develop and faster to work. If it's an application, then redesign it to pass either a suitable table type, or XML / JSON blob at the very least.

Pointing to OPENJSON() as replacement for STRING_SPLIT() is a good thing, but there are two enhancements: 1) I'd use CONCAT() rather than the + operator to chain the strings, and 2) You can use doubled brackets ([[), which allows for a position and type-safe WITH-clause (no PIVOT needed). I've placed an answer here in this thread and on more detailed answer here
@Shnugo, I don't use concat() anymore, as I've witnessed a huge performance drop caused by this function once, compared to the standard string concatenation.

Venkataraman R · Accepted Answer · 2020-06-10 07:38:12Z

0

I am generating INSERT statement and then executing it. First I am splitting the string and then I am generating INSERT statement.

Note:

I am assuming that second column will be three letter code.
I am assuming that sort order of rows doesn't matter

declare @instr varchar(max) = '0|ABC|3033.9|3032.4444|0|0|0^1|DEF|3033.2577|3033.053|3032.0808|0|0^2|JHI|3032.8376|3033.2596|3033.2259|3033.322|0^3|XYZ|3032.8376|3032.8376|3032.8376|3032.8376|0'
;
declare @insertStmt VARCHAR(max) ='INSERT INTO TABLEName VALUES '+ CHAR(13) + CHAR(10);
SELECT @insertStmt +=  CONCAT('(',replace(stuff(stuff(value,3,0,''''),7,0,''''),'|',','),'),') 
from STRING_SPLIT(@instr,'^') 
SELECT @insertStmt = STUFF(@insertStmt,len(@insertStmt),1,'')
select @insertStmt
EXEC(@insertStmt)

INSERT INTO TABLEName VALUES 
(0,'ABC',3033.9,3032.4444,0,0,0),(1,'DEF',3033.2577,3033.053,3032.0808,0,0),(2,'JHI',3032.8376,3033.2596,3033.2259,3033.322,0),(3,'XYZ',3032.8376,3032.8376,3032.8376,3032.8376,0)

edited Jun 10, 2020 at 7:38

answered Jun 10, 2020 at 4:44

Venkataraman R

13.2k2 gold badges38 silver badges68 bronze badges

5 Comments

Gottfried Lesigang Over a year ago

Very dangerous! This might work in your system and pass all tests and still break in production. STRING_SPLIT() does not guarantee to return the items in the expected sort order... Furthermore, your stuffing into fix positions is quirky...

Gottfried Lesigang Over a year ago

... addition: Instead of STRING_SPLIT() we should use OPENJSON() (I've placed an answer here too and a more detailed one here).

Venkataraman R Over a year ago

@Shnugo, I am not splitting columns in STRING_SPLIT. I am splitting the rows. So, sort order is not a problem here. Another thing is, as I have mentioned, I am assuming that second column is 3 character code, from the test data, OP has put. If it is > 3 characters, we have to use PATINDEX or some other approach.

Gottfried Lesigang Over a year ago

We do not know, if 1) the rows sort order does not matter and 2) if this code is of 3 characters in any case. In general I'd never rely on this. 2 years later someone decides to use 4 characters and suddenly your application needs to be reworked in thousand places... So your last sentence brings it to the point: some other approach :-)

Venkataraman R Over a year ago

@Shnugo, I completely agree with you. I have added note about my assumptions.

Collectives™ on Stack Overflow

Using STRING_SPLIT in SQL for delimited values

5 Answers 5

1 Comment

Comments

5 Comments

2 Comments

5 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

1 Comment

Comments

5 Comments

2 Comments

5 Comments

Linked

Related