2

I don't exactly know how to phrase the question, but an example would work. So I have this table

Users

Id          Name
1           Tim
2           Jon
3           Matt

There is another table

Tags

TagId       TagName
1           Test
2           Other
3           Dummy
4           More

In a temp table I have structure like this

TmpUserTags

User            Tags
Tim             Test
Jon             Other, Test
Matt            Dummy, More, Other

So, what I need to do is from this temp table, insert record in table UserTags with corresponding Ids, for the above given example, the result would be

UserTags

User                TagId
1                   1
2                   2
2                   1
3                   3
3                   4
3                   2 

So, this is the end result I want, to be inserted in UserTags. But since for each row in TmpUserTags each user can have many tags, separated by comma, I don't know what would be the best way to do it. I can probably use a while loop (or rather a cursor) to loop through all the rows in TmpUserTags, and then, for each row, split the tags by comma, find their Id, and insert those in UserTags. But that doesn't seems to be the most optimized way. Can someone please suggest some better way of doing it?

2
  • Why is your TmpUserTags table structured like this ? On row for each tuple user / tag would be soooo easier... Commented Jun 26, 2014 at 10:11
  • @RaphaëlAlthaus I know, but it's not like that. It's exactly like the way I described, for one user multiple comma separated tags. Commented Jun 26, 2014 at 10:17

4 Answers 4

1

I think the simplest way would be to just join the tags column using LIKE:

CREATE TABLE #Users (ID INT, Name VARCHAR(4));
INSERT #Users (ID, Name)
VALUES (1, 'Tim'), (2, 'Jon'), (3, 'Matt');

CREATE TABLE #Tags (TagID INT, TagName VARCHAR(5));
INSERT #Tags (TagID, TagName)
VALUES (1, 'Test'), (2, 'Other'), (3, 'Dummy'), (4, 'More');

CREATE TABLE #TmpUserTags ([User] VARCHAR(4), Tags VARCHAR(100));
INSERT #tmpUserTags ([User], Tags)
VALUES ('Tim', 'Test'), ('Jon', 'Other,Test'), ('Matt', 'Dummy,More,Other');

SELECT  u.ID, t.TagID 
FROM    #TmpUserTags AS ut
        INNER JOIN #Users AS u
            ON u.Name = ut.[User]
        INNER JOIN #Tags AS t
            ON ',' + ut.Tags + ',' LIKE '%,' + t.TagName + ',%';

You could also go down the route of creating a split function to split your comma separated list into rows:

CREATE FUNCTION [dbo].[Split](@StringToSplit NVARCHAR(MAX), @Delimiter NCHAR(1))
RETURNS TABLE
AS
RETURN
(   
    SELECT  ID = ROW_NUMBER() OVER(ORDER BY n.Number),
            Position = Number,
            Value = SUBSTRING(@StringToSplit, Number, CHARINDEX(@Delimiter, @StringToSplit + @Delimiter, Number) - Number)
    FROM    (   SELECT  TOP (LEN(@StringToSplit) + 1) Number = ROW_NUMBER() OVER(ORDER BY a.object_id)
                FROM    sys.all_objects a
            ) n
    WHERE   SUBSTRING(@Delimiter + @StringToSplit + @Delimiter, n.Number, 1) = @Delimiter
);

Then you can use:

SELECT  u.ID, t.TagID 
FROM    #TmpUserTags AS ut
        CROSS APPLY dbo.Split(ut.tags, ',') AS s
        INNER JOIN #Users AS u
            ON u.Name = ut.[User]
        INNER JOIN #Tags AS t
            ON t.TagName = s.Value;
Sign up to request clarification or add additional context in comments.

3 Comments

Are you sure this like won't cause any problems with multiple tags?
Yes, I have, but I am just asking but I haven't used LIKE on a join before, seems like at some point later it might mess things up. Say, one tag is Test, and another tag is Tested won't LIKE cause problem in this case?
No, because the predicate adds commas to tag name '%,' + t.TagName + ',%' it limits it to exact matches, so as an example tags of "tag1,testing" have a comma added before and after to become ",tag1,testing," - so this won't match a tag of "test" because it is actually looking for ",test," - it will only match ",tag1," and ",testing,".
0

Here is XML-version of answer:

SELECT Users.Id as [User],Tags.TagId
FROM 
 (SELECT A.[User],  
     LTRIM(Split.a.value('.', 'VARCHAR(100)')) AS Tagname  
 FROM  (SELECT [User],  
         CAST ('<M>' + REPLACE(Tags, ',', '</M><M>') + '</M>' AS XML) AS String  
     FROM  TmpUserTags) AS A CROSS APPLY String.nodes ('/M') AS Split(a)) ut
LEFT JOIN Users ON Users.Name=ut.[User]
LEFT JOIN Tags ON Tags.TagName=ut.Tagname

No procedures, functions and CTEs.

[Update] If some performance issues are appeared, please read this nice article: http://beyondrelational.com/modules/2/blogs/114/posts/14617/delimited-string-tennis-anyone.aspx

Left join is used to show all rows from the table TmpUserTags even if other tables are w/o some necessary rows (E.g. new User 'Bob' with Tag 'Test2' that isn't described in table Tags)

Comments

0

To me the main question would be how you arrived at the temp table containing the comma delimited column. If it was an import from a file and all the data was comma delimited it would be easy enough to save the file as a csv which will save the user and each tag separately, then create a table in the your database containing the same number of columns as the file has, then bulk insert this table from the file.

drop table #TmpUserTags
GO
create table #TmpUserTags
(
[user] varchar(10),
tag1 varchar(10),
tag2 varchar(10),
tag3 varchar(10)
)
bulk insert #TmpUserTags from '<filepath>' with (fieldterminator=',')

Then union the data to create two columns which should be easy enough to reinterpret as ids.

SELECT [User],Tag1 FROM #TmpUserTags WHERE Tag1 IS NOT NULL
UNION ALL
SELECT [User],Tag2 FROM #TmpUserTags WHERE Tag2 IS NOT NULL
UNION ALL
SELECT [User],Tag3 FROM #TmpUserTags WHERE Tag3 IS NOT NULL
ORDER BY [User]

Of course all this might be conjecture but, like, how did you arrive at the table with the comma delimited values?

4 Comments

Nope, it's not fixed there there would only be 3 tags, so to create a table with 3 columns for tags is not gonna work.
It would be ok to create a table for as many tags as the incoming data has!
It's dynamic, not fixed size. Could be 2 could be 200
OK, just to see if I am on the right track, for example running this would establish the maximum number of tags you have in the delimited column: select max(len(tags) - len(replace(tags, ',', ''))) from #tmpusertags, does that seem useful?
0

Just another way to do it (Fiddle: http://sqlfiddle.com/#!3/7f48f/20):

WITH cteTagMatrix
AS
(
    SELECT  n.ID,
            CASE 
                WHEN CHARINDEX(',' + t.TagName + ',', REPLACE(',' + tut.Tags + ',', ' ', '')) <> 0 THEN t.TagID
                ELSE NULL
            END AS TagID
    FROM    Names n INNER JOIN TmpUserTags tut
            ON n.[Name] = tut.[User]
            CROSS JOIN Tags t
)

SELECT    *
FROM      cteTagMatrix
WHERE     TagID IS NOT NULL
ORDER BY  ID, TagID;)

EDIT: Oops, was a error with my comma logic. Fixed code and fiddle updated.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.