I've noticed a performance issue with my application after changing servers and trying to increase the number of worker threads for some DB-intensive tasks.
After some tests I found that the problem is reading data from the DataReader. Executing a simple query on 30 threads is at least 15 times slower than on a single thread. Using PerfView I found that most of the time is spent in BLOCKED_TIME.
For the tests I use a server with a Ryzen Threadripper (32 cores / 64 threads) and a local instance of SQL Server. I get the same results on a production server with a similar specification.
I've tried running 30 instances of the application - there was almost no difference in performance between 2-3 and 30 instances, so the server is fast enough to handle 30 parallel queries.
I've tried several changes to the connection string, such as increasing/decreasing the min/max pool size, disabling pooling, and switching from LPC (shared memory) to TCP - with no result.
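For reference, these are the kinds of connection-string variants that were tried (server and database names here are placeholders; the keywords are the standard System.Data.SqlClient ones):

```text
-- force the TCP transport and widen the pool
Server=tcp:myServer;Database=myDb;Integrated Security=true;Min Pool Size=30;Max Pool Size=100

-- disable connection pooling entirely
Server=tcp:myServer;Database=myDb;Integrated Security=true;Pooling=false
```

None of these changed the scaling behavior, which already suggested the bottleneck was on the client side rather than in the connection setup.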
class Program
{
    static void Main(string[] args)
    {
        var ids = new List<Guid>() { ... }; // filled with database ids
        var stats = new ConcurrentBag<long>();
        // warmup
        stats.Add(TestMethod(ids[0]));
        Console.WriteLine(String.Format("|{0}|{1,5}ms|", "warmup", stats.Average()));
        // start 1 to 30 threads (tested on a server with 32 cores / 64 threads)
        for (int i = 1; i <= 30; i++)
        {
            stats = new ConcurrentBag<long>();
            var tasks = Enumerable.Range(0, i).Select(idx =>
            {
                var id = ids[idx]; // separate ids, to be sure we're not reading the same records from disk
                return Task.Run(() =>
                {
                    for (int j = 0; j < 20; j++)
                    {
                        stats.Add(TestMethod(id));
                    }
                });
            }).ToArray();
            Task.WaitAll(tasks);
            Console.WriteLine(String.Format("|{0,2}|{1,5}ms|", i, (int)stats.Average()));
        }
        Console.WriteLine("End");
        Console.ReadLine();
    }

    private static long TestMethod(Guid id)
    {
        var records = new List<object[]>();
        var sw = new Stopwatch();
        using (var connection = new SqlConnection(ConnectionString))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction())
            using (var command = connection.CreateCommand())
            {
                command.Transaction = transaction;
                command.CommandText = SqlQuery;
                command.Parameters.Add(new SqlParameter("id", id));
                // measure only the DataReader time
                sw.Start();
                using (var dataReader = command.ExecuteReader())
                {
                    // the query returns ~2000 rows
                    while (dataReader.Read())
                    {
                        // read all data from the row; tested on Guid columns
                        var values = new object[6];
                        dataReader.GetValues(values);
                        records.Add(values);
                    }
                }
                sw.Stop();
            }
        }
        return sw.ElapsedMilliseconds;
    }
}
Is there any way to increase performance and make my app scale with the number of threads?
Edit: DB structure and an example query to reproduce:
/****** Object: Table [dbo].[Table_1] Script Date: 05.07.2019 14:08:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Table_1](
[Id] [uniqueidentifier] NOT NULL,
[Ref1] [uniqueidentifier] NULL,
[Field1] [uniqueidentifier] NULL,
[Field2] [uniqueidentifier] NULL,
CONSTRAINT [PK_Table_1] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Table [dbo].[Table_2] Script Date: 05.07.2019 14:08:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Table_2](
[Id] [uniqueidentifier] NOT NULL,
[Field1] [uniqueidentifier] NULL,
CONSTRAINT [PK_Table_2] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Index [IDX_Table_1_Ref1] Script Date: 05.07.2019 14:08:15 ******/
CREATE NONCLUSTERED INDEX [IDX_Table_1_Ref1] ON [dbo].[Table_1]
(
[Ref1] ASC
)
INCLUDE ( [Field1],
[Field2]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Table_1] WITH CHECK ADD CONSTRAINT [FK_Table_1_Table_2] FOREIGN KEY([Ref1])
REFERENCES [dbo].[Table_2] ([Id])
GO
ALTER TABLE [dbo].[Table_1] CHECK CONSTRAINT [FK_Table_1_Table_2]
GO
select
t2.id as Id,
t2.Field1 as Field1,
t1.Id as T1_Id,
t1.Ref1 as T1_T2,
t1.Field1 as T1_Field1,
t1.Field2 as T1_Field2
from dbo.Table_2 t2
join dbo.Table_1 t1 on t1.Ref1 = t2.Id
where t2.id = @id
There are now 30 records in T2 and 2000 * 30 records in T1, so each thread works on its own dataset of ~2000 records. The data is filled with random newid() values.
Edit 2:
I also compared two setups against the same SQL Server: 30 separate processes vs. 1 process with 30 threads. 30 separate processes work fine - roughly 150% of the single-thread execution time, not 1500%. The biggest differences: with 30 separate single-threaded processes I got ~14 waiting tasks and 20k batch requests/sec; with a single process and 30 threads I got >30 waiting tasks (mostly on Network I/O) and 2k batch requests/sec.
Setting
"System.GC.Server": true
solved my issue; now the app scales up to the maximum number of threads available on the server. Thanks for the help!
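For anyone hitting the same wall, server GC can be enabled either via runtimeconfig.json (.NET Core, the form used above) or via App.config (.NET Framework). A sketch of both, file contents only:

```text
// runtimeconfig.json (.NET Core)
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true
    }
  }
}

<!-- App.config (.NET Framework) -->
<configuration>
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>
```

This fits the symptoms: with workstation GC all threads allocate into a shared heap, so allocation-heavy reading (a new object[] per row across 30 threads) contends on the GC, showing up as BLOCKED_TIME in one process but not across 30 separate processes. Server GC gives each core its own heap segment and GC thread, removing that contention.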