performance is very slow: about 300 inserts to DB per second.
There are two sections of code I have performance concerns about.
But first, it would really be worth your while to put
a bunch of transaction data into a million-line .CSV
and run this single-threaded using a single DB connection,
to verify that DB performance is what you think it is.
Do it with "small" and "large" BEGIN ... COMMIT transactions,
as discussed below.
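To make that concrete, here's roughly the harness I have in mind, assuming Dapper over Npgsql (which your ExecuteAsync call suggests) and a CSV of ConversationId,Message rows; the file name and batch size are placeholders, and batchSize = 1 is the "small" case:

using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using Dapper;
using Npgsql;

// Single-threaded, single-connection benchmark sketch.
var yugabyteConnectionString = "...";            // your existing connection string
await using var conn = new NpgsqlConnection(yugabyteConnectionString);
await conn.OpenAsync();                          // one persistent connection for the whole run

const int batchSize = 100;                       // try 1 ("small") vs. 100 or 1000 ("large")
long rows = 0;
var sw = Stopwatch.StartNew();

foreach (var chunk in File.ReadLines("bench.csv").Chunk(batchSize))
{
    await using var tx = await conn.BeginTransactionAsync();    // BEGIN
    foreach (var line in chunk)
    {
        var parts = line.Split(',', 2);
        await conn.ExecuteAsync(
            "INSERT INTO Messages(ConversationId, Message, SentAt) VALUES(@ConversationId, @Message, @SentAt);",
            new { ConversationId = long.Parse(parts[0]), Message = parts[1], SentAt = DateTime.UtcNow },
            tx);
        rows++;
    }
    await tx.CommitAsync();                                      // COMMIT
}

sw.Stop();
Console.WriteLine($"{rows} rows in {sw.Elapsed.TotalSeconds:F1}s = {rows / sw.Elapsed.TotalSeconds:F0} rows/sec");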
connection pooling
using var connection = new NpgsqlConnection(yugabyteConnectionString);
await connection.OpenAsync();
It's not clear to me that this does what you hope it does.
We hope it pulls from a connection pool
and the await (almost always) immediately moves on
since it finds a connection that is already open.
My concern is that perhaps this always allocates
a new local ephemeral TCP port number,
does {SYN, SYN-ACK, ACK} 3-way handshake,
waits for the listening daemon to fork off a new worker,
and then the await completes.
It would be worth verifying, even with
simple tools like netstat -an.
Notice that this concern disappears completely
when doing the proposed single-thread benching,
as a single persistent connection would be used for that.
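For what it's worth, Npgsql does pool by default (keyed by connection string), so the things to confirm are that nothing has turned pooling off and that the pool is at least as big as your thread count. Roughly this, where the sizes are placeholders to tune:

// Standard Npgsql connection-string keywords; the numbers are placeholders.
var yugabyteConnectionString =
    "Host=...;Database=...;Username=...;Password=...;" +
    "Pooling=true;Minimum Pool Size=20;Maximum Pool Size=100";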
If you have N threads, I wonder if you'd like to
allocate an array that stores N persistent connections.
We happen to be holding a mutex (semaphore) upon entry,
but with the array that's not even relevant since
the i-th thread enjoys exclusive access to the i-th connection.
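Something like this, assuming each worker already knows its own index i:

// One persistent, already-open connection per worker; N is your thread count.
var connections = new NpgsqlConnection[N];
for (int i = 0; i < N; i++)
{
    connections[i] = new NpgsqlConnection(yugabyteConnectionString);
    await connections[i].OpenAsync();
}
// Worker i then uses connections[i] exclusively: no pool lookup, no handshake on the hot path.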
large transactions
await connection.ExecuteAsync("INSERT INTO Messages(ConversationId, Message, SentAt) VALUES(@ConversationId, @Message, @SentAt);",
I don't know what your "is user group member" fraction might be,
so after doing a million SELECTs I'm not sure whether we follow up
with nearly a million INSERTs or just a small number of them.
Let's assume the fraction is 90%, so we usually do the INSERT.
I am concerned that perhaps you're effectively
sending the following to the database:
- BEGIN transaction
- INSERT
- COMMIT transaction
If so, then we would expect DB performance to plummet,
perhaps to as low as a mere 300 tx/sec.
Databases really want to see "large" transactions
if we hope to achieve high throughput.
The COMMIT says "persist this to disk and wait
for that I/O to complete, so we can survive a power failure".
(OTOH, sometimes we need tiny transactions for correctness.
It's not clear that your use case fits into that,
since you reported performance in terms of throughput
rather than 95th-percentile latency.)
Here's an easy test you can shoehorn into the current code.
For each input we currently do one SELECT and (roughly) one INSERT.
Suppose we added a 2nd or even a 3rd useless "busy work" INSERT to the mix.
Make a prediction about what reduced performance numbers you anticipate.
Then do the experiment to verify.
What I'm shooting for is that each HandleMessageAsync call would do
- BEGIN
- SELECT
- INSERT
- INSERT
- INSERT
- COMMIT
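In code that might look like the following, where conversationId and message stand in for whatever you already pass to the INSERT, and your existing "is user group member" SELECT slots in where the comment is:

// Experiment sketch: one explicit transaction per HandleMessageAsync call,
// padded with deliberately useless extra INSERTs.
await using var tx = await connection.BeginTransactionAsync();    // BEGIN
// ... your existing "is user group member" SELECT here, passing tx ...
for (int i = 0; i < 3; i++)                                        // three INSERTs instead of one
{
    await connection.ExecuteAsync(
        "INSERT INTO Messages(ConversationId, Message, SentAt) VALUES(@ConversationId, @Message, @SentAt);",
        new { ConversationId = conversationId, Message = message, SentAt = DateTime.UtcNow },
        tx);
}
await tx.CommitAsync();                                            // COMMIT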
If my guess about "short transactions" is correct,
then you might consider adopting this approach:
- In addition to the current threads, launch a single "writer" worker that listens on a FIFO.
- Threads issue SELECTs to their heart's content. (Maybe without even holding the semaphore.)
- Rather than sending INSERTs directly, threads append such data to the FIFO.
- The writer thread buffers up K requests, perhaps ten or a hundred, and issues a "big" transaction of BEGIN / INSERT / INSERT ... / COMMIT.
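One way to get that FIFO in C# is System.Threading.Channels; below is a rough sketch of the writer side, where PendingInsert, Queue, and k are names I'm inventing for illustration:

using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;
using Dapper;
using Npgsql;

// Hypothetical row type for queued work.
public record PendingInsert(long ConversationId, string Message, DateTime SentAt);

public static class MessageWriter
{
    // Handler threads call Queue.Writer.TryWrite(...) instead of INSERTing directly.
    public static readonly Channel<PendingInsert> Queue = Channel.CreateUnbounded<PendingInsert>();

    // The single writer worker: drain the channel and commit rows in batches of up to k.
    public static async Task RunAsync(NpgsqlConnection conn, int k)
    {
        var batch = new List<PendingInsert>(k);
        while (await Queue.Reader.WaitToReadAsync())
        {
            while (batch.Count < k && Queue.Reader.TryRead(out var item))
                batch.Add(item);

            await using var tx = await conn.BeginTransactionAsync();    // BEGIN
            foreach (var row in batch)
                await conn.ExecuteAsync(
                    "INSERT INTO Messages(ConversationId, Message, SentAt) VALUES(@ConversationId, @Message, @SentAt);",
                    row, tx);                                            // INSERT, INSERT, ...
            await tx.CommitAsync();                                      // COMMIT
            batch.Clear();
        }
    }
}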
If we're concerned about same userid showing up repeatedly
within a small time window, we can always finesse that with
a local cache of items recently sent & received from the database.
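That cache can be very dumb; a minimal sketch, with the 5-second window pulled out of thin air:

using System;
using System.Collections.Concurrent;

// Tiny "recently seen" cache; the key and the window are placeholders for whatever you dedupe on.
var recentlySeen = new ConcurrentDictionary<long, DateTime>();

bool SeenRecently(long userId) =>
    recentlySeen.TryGetValue(userId, out var t) && DateTime.UtcNow - t < TimeSpan.FromSeconds(5);

void MarkSeen(long userId) => recentlySeen[userId] = DateTime.UtcNow;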