
> performance is very slow: about 300 inserts to DB per second.

There are two sections of code I have performance concerns about.

But first, it would really be worth your while to put a bunch of transaction data into a million-line .CSV and run this single-threaded using a single DB connection, to verify that DB performance is what you think it is. Do it with "small" and "large" BEGIN ... COMMIT transactions, as discussed below.
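
A sketch of that bench harness, reusing the posted INSERT via Dapper; the .CSV layout, the connection string, and a Guid-typed ConversationId are all assumptions here:

    // Single-threaded benchmark sketch: one persistent connection, comparing
    // one-commit-per-row against a single large BEGIN ... COMMIT transaction.
    using System.Diagnostics;
    using Dapper;
    using Npgsql;

    const string connStr = "Host=localhost;Database=bench;Username=app;Password=secret";
    const string sql =
        "INSERT INTO Messages(ConversationId, Message, SentAt) VALUES(@ConversationId, @Message, @SentAt);";

    await using var conn = new NpgsqlConnection(connStr);
    await conn.OpenAsync();

    var rows = File.ReadLines("messages.csv")        // assumed layout: ConversationId,Message
        .Select(line => line.Split(',', 2))
        .Select(f => new { ConversationId = Guid.Parse(f[0]), Message = f[1], SentAt = DateTime.UtcNow })
        .ToList();

    // "Small" transactions: every INSERT gets its own implicit BEGIN/COMMIT.
    var sw = Stopwatch.StartNew();
    foreach (var row in rows)
        await conn.ExecuteAsync(sql, row);
    Console.WriteLine($"small tx: {rows.Count / sw.Elapsed.TotalSeconds:F0} inserts/sec");

    // "Large" transaction: one explicit BEGIN ... COMMIT around the whole lot.
    sw.Restart();
    await using (var tx = await conn.BeginTransactionAsync())
    {
        foreach (var row in rows)
            await conn.ExecuteAsync(sql, row, tx);
        await tx.CommitAsync();
    }
    Console.WriteLine($"large tx: {rows.Count / sw.Elapsed.TotalSeconds:F0} inserts/sec");

The gap between those two numbers tells you how much of the 300/sec figure is per-commit overhead rather than anything about your threading.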


connection pooling

    using var connection = new NpgsqlConnection(yugabyteConnectionString);
    await connection.OpenAsync();

It's not clear to me that this does what you hope it does.

We hope it pulls from a connection pool, so that the await (almost always) completes immediately because it finds a connection that is already open.

My concern is that perhaps this always allocates a new local ephemeral TCP port number, does the {SYN, SYN-ACK, ACK} 3-way handshake, waits for the listening daemon to fork off a new worker, and only then lets the await complete. It would be worth verifying, even with simple tools like `netstat -an`. Notice that this concern disappears completely when doing the proposed single-threaded benching, as that uses a single persistent connection.

If you have N threads, I wonder if you'd like to allocate an array that stores N persistent connections. We happen to be holding a mutex (semaphore) upon entry, but with the array that's not even relevant since the i-th thread enjoys exclusive access to the i-th connection.
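
A sketch of that, assuming a fixed worker count and that each worker knows its own index (N and the array name are illustrative):

    // Hypothetical array of N persistent connections, opened once at startup.
    // Worker i only ever touches connections[i], so no locking is needed
    // around connection use and no per-message open/close happens at all.
    const int N = 16;                                // assumed worker count
    var connections = new NpgsqlConnection[N];
    for (int i = 0; i < N; i++)
    {
        connections[i] = new NpgsqlConnection(yugabyteConnectionString);
        await connections[i].OpenAsync();
    }

    // Worker i then does:
    //     var connection = connections[i];          // exclusive to worker i
    //     ... SELECTs and INSERTs on this connection ...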


large transactions

    await connection.ExecuteAsync("INSERT INTO Messages(ConversationId, Message, SentAt) VALUES(@ConversationId, @Message, @SentAt);",

I don't know what your "is user group member" fraction might be, so having done a million SELECTs I'm not sure if we do nearly a million INSERTs or just some small number of them. Let's assume the fraction is 90%, so we usually do the INSERT.

I am concerned that perhaps you're effectively sending the following to the database:

  • BEGIN transaction
  • INSERT
  • COMMIT transaction

If so, then we would expect DB performance to plummet, perhaps to as low as a mere 300 tx/sec. Databases really want to see "large" transactions if we hope to achieve high throughput. The COMMIT says "persist this to disk and wait for that I/O to complete, so we can survive powerfail". (OTOH, sometimes we need tiny transactions for correctness. It's not clear that your use case fits into that, since you reported performance in terms of throughput rather than 95th-percentile latency.)

Here's an easy test you can shoehorn into the current code. For each input we currently do one SELECT and (roughly) one INSERT. Suppose we added a 2nd or even a 3rd useless "busy work" INSERT to the mix. Make a prediction about what reduced performance numbers you anticipate. Then do the experiment to verify.

What I'm shooting for is that each HandleMessageAsync call would do (see the sketch after this list):

  • BEGIN
  • SELECT
  • INSERT
  • INSERT
  • INSERT
  • COMMIT
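
A sketch of that shape, using Dapper as in the posted code; the membership SELECT and the two extra busy-work INSERTs are made up for the experiment:

    // Hypothetical body of HandleMessageAsync for the experiment: one explicit
    // transaction wrapping the SELECT and all three INSERTs, so only one
    // COMMIT (one synchronous disk wait) happens per call.
    await using var tx = await connection.BeginTransactionAsync();

    var isMember = await connection.ExecuteScalarAsync<bool>(
        "SELECT EXISTS(SELECT 1 FROM GroupMembers WHERE UserId = @UserId AND GroupId = @GroupId);",
        new { UserId = userId, GroupId = groupId }, tx);  // assumed membership query

    if (isMember)
    {
        for (int i = 0; i < 3; i++)                       // 1 real INSERT + 2 busy-work copies
            await connection.ExecuteAsync(
                "INSERT INTO Messages(ConversationId, Message, SentAt) VALUES(@ConversationId, @Message, @SentAt);",
                new { ConversationId = conversationId, Message = message, SentAt = DateTime.UtcNow }, tx);
    }

    await tx.CommitAsync();

If throughput barely moves when you triple the INSERTs, the per-commit cost dominates and batching will pay off.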

If my guess about "short transactions" is correct, then you might consider adopting this approach (sketched in code after the list):

  1. In addition to current threads, launch a single "writer" worker, which listens on a FIFO.
  2. Threads issue SELECTs to their heart's content. (Maybe without even holding the semaphore.)
  3. Rather than sending INSERTs directly, threads append such data to the FIFO.
  4. The writer thread buffers up K requests, perhaps ten or a hundred, and issues a "big" transaction of BEGIN / INSERT / INSERT ... / COMMIT. In the 1980s, Oracle described this performance trick, applied across several unrelated users, as "group commit". Put e.g. a 30 msec timeout on it, so final messages won't get stalled when we go idle.
  5. Writer thread sends K wakeup notices, perhaps via K semaphores, letting those threads report "success" to their end users.
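
Here is one way that writer could look; a sketch assuming System.Threading.Channels stands in for the FIFO and a TaskCompletionSource per request serves as the wakeup semaphore. All names are illustrative:

    // Hypothetical group-commit writer: handler threads enqueue pending rows
    // on a channel (the FIFO); a single writer drains up to K of them, or
    // whatever arrived within ~30 ms, into one BEGIN / INSERT ... / COMMIT.
    using System.Threading.Channels;
    using Dapper;
    using Npgsql;

    var fifo = Channel.CreateUnbounded<PendingInsert>();

    // Producer side, inside each handler thread:
    //     var item = new PendingInsert(convId, msg, DateTime.UtcNow,
    //         new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously));
    //     await fifo.Writer.WriteAsync(item);
    //     await item.Done.Task;                 // resumes once the batch COMMITs

    async Task WriterLoopAsync(NpgsqlConnection conn, int k = 100)
    {
        while (await fifo.Reader.WaitToReadAsync())
        {
            var batch = new List<PendingInsert>();

            // Gather up to K items, waiting at most ~30 ms for stragglers so
            // the final message before an idle spell is never stalled.
            using var window = new CancellationTokenSource(TimeSpan.FromMilliseconds(30));
            try
            {
                while (batch.Count < k)
                    batch.Add(await fifo.Reader.ReadAsync(window.Token));
            }
            catch (OperationCanceledException) { /* 30 ms window elapsed */ }

            if (batch.Count == 0) continue;

            await using var tx = await conn.BeginTransactionAsync();
            foreach (var p in batch)
                await conn.ExecuteAsync(
                    "INSERT INTO Messages(ConversationId, Message, SentAt) VALUES(@ConversationId, @Message, @SentAt);",
                    new { p.ConversationId, p.Message, p.SentAt }, tx);
            await tx.CommitAsync();

            foreach (var p in batch)
                p.Done.SetResult();               // the K wakeup notices
        }
    }

    // One pending row plus its wakeup handle.
    record PendingInsert(Guid ConversationId, string Message, DateTime SentAt,
                         TaskCompletionSource Done);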

If we're concerned about the same userid showing up repeatedly within a small time window, we can always finesse that with a local cache of items recently sent & received from the database.
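
For instance, a sketch of such a cache in front of the membership lookup; MemoryCache comes from Microsoft.Extensions.Caching.Memory, and the 5-second TTL and query are made up for illustration:

    // Hypothetical short-TTL cache in front of the membership SELECT, so the
    // same (user, group) pair arriving repeatedly skips the round trip.
    using Microsoft.Extensions.Caching.Memory;
    using Dapper;
    using Npgsql;

    var cache = new MemoryCache(new MemoryCacheOptions());

    async Task<bool> IsGroupMemberCachedAsync(NpgsqlConnection conn, Guid userId, Guid groupId) =>
        await cache.GetOrCreateAsync((userId, groupId), async entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(5);  // arbitrary TTL
            return await conn.ExecuteScalarAsync<bool>(
                "SELECT EXISTS(SELECT 1 FROM GroupMembers WHERE UserId = @userId AND GroupId = @groupId);",
                new { userId, groupId });
        });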