rpc : enable async operations #7915

rgerganov · 2024-06-13T08:00:39Z

Start a dedicated backend thread in the rpc-server and use a message-passing interface for submitting work to it. This will enable backend async operations and cross-server communication.

slaren · 2024-06-16T20:03:18Z

I may be wrong, but I suspect that the async queue will need to be implemented in the client side instead.

rgerganov · 2024-06-17T07:54:28Z

If we want to copy tensors across RPC servers then we need to handle at least two connections on the server side -- one from the scheduler and one from another RPC server. I considered the following options for implementing this:

1. Using a single thread and async IO. I think this would be hard to implement in a cross-platform way without using third-party libraries.
2. Using multiple threads and blocking IO. My assumption is that backend implementations are not guaranteed to be thread-safe, so we would need to add synchronization when accessing the backend from multiple threads.
3. Using a single thread for all backend ops and submitting work to it via a thread-safe message queue. No synchronization is needed, as the backend is confined to a single thread.

I think option 3 brings less complexity than option 2, so I opted for it, but I am open to discussion.
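
A minimal sketch of the third option, to make the trade-off concrete. It is not the code from this PR: `fake_backend`, `backend_worker`, and `demo_two_connections` are hypothetical names, and the "backend" is a stand-in that just doubles an integer. Connection threads only push closures into a mutex-protected queue; the backend itself is touched by one thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for a backend that is not thread-safe.
struct fake_backend {
    int n_ops = 0;
    int compute(int x) { n_ops++; return x * 2; }
};

// All backend calls run on one dedicated thread.
class backend_worker {
public:
    backend_worker() : worker_([this] { run(); }) {}

    ~backend_worker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Callable from any connection thread; the result arrives via a future.
    std::future<int> submit(int x) {
        auto task = std::make_shared<std::packaged_task<int()>>(
            [this, x] { return backend_.compute(x); });
        std::future<int> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) {
                    return; // stop requested and queue drained
                }
                job = std::move(queue_.front());
                queue_.pop();
            }
            job(); // the only place where the backend is used
        }
    }

    fake_backend backend_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool stop_ = false;
    std::thread worker_; // declared last so the other members exist when it starts
};

// Two "connections" (scheduler and peer server) submit work concurrently.
int demo_two_connections() {
    backend_worker worker;
    int a = 0, b = 0;
    std::thread scheduler([&] { a = worker.submit(1).get(); });
    std::thread peer([&] { b = worker.submit(20).get(); });
    scheduler.join();
    peer.join();
    assert(a == 2 && b == 40);
    return a + b;
}
```

The queue still takes a lock on every message, but the backend object needs no locking of its own because only the worker thread calls into it.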

> I may be wrong, but I suspect that the async queue will need to be implemented in the client side instead.

Could you please elaborate?

slaren · 2024-06-17T16:48:13Z

I wouldn't say that the message queue doesn't require synchronization, it is still locking a mutex for every message. Whether that's more efficient than the other methods, I don't know, but it is probably not going to be the bottleneck regardless. Another option could be using select/poll, which is still a single thread with blocking I/O.
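
The select/poll variant could look roughly like the sketch below, using POSIX `poll()`. This is an illustration only, not rpc-server code: `poll_once` and `demo_poll` are hypothetical helpers, pipes stand in for the scheduler and peer-server sockets, and the code is POSIX-only as written.

```cpp
#include <cassert>
#include <cstddef>
#include <poll.h>
#include <string>
#include <unistd.h>
#include <utility>
#include <vector>

// Blocks until at least one descriptor is readable (or the timeout expires),
// then reads once from every ready descriptor on the calling thread.
// Returns (descriptor index, bytes read) pairs.
std::vector<std::pair<size_t, std::string>> poll_once(const std::vector<int> & fds, int timeout_ms) {
    std::vector<pollfd> pfds;
    for (int fd : fds) {
        pfds.push_back({fd, POLLIN, 0});
    }

    std::vector<std::pair<size_t, std::string>> out;
    int n_ready = poll(pfds.data(), pfds.size(), timeout_ms);
    if (n_ready <= 0) {
        return out; // timeout or error
    }
    for (size_t i = 0; i < pfds.size(); i++) {
        if (pfds[i].revents & POLLIN) {
            char buf[256];
            ssize_t n = read(pfds[i].fd, buf, sizeof(buf));
            if (n > 0) {
                out.emplace_back(i, std::string(buf, static_cast<size_t>(n)));
            }
        }
    }
    return out;
}

// Pipes stand in for the two connections; only the "peer" one has data.
std::string demo_poll() {
    int sched[2], peer[2];
    if (pipe(sched) != 0) {
        return "";
    }
    if (pipe(peer) != 0) {
        close(sched[0]);
        close(sched[1]);
        return "";
    }
    std::string result;
    if (write(peer[1], "copy", 4) == 4) {
        auto ready = poll_once({sched[0], peer[0]}, 1000);
        assert(ready.size() == 1 && ready[0].first == 1);
        if (!ready.empty()) {
            result = ready[0].second;
        }
    }
    close(sched[0]); close(sched[1]);
    close(peer[0]);  close(peer[1]);
    return result;
}
```

With this approach a single thread serves both connections and calls the backend directly, so neither a queue nor a mutex is involved; the cost is that a long backend operation delays handling of the other connection.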

To implement the async interface of ggml-backend, my intuition is that it would be simpler to implement the queue on the client side, but I am not completely sure of that. I think it should be possible to create a generic adapter that sits on top of another backend and implements the asynchronous operations by running an asynchronous queue in a different thread. For APIs that support multi-device synchronization natively such as CUDA, it is still going to be more efficient to use the native implementation, but for other backends it should be possible to provide a generic implementation.
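
A rough sketch of such a generic adapter follows. The interface is hypothetical and deliberately much smaller than the real ggml-backend API: `sync_backend`, `async_adapter`, `graph_compute_async`, and `synchronize` are illustrative names, and a graph is reduced to an integer id.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical synchronous backend interface.
struct sync_backend {
    virtual ~sync_backend() = default;
    virtual void graph_compute(int graph_id) = 0;
};

// Wraps a synchronous backend and runs its operations on a queue thread.
class async_adapter {
public:
    explicit async_adapter(sync_backend & inner) : inner_(inner), worker_([this] { run(); }) {}

    ~async_adapter() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Returns immediately; the work runs later on the queue thread.
    void graph_compute_async(int graph_id) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(graph_id);
            pending_++;
        }
        cv_.notify_one();
    }

    // Blocks until every operation submitted so far has completed.
    void synchronize() {
        std::unique_lock<std::mutex> lock(mutex_);
        done_cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        for (;;) {
            int graph_id;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) {
                    return; // stop requested and queue drained
                }
                graph_id = queue_.front();
                queue_.pop();
            }
            inner_.graph_compute(graph_id); // synchronous call on the queue thread
            {
                std::lock_guard<std::mutex> lock(mutex_);
                pending_--;
            }
            done_cv_.notify_all();
        }
    }

    sync_backend & inner_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::condition_variable done_cv_;
    std::queue<int> queue_;
    int pending_ = 0;
    bool stop_ = false;
    std::thread worker_; // declared last so the other members exist when it starts
};

// Test backend that records the order in which graphs were computed.
struct recording_backend : sync_backend {
    std::vector<int> seen;
    void graph_compute(int graph_id) override { seen.push_back(graph_id); }
};

std::vector<int> demo_async_adapter() {
    recording_backend inner;
    async_adapter adapter(inner);
    for (int id = 1; id <= 3; id++) {
        adapter.graph_compute_async(id);
    }
    adapter.synchronize();
    assert(inner.seen.size() == 3);
    return inner.seen;
}
```

Because the adapter only relies on the wrapped backend's synchronous entry point, the same wrapper could sit on top of any backend that lacks native async support.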

rgerganov · 2024-06-20T10:50:01Z

PR #8032 is based on this work, trying to make copying tensors across servers more efficient. However, I am observing performance degradation with TinyLlama and 2 CUDA servers running on localhost.

@slaren maybe we should close this PR and continue the discussion on PR #8032?

mofosyne added the Review Complexity : Low label Jun 13, 2024

rpc : enable async operations

b30565e

rgerganov force-pushed the async branch from 6971b32 to b30565e Compare June 14, 2024 08:46

ggerganov approved these changes Jun 15, 2024