Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
A high-throughput and memory-efficient inference and serving engine for LLMs
Operating LLMs in production
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
This project aims to share the technical principles behind large language models along with hands-on, practical experience.
RayLLM - LLMs on Ray
A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully exploit your compute resources
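Dynamic batching, mentioned above, groups incoming requests into a single batch and flushes it when either a size limit or a wait deadline is hit, trading a small amount of latency for much higher throughput. A minimal sketch of the idea, with illustrative names and parameters (not the API of any framework listed here):

```python
import time
from queue import Queue, Empty

def dynamic_batcher(request_queue, max_batch_size=8, max_wait_s=0.01):
    """Collect requests into one batch, flushing on size or timeout.

    Toy illustration of dynamic batching; parameter names are
    hypothetical, not taken from any specific serving framework.
    """
    batch = []
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # wait deadline hit: flush whatever we have
        try:
            batch.append(request_queue.get(timeout=remaining))
        except Empty:
            break  # queue drained within the wait window
    return batch
```

In a real server this loop runs in a background thread and hands each flushed batch to the model, so one forward pass serves many concurrent requests.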
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
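Serving thousands of fine-tuned LLMs from one deployment is feasible because LoRA fine-tunes share the frozen base weights and differ only by a small low-rank update (x @ A @ B) selected per request. A toy plain-Python sketch of that serving pattern; all class and method names are hypothetical, not the API of the server above:

```python
def matmul(X, W):
    # Plain-Python matrix multiply: X is (n x k), W is (k x m).
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)]
            for row in X]

def add(M, N):
    # Element-wise sum of two equally shaped matrices.
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(M, N)]

class MultiLoRAServer:
    """Toy multi-LoRA router: one shared base weight, many adapters."""

    def __init__(self, base_weight):
        self.base = base_weight   # frozen weights shared by all tenants
        self.adapters = {}        # adapter_id -> (A, B) low-rank pair

    def load_adapter(self, adapter_id, A, B):
        # Each adapter stores only the small A (k x r) and B (r x m).
        self.adapters[adapter_id] = (A, B)

    def forward(self, x, adapter_id=None):
        y = matmul(x, self.base)
        if adapter_id is not None:
            A, B = self.adapters[adapter_id]
            # LoRA delta: x @ A @ B, added on top of the base output.
            y = add(y, matmul(matmul(x, A), B))
        return y
```

Because the per-adapter state is tiny relative to the base model, many adapters can stay resident in memory and be swapped per request, which is what makes scaling to thousands of fine-tunes practical.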
A suite of hands-on training materials showing how to scale CV, NLP, and time-series forecasting workloads with Ray.
Finetune LLMs on K8s by using Runbooks
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
Open-source framework to build, train, and monetise cross-LLM, high-accuracy Prompt Packages powered by Micro LLMs
Deploy and Scale LLM-based applications
A collection of available inference solutions for LLMs
PeriFlow: the fastest serving engine for generative AI such as LLMs
Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization
LLM (Large Language Model) FineTuning
Ray and Anyscale for UC Berkeley AI Hackathon!
A self-hosted personal chatbot API with FastAPI. It allows you to interact with the Llama2 LLM (and other open-source LLMs) to have natural language conversations, generate text, and perform various language-related tasks.