What is DeepInfra

What you can do

LLMs & Chat

OpenAI-compatible API for 100+ LLMs. Swap your base URL, keep your code.

Vision & OCR

Multimodal models for visual understanding and document text extraction.

Embeddings & Reranking

State-of-the-art embedding and reranker models for search and RAG.

Image & Video Generation

FLUX, Stable Diffusion, text-to-video, and more.

Speech

Speech recognition (Whisper) and text-to-speech models.

Deploy Private Models

Run your own fine-tuned LLM on A100 / H100 / H200 / B200 / B300 with autoscaling.

Why DeepInfra

Drop-in OpenAI replacement. Point your existing OpenAI SDK at https://api.deepinfra.com/v1/openai, swap in your DeepInfra API key, and the rest of your code works unchanged. No migration required.
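As a rough sketch of what "drop-in" means in practice, the helper below isolates the only two client arguments that differ from a stock OpenAI setup. The names `deepinfra_client_kwargs` and `make_client` are hypothetical, and reading the key from a `DEEPINFRA_TOKEN` environment variable is an assumption, not a documented requirement.

```python
import os
from typing import Mapping

# Base URL taken from the text above.
DEEPINFRA_BASE_URL = "https://api.deepinfra.com/v1/openai"


def deepinfra_client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Return the two OpenAI-client arguments that change when targeting DeepInfra.

    Assumption: the API key is stored in the DEEPINFRA_TOKEN environment variable.
    """
    token = env.get("DEEPINFRA_TOKEN")
    if not token:
        raise RuntimeError("DEEPINFRA_TOKEN is not set")
    return {"api_key": token, "base_url": DEEPINFRA_BASE_URL}


def make_client():
    """Build an OpenAI SDK client pointed at DeepInfra (requires the `openai` package)."""
    # Imported lazily so deepinfra_client_kwargs() works without the SDK installed.
    from openai import OpenAI

    return OpenAI(**deepinfra_client_kwargs())
```

Everything after client construction (for example `client.chat.completions.create(...)`) would stay as it is in existing OpenAI-based code.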

Best price for open-source models. DeepInfra consistently offers the lowest prices for open-source model inference. You only pay per token — no idle GPU time, no minimums, no seat fees. DeepInfra is also the provider with the most models on OpenRouter.
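To make per-token billing concrete, here is a minimal cost estimate. It assumes input and output tokens are priced separately per million tokens; the prices are hypothetical placeholders, not actual DeepInfra rates, and `TokenPrice` and `estimate_cost` are illustrative names. The token counts correspond to the `usage.prompt_tokens` and `usage.completion_tokens` fields of an OpenAI-style chat response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """USD price per one million tokens (hypothetical values, not real rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(prompt_tokens: int, completion_tokens: int, price: TokenPrice) -> float:
    """Estimate the USD cost of one request from its token counts."""
    total = (
        prompt_tokens * price.input_per_million
        + completion_tokens * price.output_per_million
    )
    return total / 1_000_000


# Hypothetical prices: $0.50 per 1M input tokens, $1.50 per 1M output tokens.
example_price = TokenPrice(input_per_million=0.50, output_per_million=1.50)
cost = estimate_cost(prompt_tokens=1_200, completion_tokens=300, price=example_price)
print(f"${cost:.6f}")  # prints $0.001050
```

Check the pricing page for the real per-model rates before relying on an estimate like this.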

Always-fresh model catalog. DeepInfra is typically among the first providers to deploy a newly released model.

Private deployments for compliance and customization. Need to run your own fine-tuned weights, or require data isolation? Deploy a dedicated instance on A100/H100/H200/B200/B300 with autoscaling and a private endpoint — competitive GPU pricing, deployable in just a few clicks.

GPU Clusters for training and full control. Rent a B200 or B300 cluster with SSH access and run whatever you want.

Quick example

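import os

from openai import OpenAI

# Read the API key from the DEEPINFRA_TOKEN environment variable.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

# Send a single-turn chat request to an open-source model.
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)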

Get your API key from the Dashboard.
