DeepInfra is an AI inference cloud that makes it simple to run the latest machine learning models at scale — LLMs, vision, embeddings, image generation, video generation, speech, and more.

What you can do

LLMs & Chat

OpenAI-compatible API for 100+ LLMs. Swap your base URL, keep your code.

Vision & OCR

Multimodal models for visual understanding and document text extraction.

Embeddings & Reranking

State-of-the-art embedding and reranker models for search and RAG.

Image & Video Generation

FLUX, Stable Diffusion, text-to-video, and more.

Speech

Speech recognition (Whisper) and text-to-speech models.

Deploy Private Models

Run your own fine-tuned LLM on A100 / H100 / H200 / B200 / B300 with autoscaling.

Why DeepInfra

Drop-in OpenAI replacement. Point your existing OpenAI SDK to https://api.deepinfra.com/v1/openai and your code works without changes. No migration required.

Best price for open-source models. DeepInfra consistently offers the lowest prices for open-source model inference. You pay only per token: no idle GPU time, no minimums, no seat fees. DeepInfra is also the provider with the most models on OpenRouter.

Always-fresh model catalog. DeepInfra is typically among the first providers to deploy a newly released model.

Private deployments for compliance and customization. Need to run your own fine-tuned weights, or require data isolation? Deploy a dedicated instance on A100/H100/H200/B200/B300 with autoscaling and a private endpoint, at competitive GPU pricing and deployable in just a few clicks.

GPU Clusters for training and full control. Rent a B200 or B300 cluster with SSH access and run whatever you want.
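To make the drop-in claim concrete, here is a minimal sketch of how an application might switch providers through configuration alone. The helper name `client_settings` is hypothetical, and `OPENAI_API_KEY` is assumed to be the variable an existing OpenAI setup already uses. Only the DeepInfra base URL and the `DEEPINFRA_TOKEN` name come from this page.

```python
import os

# Default endpoint used by the OpenAI SDK, and the DeepInfra endpoint from this page.
OPENAI_BASE_URL = "https://api.openai.com/v1"
DEEPINFRA_BASE_URL = "https://api.deepinfra.com/v1/openai"


def client_settings(provider: str, env=os.environ) -> dict:
    """Return keyword arguments for an OpenAI-style client (hypothetical helper).

    Only the base URL and the API key differ between the two providers.
    """
    if provider == "deepinfra":
        return {
            "base_url": DEEPINFRA_BASE_URL,
            "api_key": env.get("DEEPINFRA_TOKEN", ""),
        }
    if provider == "openai":
        return {
            "base_url": OPENAI_BASE_URL,
            "api_key": env.get("OPENAI_API_KEY", ""),
        }
    raise ValueError(f"unknown provider: {provider!r}")
```

With the `openai` package installed, the result can be passed straight to the client, for example `OpenAI(**client_settings("deepinfra"))`. Model identifiers remain provider-specific, so the `model` argument of each request still has to name a model that the chosen provider serves.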

Get started in 60 seconds

Quickstart

Make your first API call — no installation required.

Quick example

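import os

from openai import OpenAI

# Read the API token from the DEEPINFRA_TOKEN environment variable
# rather than passing the literal string "$DEEPINFRA_TOKEN".
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

# Send a single-turn chat request and print the model's reply.
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)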
Get your API key from the Dashboard.
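
If you would rather not install the SDK, the same request can be sketched with only the Python standard library. This assumes the endpoint follows the OpenAI path convention, where the SDK appends `/chat/completions` to the base URL, and that the response body has the same shape the SDK example reads. The helper names `build_chat_request` and `send_chat_request` are hypothetical.

```python
import json
import urllib.request

# Base URL taken from the quick example; the path suffix is the OpenAI convention.
BASE_URL = "https://api.deepinfra.com/v1/openai"


def build_chat_request(token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a single-turn chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request) as http_response:
        body = json.load(http_response)
    return body["choices"][0]["message"]["content"]
```

A call would look like `send_chat_request(build_chat_request(os.environ["DEEPINFRA_TOKEN"], "deepseek-ai/DeepSeek-V3", "Hello!"))` after importing `os`. Keeping request construction separate from sending makes the payload easy to inspect without touching the network.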