Summary
Configuration
API documentation for vLLM's configuration classes.
- vllm.config.ModelConfig
- vllm.config.CacheConfig
- vllm.config.TokenizerPoolConfig
- vllm.config.LoadConfig
- vllm.config.ParallelConfig
- vllm.config.SchedulerConfig
- vllm.config.DeviceConfig
- vllm.config.SpeculativeConfig
- vllm.config.LoRAConfig
- vllm.config.PromptAdapterConfig
- vllm.config.MultiModalConfig
- vllm.config.PoolerConfig
- vllm.config.DecodingConfig
- vllm.config.ObservabilityConfig
- vllm.config.KVTransferConfig
- vllm.config.CompilationConfig
- vllm.config.VllmConfig
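These dataclasses are normally not constructed by hand; they are populated from the keyword arguments passed to the LLM entrypoint (or the corresponding engine/server flags). A minimal sketch, assuming the example model below (attribute layout may differ slightly across vLLM versions):

```python
from vllm import LLM

# Keyword arguments are parsed into the config objects listed above:
# `dtype` and `max_model_len` end up on ModelConfig, while
# `gpu_memory_utilization` is stored on CacheConfig.
llm = LLM(
    model="facebook/opt-125m",   # example model; any supported HF model works
    dtype="float16",
    max_model_len=2048,
    gpu_memory_utilization=0.90,
)

# The assembled configuration is accessible on the underlying engine,
# e.g. llm.llm_engine.model_config (exact attribute names may vary by version).
```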
Offline Inference
- LLM Class
- LLM Inputs
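A minimal offline-inference sketch using the LLM class (the model name is only an example):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # example model
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() accepts plain strings, TextPrompt dicts, or TokensPrompt dicts.
outputs = llm.generate(["Hello, my name is", "The future of AI is"], params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)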
vLLM Engines¶
Engine classes for offline and online inference.
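LLMEngine drives offline inference one iteration at a time; AsyncLLMEngine wraps the same loop for online serving. A rough sketch of the synchronous engine loop (the exact method set may vary between vLLM versions; the model name is illustrative):

```python
from vllm import EngineArgs, LLMEngine, SamplingParams

engine = LLMEngine.from_engine_args(EngineArgs(model="facebook/opt-125m"))
engine.add_request("request-0", "The capital of France is",
                   SamplingParams(max_tokens=16))

# Each step() runs one scheduling + model-execution iteration and returns
# the outputs produced for in-flight requests during that iteration.
while engine.has_unfinished_requests():
    for request_output in engine.step():
        if request_output.finished:
            print(request_output.outputs[0].text)
```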
Inference Parameters¶
Inference parameters for vLLM APIs.
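The main user-facing parameter classes are SamplingParams (for text generation) and PoolingParams (for embedding/pooling models). A short sketch of SamplingParams usage:

```python
from vllm import SamplingParams

# Greedy decoding: temperature 0 disables sampling entirely.
greedy = SamplingParams(temperature=0.0, max_tokens=32)

# Nucleus sampling with a mild repetition penalty and a stop string.
creative = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    repetition_penalty=1.1,
    stop=["\n\n"],
    max_tokens=128,
)
```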
Multi-Modality
vLLM provides experimental support for multi-modal models through the vllm.multimodal package.
Multi-modal inputs can be passed alongside text and token prompts to supported models via the multi_modal_data field in vllm.inputs.PromptType.
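A minimal sketch of passing an image through multi_modal_data (the model and prompt template below are only illustrative):

```python
from PIL import Image
from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # example vision-language model
image = Image.open("example.jpg")

# The dict form of PromptType carries the extra modalities in
# multi_modal_data, keyed by modality name ("image", "audio", ...).
outputs = llm.generate({
    "prompt": "USER: <image>\nWhat is shown in this picture? ASSISTANT:",
    "multi_modal_data": {"image": image},
})
print(outputs[0].outputs[0].text)
```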
Looking to add your own multi-modal model? See the vLLM documentation on adding multi-modal models.
Inputs
User-facing inputs.
Internal data structures:
- vllm.multimodal.inputs.PlaceholderRange
- vllm.multimodal.inputs.NestedTensors
- vllm.multimodal.inputs.MultiModalFieldElem
- vllm.multimodal.inputs.MultiModalFieldConfig
- vllm.multimodal.inputs.MultiModalKwargsItem
- vllm.multimodal.inputs.MultiModalKwargs
- vllm.multimodal.inputs.MultiModalInputs
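As an illustration of how these structures are reached from the outside, multiple items of the same modality can be supplied as a list under multi_modal_data; the processing pipeline then packs them into the internal classes above (e.g. MultiModalKwargs). A hedged sketch, where the model name, prompt template, and per-prompt image limit are assumptions:

```python
from PIL import Image
from vllm import LLM

# limit_mm_per_prompt raises the default cap of one item per modality.
llm = LLM(
    model="microsoft/Phi-3.5-vision-instruct",  # example multi-image model
    limit_mm_per_prompt={"image": 2},
)

outputs = llm.generate({
    "prompt": ("<|user|>\n<|image_1|>\n<|image_2|>\n"
               "Compare the two images.<|end|>\n<|assistant|>\n"),
    "multi_modal_data": {"image": [Image.open("a.jpg"), Image.open("b.jpg")]},
})
print(outputs[0].outputs[0].text)
```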