Select the IBM® Granite®, open-source or third-party model best suited for your business, and deploy it on-premises or in the cloud.
Choose the model that best fits your specific use case, budget considerations, regional interests and risk profile.
Tailored for business, the IBM Granite family of open, performant and trusted models delivers exceptional performance at a competitive price, without compromising safety.
Llama models are open, efficient large language models designed for versatility and strong performance across a wide range of natural language tasks.
Mistral models are fast, performant, open-weight language models designed for modularity and optimized for text generation, reasoning and multilingual applications.
There are several foundation models from other providers available on watsonx.ai.
What happens when you train a powerful AI model with your own unique data? Better customer experiences and faster value with AI. Explore these stories and see how.
Wimbledon used watsonx.ai foundation models to train its AI to create tennis commentary.
The Recording Academy used AI Stories with IBM watsonx to generate and scale editorial content around GRAMMY nominees.
The Masters uses watsonx.ai to bring AI-powered hole insights combined with expert opinions to digital platforms.
AddAI.Life uses watsonx.ai to access selected open-source large language models to build higher quality virtual assistants.
| Model | Provider | Use cases | Context length (tokens) | Price per Resource Unit (USD)* |
| --- | --- | --- | --- | --- |
| granite-3-3-8b-instruct | IBM | Supports reasoning and planning, questions and answers (Q&A), fill-in-the-middle, summarization, classification, generation, extraction, RAG and coding tasks. | 128k | 0.20 |
| granite-3-2-8b-instruct | IBM | Supports reasoning and planning, Q&A, summarization, classification, generation, extraction, RAG and coding tasks. | 128k | 0.20 |
| granite-vision-3-2-2b | IBM | Supports image-to-text use cases for chart, graph and infographic analysis, and context Q&A. | 16,384 | 0.10 |
| granite-3-2b-instruct (v3.1) | IBM | Supports Q&A, summarization, classification, generation, extraction, RAG and coding tasks. | 128k | 0.10 |
| granite-3-8b-instruct (v3.1) | IBM | Supports Q&A, summarization, classification, generation, extraction, RAG and coding tasks. | 128k | 0.20 |
| granite-guardian-3-8b (v3.1) | IBM | Supports detection of HAP or PII, jailbreaking, bias, violence and other harmful content. | 128k | 0.20 |
| granite-guardian-3-2b (v3.1) | IBM | Supports detection of HAP or PII, jailbreaking, bias, violence and other harmful content. | 128k | 0.10 |
| granite-13b-instruct | IBM | Supports Q&A, summarization, classification, generation, extraction and RAG tasks. | 8192 | 0.60 |
| granite-8b-code-instruct | IBM | Task-specific code model that generates, explains and translates code from a natural language prompt. | 128k | 0.60 |
| granite-20b-multilingual | IBM | Supports Q&A, summarization, classification, generation, extraction, translation and RAG tasks in French, German, Portuguese, Spanish and English. | 8192 | 0.60 |
| granite-34b-code-instruct | IBM | Task-specific code model that generates, explains and translates code from a natural language prompt. | 8192 | 0.60 |
| granite-20b-code-instruct | IBM | Task-specific code model that generates, explains and translates code from a natural language prompt. | 8192 | 0.60 |
| granite-3b-code-instruct | IBM | Task-specific code model that generates, explains and translates code from a natural language prompt. | 128k | 0.60 |
| granite-8b-japanese | IBM | Supports Q&A, summarization, classification, generation, extraction, translation and RAG tasks in Japanese. | 4096 | 0.60 |
*Prices shown are indicative, may vary by country, exclude any applicable taxes and duties, and are subject to product offering availability in a locale.
| Model | Provider | Use cases | Context length (tokens) | Price per Resource Unit (USD)* |
| --- | --- | --- | --- | --- |
| llama-4-scout-17b-16e-instruct | Meta | Multimodal reasoning, long-context processing (10M tokens), code generation and analysis, multilingual operations (200 languages supported), STEM and logical reasoning. | 128k | Free preview |
| llama-4-maverick-17b-128e-instruct-fp8 | Meta | Multimodal reasoning, long-context processing (10M tokens), code generation and analysis, multilingual operations (200 languages supported), STEM and logical reasoning. | 128k | Input: 0.35 / Output: 1.40 |
| llama-3-3-70b-instruct | Meta | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in English, German, French, Italian, Portuguese, Hindi, Spanish and Thai. | 128k | 0.71 |
| llama-3-2-90b-vision-instruct | Meta | Supports image captioning, image-to-text transcription (OCR) including handwriting, data extraction and processing, context Q&A and object identification. | 128k | 2.00 |
| llama-3-2-11b-vision-instruct | Meta | Supports image captioning, image-to-text transcription (OCR) including handwriting, data extraction and processing, context Q&A and object identification. | 128k | 0.35 |
| llama-guard-3-11b-vision | Meta | Supports image filtering, HAP or PII detection and harmful content filtering. | 128k | 0.35 |
| llama-3-2-1b-instruct | Meta | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in English, German, French, Italian, Portuguese, Hindi, Spanish and Thai. | 128k | 0.10 |
| llama-3-2-3b-instruct | Meta | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in English, German, French, Italian, Portuguese, Hindi, Spanish and Thai. | 128k | 0.15 |
| llama-3-405b-instruct | Meta | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in English, German, French, Italian, Portuguese, Hindi, Spanish and Thai. | 128k | Input: 5.00 / Output: 16.00 |
| llama-3-1-70b-instruct | Meta | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in English, German, French, Italian, Portuguese, Hindi, Spanish and Thai. | 128k | 1.80 |
| llama-3-1-8b-instruct | Meta | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in English, German, French, Italian, Portuguese, Hindi, Spanish and Thai. | 128k | 0.60 |
| llama-3-70b-instruct | Meta | Supports RAG, generation, summarization, classification, Q&A, extraction, translation and code generation tasks. | 8192 | 1.80 |
| codellama-34b-instruct | Meta | Task-specific code model that generates and translates code from a natural language prompt. | 16,384 | 1.80 |
| Model | Provider | Use cases | Context length (tokens) | Price per Resource Unit (USD)* |
| --- | --- | --- | --- | --- |
| mistral-medium-2505 | Mistral AI | Supports coding, image captioning, image-to-text transcription, function calling, data extraction and processing, context Q&A and mathematical reasoning. | 128k | Input: 3.00 / Output: 10.00 |
| mistral-small-3-1-24b-instruct-2503 | Mistral AI | Supports image captioning, image-to-text transcription, function calling, data extraction and processing, context Q&A and object identification. | 128k | Input: 0.10 / Output: 0.30 |
| pixtral-12b | Mistral AI | Supports image captioning, image-to-text transcription (OCR) including handwriting, data extraction and processing, context Q&A and object identification. | 128k | 0.35 |
| mistral-large-2 | Mistral AI | Supports Q&A, summarization, generation, coding, classification, extraction, translation and RAG tasks in French, German, Italian, Spanish and English. | 128k* | Input: 3.00 / Output: 10.00 |
| Mistral-Small-24B-Instruct-2501 | Mistral AI | Supports language tasks, agentic workflows, RAG and more in dozens of languages with a fast response time. | 32,768 | 0.35 |
| mixtral-8x7b-instruct | Mistral AI | Supports Q&A, summarization, classification, generation, extraction, RAG and code generation tasks. | 32,768 | 0.60 |
| Model | Provider | Use cases | Context length (tokens) | Price per Resource Unit (USD)* |
| --- | --- | --- | --- | --- |
| allam-1-13b-instruct | SDAIA | Supports Q&A, summarization, classification, generation, extraction, RAG and translation in Arabic. | 4096 | 1.80 |
| jais-13b-chat (Arabic) | core42 | Supports Q&A, summarization, classification, generation, extraction and translation in Arabic. | 2048 | 1.80 |
| flan-t5-xl-3b | Google | Supports Q&A, summarization, classification, generation, extraction and RAG tasks. Available for prompt-tuning. | 4096 | 0.60 |
| flan-t5-xxl-11b | Google | Supports Q&A, summarization, classification, generation, extraction and RAG tasks. | 4096 | 1.80 |
| flan-ul2-20b | Google | Supports Q&A, summarization, classification, generation, extraction and RAG tasks. | 4096 | 5.00 |
| elyza-japanese-llama-2-7b-instruct | ELYZA | Supports Q&A, summarization, RAG, classification, generation, extraction and translation tasks. | 4096 | 1.80 |
Use IBM developed and open-sourced embedding models, deployed in IBM watsonx.ai, for retrieval augmented generation, semantic search and document comparison tasks. Or choose a third-party embedding model provider.
| Model | Provider | Use cases | Maximum input tokens | Price per Resource Unit (USD)* |
| --- | --- | --- | --- | --- |
| granite-embedding-107m-multilingual | IBM | Retrieval augmented generation, semantic search and document comparison tasks. | 512 | 0.10 |
| granite-embedding-278m-multilingual | IBM | Retrieval augmented generation, semantic search and document comparison tasks. | 512 | 0.10 |
| slate-125m-english-rtrvr-v2 | IBM | Retrieval augmented generation, semantic search and document comparison tasks. | 512 | 0.10 |
| slate-125m-english-rtrvr | IBM | Retrieval augmented generation, semantic search and document comparison tasks. | 512 | 0.10 |
| slate-30m-english-rtrvr-v2 | IBM | Retrieval augmented generation, semantic search and document comparison tasks. | 512 | 0.10 |
| slate-30m-english-rtrvr | IBM | Retrieval augmented generation, semantic search and document comparison tasks. | 512 | 0.10 |
| Model | Provider | Use cases | Maximum input tokens | Price per Resource Unit (USD)* |
| --- | --- | --- | --- | --- |
| all-mini-l6-v2 | Microsoft | Retrieval augmented generation, semantic search and document comparison tasks. | 256 | 0.10 |
| all-minilm-l12-v2 | OS-NLP-CV | Retrieval augmented generation, semantic search and document comparison tasks. | 256 | 0.10 |
| multilingual-e5-large | Intel | Retrieval augmented generation, semantic search and document comparison tasks. | 512 | 0.10 |
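All of the embedding models above map text to fixed-length vectors, and the retrieval tasks they list reduce to nearest-neighbor search over those vectors. The sketch below shows the similarity step only, using toy 3-dimensional vectors as stand-ins for real embedding output (for example, vectors returned by a slate or granite-embedding model via the watsonx.ai API, which is not shown here):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, doc_vecs, top_k=3):
    """Return the indices of the top_k documents most similar to the query."""
    scored = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                    reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real embeddings:
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.05, 0.0]
print(semantic_search(query, docs, top_k=2))  # → [0, 1]
```

In practice the document vectors would be precomputed and stored in a vector index; only the query is embedded at search time.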
IBM believes in the creation, deployment and utilization of AI models that advance innovation across the enterprise responsibly. The IBM watsonx AI portfolio has an end-to-end process for building and testing foundation models and generative AI. For IBM-developed models, we search for and remove duplicated data, and we employ URL blocklists, filters for objectionable content and document quality, sentence splitting and tokenization techniques, all before model training.
During the data training process, we work to prevent misalignments in the model outputs and use supervised fine-tuning to enable better instruction following, so that the model can be used to complete enterprise tasks through prompt engineering. We are continuing to develop the Granite models in several directions, including other modalities, industry-specific content and more data annotations for training, while also deploying regular, ongoing data protection safeguards for IBM-developed models.
Given the rapidly changing generative AI technology landscape, our end-to-end processes are expected to continuously evolve and improve. As a testament to the rigor IBM puts into the development and testing of its foundation models, the company provides its standard contractual intellectual property indemnification for IBM-developed models, similar to those it provides for IBM hardware and software products.
Moreover, contrary to some other providers of large language models and consistent with the IBM standard approach on indemnification, IBM does not require its customers to indemnify IBM for a customer’s use of IBM-developed models. Also, consistent with the IBM approach to its indemnification obligation, IBM does not cap its indemnification liability for the IBM-developed models.
The watsonx models currently under these protections include:
(1) the Slate family of encoder-only models
(2) the Granite family of decoder-only models
*Context length as supported by the model provider; the actual context length available on the platform is limited. For more information, please see Documentation.
Inference is billed in Resource Units. 1 Resource Unit is 1,000 tokens. Input and completion tokens are charged at the same rate. 1,000 tokens are generally about 750 words.
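The Resource Unit arithmetic above can be sketched in a few lines; the prices in the example are per-RU rates taken from the tables on this page, and the helper function name is illustrative:

```python
def inference_cost(input_tokens, output_tokens, input_price, output_price=None):
    """Estimate inference cost in USD.

    Prices are per Resource Unit (RU), where 1 RU = 1,000 tokens.
    Input and completion tokens bill at the same rate unless the model
    lists a separate output rate (e.g. llama-3-405b-instruct).
    """
    if output_price is None:
        output_price = input_price
    return (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price

# granite-3-3-8b-instruct at 0.20 USD/RU, 2,000 prompt + 500 generated tokens:
print(inference_cost(2000, 500, 0.20))          # → 0.5

# llama-3-405b-instruct with split input/output rates:
print(inference_cost(1000, 1000, 5.00, 16.00))  # → 21.0
```

Since 1,000 tokens is roughly 750 words, the first example prices about 1,900 words of combined prompt and completion.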
Not all models are available in all regions. See our documentation for details.
Context length is expressed in tokens.
The IBM statements regarding its plans, directions and intent are subject to change or withdrawal without notice at its sole discretion. See Pricing for more details. Unless otherwise specified under Software pricing, all features, capabilities and potential updates refer exclusively to SaaS. IBM makes no representation that SaaS and software features and capabilities are the same.