Gemma (language model)

Gemma is a series of source-available large language models developed by Google DeepMind. It is based on similar technologies as Gemini. The first version was released in February 2024, followed by Gemma 2 in June 2024, Gemma 3 in March 2025, and the free and open-source Gemma 4 in April 2026. Variants of Gemma have also been developed, such as the vision-language model PaliGemma and the model MedGemma for medical consultation topics.

Gemma
DeveloperGoogle DeepMind
Initial releaseFebruary 21, 2024; 2 years ago (2024-02-21)[1]
Stable release
Gemma 4 / April 2, 2026; 50 days ago (2026-04-02)[2]
TypeLarge language model
License
Websitedeepmind.google/models/gemma/

History

edit

In February 2024, Google debuted Gemma, a collection of source-available LLMs that serve as a lightweight version of Gemini. The initial release came in two sizes, neural networks with two and seven billion parameters, respectively. Multiple publications viewed this as a response to competitors such as Meta releasing source code for their AI models, and a shift from Google's longstanding practice of keeping its AI source code private.[4][5][6][7]

Gemma 2 was released on June 27, 2024,[8] and Gemma 3 was released on March 12, 2025.[9][10] On April 2, 2026, Google released Gemma 4 under the free and open-source Apache 2.0 license.[2][11]

Overview

edit

Based on similar technologies as the Gemini series of models, Gemma is described by Google as helping support its mission of "making AI helpful for everyone."[12] Google offers official Gemma variants optimized for specific use cases, such as MedGemma for medical analysis.[13]

Since its release, Gemma models have had over 150 million downloads, with 70,000 variants available on Hugging Face.[14]

Gemma 3 was offered in 1, 4, 12, and 27 billion parameter sizes with support for over 140 languages. As multimodal models, they support both text and image input.[15] Google also offers Gemma 3n, smaller models optimized for execution on consumer devices like phones, laptops, and tablets.[16]

The latest generation of models is Gemma 4, released on April 2, 2026. It is available in four sizes: Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts (MoE), and 31B Dense. Gemma 4 supports multimodal input, including images and video across all models, with native audio input on the E2B and E4B models.[17][18] Gemma 4's 31B Dense variant reached third place on Arena's text leaderboard, and the 26B variant reached sixth place.[18]

Architecture

edit

Gemma 3 is based on a decoder-only transformer architecture with grouped-query attention (GQA) and the SigLIP vision encoder. Every model has a context length of 128K, with the exception of Gemma 31B, which has a context length of 32K.[19]

Quantized versions fine-tuned using quantization-aware training (QAT) are also available,[19] offering sizable memory usage improvements with some negative impact on accuracy and precision.[20]

Variants

edit

Google develops official variants of Gemma models designed for specific purposes, like medical analysis or programming. These include:

  • ShieldGemma 2 (4B): Based on the Gemma 3 family, ShieldGemma is designed to identify and filter violent, dangerous, and sexually explicit images.[21]
  • MedGemma (4B and 27B): Also based on Gemma 3, MedGemma is designed for medical applications like image analysis. However, Google also notes that MedGemma "isn't yet clinical grade."[22] Developers at Tap Health in Gurgaon, India, have used MedGemma to enhance AI-assisted diabetes management applications.[23]
  • DolphinGemma (roughly 400M): Developed in collaboration with researchers at Georgia Tech and the Wild Dolphin Project, DolphinGemma aims to better understand dolphin communication through audio analysis. However, no model or data have been publicly released.[24][25]
  • CodeGemma (2B and 7B): CodeGemma is a group of models designed for code completion as well as general coding use.[26] It supports multiple programming languages, including Python, Java, C++, and more.[27]
Technical specifications of Gemma models
Generation Release date Parameters Context length Multimodal License Notes
Gemma 1 2024-02-21 2B, 7B 8,192 No Source-available (Gemma Terms of Use)[3] 2B distilled from 7B. 2B uses multi-query attention while 7B uses multi-head attention.
CodeGemma 2B, 7B 8,192 Gemma 1 finetuned for code generation.
RecurrentGemma 2024-04-11 2B, 9B Unlimited (trained on 8,192) Griffin-based, instead of Transformer-based.[28]
Gemma 2 2024-06-27 2B, 9B, 27B 8,192 27B trained from web documents, code, science articles. Gemma 2 9B was distilled from 27B. Gemma 2 2B was distilled from a 7B model that remained unreleased. Uses Grouped-Query Attention.[29]
PaliGemma 2024-07-10 3B 8,192 Image A vision-language model that takes text and image inputs, and outputs text. It is made by connecting a SigLIP-So400m image encoder with Gemma v1.0 2B.[30][31]
PaliGemma 2 2024-12-04 3B, 10B, 28B 8,192 Made by mating SigLIP-So400m with Gemma v2.0 2B, 9B, and 27B. Capable of more vision-language tasks.[32][33]
Gemma 3 2025-03-12 1B, 4B, 12B, 27B 131,072 All models trained with distillation. Post-training focuses on math, coding, chat, instruction following, and multilingual (supports 140 languages). Capable of function calling. 1B is not capable of vision.[34]
Gemma 4 2026-04-02 31B, 26B A4B, 4B, ~2B 128K (edge)
256K (larger)
Yes (vision, audio) Apache 2.0 The 31B and 26B A4B models are not capable of audio.[35]

Note: open-weight models can have their context length rescaled at inference time. With Gemma 1, Gemma 2, PaliGemma, and PaliGemma 2, the cost is a linear increase of kv-cache size relative to context window size. With Gemma 3 there is an improved growth curve due to the separation of local and global attention. With RecurrentGemma the memory use is unchanged after 2,048 tokens.

See also

edit

References

edit
  1. Banks, Jeanine; Warkentin, Tris (February 21, 2024). "Gemma: Introducing new state-of-the-art open models". The Keyword. Retrieved August 16, 2025.
  2. 1 2 "Gemma 4: Byte for byte, the most capable open models". Google. April 2, 2026. Retrieved April 2, 2026.
  3. 1 2 "Gemma Terms of Use". Google AI for Developers. April 1, 2026. Retrieved April 12, 2026.
  4. Khan, Jeremy (February 21, 2024). "Google unveils new family of open-source AI models called Gemma to take on Meta and others—deciding open-source AI ain't so bad after all". Fast Company. Archived from the original on February 21, 2024. Retrieved February 21, 2024.
  5. Alba, Davey (February 21, 2024). "Google Delves Deeper Into Open Source with Launch of Gemma AI Model". Bloomberg News. Archived from the original on February 21, 2024. Retrieved February 21, 2024.
  6. Metz, Cade; Grant, Nico (February 21, 2024). "Google Is Giving Away Some of the A.I. That Powers Chatbots". The New York Times. ISSN 0362-4331. Archived from the original on February 21, 2024. Retrieved February 21, 2024.
  7. Nieva, Richard (February 21, 2024). "Google's Latest AI Language Models Are Open Weight, Not Open Source". Forbes. Retrieved April 3, 2026.
  8. "Gemma 2 is now available to researchers and developers". Google. June 27, 2024. Retrieved August 15, 2024.
  9. "Introducing Gemma 3: The most capable model you can run on a single GPU or TPU". The Keyword. March 12, 2025.
  10. "Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM". Hugging Face. March 12, 2025.
  11. Whitwam, Ryan (April 2, 2026). "Google announces Gemma 4 open AI models, switches to Apache 2.0 license". Ars Technica. Retrieved April 3, 2026.
  12. Banks, Jeanine; Warkentin, Tris (February 21, 2024). "Gemma: Introducing new state-of-the-art open models". The Keyword. Retrieved July 13, 2025.
  13. "Gemma - Google DeepMind". Google DeepMind. Retrieved July 13, 2025.
  14. Wiggers, Kyle (May 12, 2025). "Google's Gemma AI models surpass 150M downloads". TechCrunch. Retrieved July 13, 2025.
  15. Gosthipaty, Aritra; merve; Cuenca, Pedro; Srivastav, Vaibhav (March 12, 2025). "Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM". Hugging Face. Retrieved July 13, 2025.
  16. "Gemma 3n model overview". Google AI for Developers. Retrieved July 13, 2025.
  17. "Gemma 4: Byte for byte, the most capable open models". Google. April 2, 2026. Retrieved April 2, 2026.
  18. 1 2 Bonifacic, Igor (April 3, 2026). "Google releases Gemma 4, a family of open models built off of Gemini 3". Engadget. Retrieved April 5, 2026.
  19. 1 2 Gemma Team (2025). "Gemma 3 Technical Report". arXiv:2503.19786v1 [cs.CL].
  20. Clark, Bryan (May 15, 2025). "What is quantization aware training?". IBM. Retrieved July 14, 2025.
  21. ShieldGemma Team (2025). "ShieldGemma 2: Robust and Tractable Image Content Moderation". arXiv:2504.01081 [cs.CV].
  22. "MedGemma". Google Health AI Developer Foundations. Retrieved July 15, 2025.
  23. "MedGemma: Our most capable open models for health AI development". Google Research Blog. October 28, 2025. Retrieved November 4, 2025. Developers at Tap Health in Gurgaon, India, remarked on MedGemma's superior medical grounding and potential for improving AI-assisted diabetes management.
  24. "DolphinGemma: How Google AI is helping decode dolphin communication". Georgia Tech. Retrieved July 15, 2025.
  25. Herzing, Denise; Starner, Thad (April 14, 2025). "DolphinGemma: How Google AI is helping decode dolphin communication". The Keyword. Retrieved July 15, 2025.
  26. Irwin, Kate (April 10, 2024). "Google Launches Coding AIs That Could Rival Microsoft's GitHub Copilot". PCMag. Retrieved July 15, 2025.
  27. "CodeGemma". Google AI for Developers. Retrieved July 15, 2025.
  28. Botev, Aleksandar; et al. (2024). "RecurrentGemma: Moving Past Transformers for Efficient Open Language Models". arXiv:2404.07839v2 [cs.LG].
  29. Gemma Team; Riviere, Morgane; Pathak, Shreya; Sessa, Pier Giuseppe; Hardin, Cassidy; Bhupatiraju, Surya; Hussenot, Léonard; Mesnard, Thomas; Shahriari, Bobak (August 2, 2024), Gemma 2: Improving Open Language Models at a Practical Size, arXiv:2408.00118
  30. "PaLI: Scaling Language-Image Learning in 100+ Languages". research.google. Retrieved August 15, 2024.
  31. Beyer, Lucas; et al. (July 10, 2024). "PaliGemma: A versatile 3B VLM for transfer". arXiv:2407.07726v2 [cs.CV].
  32. "Introducing PaliGemma 2 mix: A vision-language model for multiple tasks- Google Developers Blog". developers.googleblog.com. Retrieved February 22, 2025.
  33. Steiner, Andreas; André Susano Pinto; Tschannen, Michael; Keysers, Daniel; Wang, Xiao; Bitton, Yonatan; Gritsenko, Alexey; Minderer, Matthias; Sherbondy, Anthony; Long, Shangbang; Qin, Siyang; Ingle, Reeve; Bugliarello, Emanuele; Kazemzadeh, Sahar; Mesnard, Thomas; Alabdulmohsin, Ibrahim; Beyer, Lucas; Zhai, Xiaohua (2024). "PaliGemma 2: A Family of Versatile VLMs for Transfer". arXiv:2412.03555v1 [cs.CV].
  34. Team, Gemma; et al. (2025). "Gemma 3 Technical Report". arXiv:2503.19786v1 [cs.CL].
  35. "google/gemma-4-31B-it · Hugging Face". huggingface.co. April 2, 2026. Retrieved April 3, 2026.
edit