Categories
Language Models
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning - 22 January 2025
- DeepSeek-V3 Technical Report - 27 December 2024
- ModernBERT - Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference - 18 December 2024
- Tulu 3: Pushing Frontiers in Open Language Model Post-Training - 22 November 2024
- Gemma 2: Improving Open Language Models at a Practical Size - 31 July 2024
- The Llama 3 Herd of Models - 31 July 2024
- Apple Intelligence Foundation Language Models - 29 July 2024
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model - 07 May 2024
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models - 05 February 2024
- DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence - 25 January 2024
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models - 11 January 2024
- Mixtral of Experts - 08 January 2024
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism - 05 January 2024
- Mistral 7B - 10 October 2023
- Llama 2: Open Foundation and Fine-Tuned Chat Models - 18 July 2023
- LLaMA: Open and Efficient Foundation Language Models - 27 February 2023
Multimodal Learning
- Gemma 3 Technical Report - 25 March 2025
- Pixtral 12B - 09 October 2024
- Gemini: A Family of Highly Capable Multimodal Models - 19 December 2023
Retrieval Augmented Generation
- From Local to Global: A Graph RAG Approach to Query-Focused Summarization - 24 April 2024
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval - 31 January 2024