Forem

# llm

Posts

I built a version manager for llama.cpp using nothing but vibe coding.
9 comments, 3 min read

Agent Series (3): Plan-and-Solve — Think First, Then Act
10 min read

Reasoning happens before the response
5 min read

When your AI CEO Lies about the Numbers
5 min read

From Tokens to Attention: My First Real Mental Model of LLMs
5 min read

Qwen 3.6 27B and 35B MTP vs Standard on 16GB GPU
8 min read

One Open Source Project per Day #74: ai-engineering-from-scratch - Build AI Full-stack Skills from Ground Up
2 min read

What I learned building memory for Claude Code — measured against the popular alternative
8 min read

GGUF & Modelfile: The Power User's Guide to Local LLMs
5 min read

Hardware Guide: What Do You Actually Need to Run Local LLMs?
7 min read

NVIDIA's Nemotron Diffusion: One Model, Three Generation Modes, 6× Faster
3 min read

LangChain JsonOutputParser: Fix Malformed JSON from LLMs
2 min read

Why Claude Code Sessions Diverge: A Mechanism Catalog
3 min read

Building a cost-efficient LLM caching layer in Python
5 min read

Gemma4 Apex GGUF, Ollama Context Optimization, & Llama3 Benchmarks
3 min read