I build AI systems designed to operate under real-world constraints. My work centers on real-time inference, embedding and retrieval pipelines, and agentic systems built close to the execution layer, where latency, memory, and reliability matter.
I work end to end, from low-level model execution and interfaces to orchestration, streaming, and system behavior in production environments. Most projects here are systems in motion rather than polished demos.
I am particularly interested in decision-critical and automation-heavy domains, where correctness and failure modes are first-class concerns.
My engineering work spans AI runtimes, real-time inference, distributed systems, and financial systems infrastructure. I keep public field notes, currently on inference economics. New York.



