Baseten

Software Development

San Francisco, CA 27,736 followers

Own your inference.

About us

Inference is everything. Baseten is an AI infrastructure platform that gives you the tooling, expertise, and hardware needed to bring great AI products to market, fast. Our proprietary Inference Stack combines cutting-edge performance research with highly performant, reliable infrastructure to give you out-of-the-box global availability with 99.99% uptime.

Website
https://www.baseten.co/
Industry
Software Development
Company size
201-500 employees
Headquarters
San Francisco, CA
Type
Privately Held
Specialties
developer tools, software engineering, artificial intelligence, and machine learning

Updates

  • Baseten reposted this

    Today we're releasing TIM-Qwen3.6-27B on a new OpenAI- and Anthropic-compatible API. Last month I wrote that open models had finally caught up to frontier models on the work most people *actually* need AI to do. The bottleneck stopped being the model and started being the environment around it. This release is our next step toward unlocking open models with our co-designed runtime and post-training process, now delivered in an API format that developers already love. The newest iteration of our inference runtime, TIMRUN, compresses context on the fly without losing reasoning quality. On long-context agent workloads, that means 10x effective context window length, 3x concurrent throughput, and 49% lower latency compared to models using SGLang on the same GPU. If you have a project that uses the OpenAI or Claude SDKs, you can point it at our endpoint and try TIM-Qwen3.6-27B in a few minutes. The full post on this release is linked below. (We're also excited to share this system with 250+ developers at our hackathon next week with Baseten, Cloudflare, and Wayfair as part of Boston TECH WEEK by a16z.)

  • Baseten reposted this

    Having witnessed how AI transformed the software industry over the last year, I'm convinced that every industry will be transformed by these tools. Science is going through this transformation right now! I'm excited to share my discussion with Mihir Trivedi about how Baseten is accelerating inference and AI adoption in Life Sciences through our partnership with Benchling. It's been awesome building with Mihir and the team to bring this to life! Read more about it on our blog. https://lnkd.in/gtYF4hJY

  • Baseten reposted this

    Benchling

    63,757 followers

    Today we’re announcing Benchling Inference! Together with Baseten, we’re offering scalable, cost-effective inference built for scientific AI. Why? Scientific workloads don’t look like typical AI workloads. Demand comes in bursts, with teams needing to run 100,000 predictions for a few hours before going quiet again for days. Most infrastructure wasn’t built for that kind of scale or flexibility.

    With Benchling Inference, powered by Baseten, R&D teams can:
    ✔️ Run scientific models without managing infrastructure
    ✔️ Scale workloads up or down in seconds
    ✔️ Access cost-effective compute, enabled by aggregating demand

    We’ve built in everything we’ve learned from running Baseten for Benchling’s Model Hub (the configurations, defaults, and integrations that make inference work out of the box for biotech), so you don’t have to figure it out yourself.

    👉 Learn more about our end-to-end solution for scalable inference: https://lnkd.in/duuR9Fxx

  • Baseten

    Biotech R&D is generating more scientific AI models than ever, from protein structure prediction to molecular docking to sequence analysis. But the infrastructure to run them hasn't kept up. Today we're announcing Benchling Inference, powered by Baseten. Together with Benchling we're delivering on-demand GPU capacity built for the bursty, high-stakes demands of scientific workloads.

    With Benchling Inference, scientists can:
    → Deploy models in seconds, not weeks
    → Keep proprietary models inside their VPC if needed
    → Benefit from economics that work even at small and mid-size biotech scale

    Benchling and Baseten decided to team up because we believe that research teams shouldn't have to manage HPC queues, negotiate cloud contracts, or become GPU experts to run frontier models on their own data. Six years of inference expertise are now available where science happens. Read more here: https://lnkd.in/gj2hpC78

  • Baseten reposted this

    Fast image generation has become critical for production AI products, where latency directly affects user experience, throughput, and cost. Proud to share we at Baseten reached a new milestone in optimizing image generation serving for two frontier models, FLUX.2-dev and Qwen-Image, on NVIDIA Blackwell and Hopper GPUs! 🚀

    Key results:
    1️⃣ FLUX.2-dev:
      * 2.3× faster on B200
      * 1.9× faster on H100
    2️⃣ Qwen-Image:
      * 1.57× faster on B200 with FP4
      * 1.18× faster on B200 with FP8
      * 1.08× faster on H100 with FP8

    These gains come from deep-stack optimizations such as hardware-aware quantization, memory optimizations, optimized attention kernels, specialized element-wise kernels, and runtime-level serving improvements. The same approach can also support workloads for other image generation models, such as Qwen-Image-Layered and Flux.2-klein. Read the full post here: https://lnkd.in/eQCnwmsi

  • Baseten

    We serve Qwen3-TTS on vLLM-Omni at $3 per 1M characters. That's 90% lower in cost than comparable closed-source TTS APIs. Our engineers optimized a single-replica serving stack to get there. Details on the optimized stack and cost per concurrent stream here.

    Over the past months we worked deeply with Qwen3-TTS and vLLM-Omni to unlock high-quality voice at 1/10th the cost of closed-source providers. What it took: eliminating voice cloning bottlenecks, balancing time-to-first-audio against high throughput, and integrating features standard in closed-source APIs like word-level timestamps, to name a few. Voice is quickly becoming one of the predominant ways we interact with frontier models, and the economics decide what's actually buildable, from voice agents to learning apps. Here's how we got there 👇
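
The pricing claim in this post reduces to simple arithmetic. Below is a minimal sketch, assuming the quoted rate of $3 per 1M characters and reading "90% lower" as a savings fraction of 0.90. The function names are hypothetical illustrations and not part of any Baseten API.

```python
# Rate quoted in the post: $3 per 1M characters.
PRICE_PER_MILLION_CHARS_USD = 3.00


def tts_cost_usd(num_chars: int,
                 price_per_million: float = PRICE_PER_MILLION_CHARS_USD) -> float:
    """Cost in USD to synthesize `num_chars` characters at a per-1M-character rate."""
    return num_chars / 1_000_000 * price_per_million


def implied_baseline_price(price: float, savings_fraction: float) -> float:
    """Baseline price implied by `price` being `savings_fraction` cheaper than it.

    Assumption: "90% lower" means price = baseline * (1 - 0.90).
    """
    return price / (1.0 - savings_fraction)


if __name__ == "__main__":
    # Cost of a 250,000-character script at the quoted rate.
    print(f"250k characters: ${tts_cost_usd(250_000):.2f}")
    # Baseline per-1M-character price implied by the 90% savings claim.
    print(f"Implied baseline: ${implied_baseline_price(3.00, 0.90):.2f} per 1M characters")
```

Under these assumptions the implied comparison price is about $30 per 1M characters, which matches the "1/10th the cost" framing in the same post.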

  • Baseten reposted this

    On highly specialized tasks, fine-tuned open-source LLMs present a major opportunity: lower latency, dramatically reduced costs, and in many cases performance that rivals or exceeds general-purpose closed-source models. At EliseAI, we handle millions of housing and healthcare conversations every month. The workflows are complex, operationally critical, and deeply domain-specific. To build the most accurate AI possible, we partnered with the research team at Baseten to train custom models using state-of-the-art techniques including iterative SFT and OPSD. The result: we achieved closed-source model–level accuracy with a 4B parameter model while reducing end-user latency by 80%. Learn more here: https://lnkd.in/eutMg3xr


Funding

Baseten 6 total rounds

Last Round

Series D

US$ 150.0M
