Baseten

Software Development

San Francisco, CA 27,736 followers

Own your inference.

See jobs Follow

About us

Inference is everything. Baseten is an AI infrastructure platform giving you the tooling, expertise, and hardware needed to bring great AI products to market - fast. Our proprietary Inference Stack utilizes the cutting-edge of performance research combined with highly performant and reliable infrastructure to give you out-of-the-box global availability with 99.99% of uptime.

Website: https://www.baseten.co/
External link for Baseten
Industry: Software Development
Company size: 201-500 employees
Headquarters: San Francisco, CA
Type: Privately Held
Specialties: developer tools, software engineering, artificial intelligence, and machine learning

Products

Baseten

Machine Learning Software

At Baseten we provide all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently. Get started in minutes, and avoid getting tangled in complex deployment processes. You can deploy best-in-class open-source models and take advantage of optimized serving for your own models. We also utilize horizontally scalable services that take you from prototype to production, with light-speed inference on infra that autoscales with your traffic. Best in class doesn't mean breaking the bank. Run your models on the best infrastructure without running up costs by taking advantage of our scaled-to-zero feature.

Locations

Primary

San Francisco, CA, US

Get directions
New York, NY, US

Get directions

Employees at Baseten

See all employees

Updates

Baseten reposted this
Jack O'Brien
1d
Report this post
Today we're releasing TIM-Qwen3.6-27B on a new OpenAI and Anthropic compatible API. Last month I wrote that open models had finally caught up to frontier models on the work most people *actually* need AI to do. The bottleneck stopped being the model and started being the environment around it. This release is our next step to unlock open models with our co-designed runtime and post-training process, now delivered in an API format that developers already love. The newest iteration of our inference runtime, TIMRUN, compresses context on the fly without losing reasoning quality. On long-context agent workloads, that means 10x effective context window length, 3x concurrent throughput, and 49% lower latency compared to models using SGLang on the same GPU. If you have a project that uses the OpenAI or Claude SDKs, you can point it at our endpoint and try TIM-Qwen3.6-27B in a few minutes. Full post on this release linked below. (We're also excited to share this system with 250+ developers at our hackathon next week with Baseten, Cloudflare, and Wayfair as part of Boston TECH WEEK by a16z)
15 Comments

Like Comment Share
Baseten reposted this
Bola Malek
1d
Report this post
As we've witnessed how AI transformed the software industry over the last year, I'm convinced that every industry will be transformed by these tools. Science is going through this transformation right now! I'm excited to share my discussion with Mihir Trivedi about how Baseten is accelerating inference and AI adoption in Life Sciences through our partnership with Benchling. It's been awesome building with Mihir and the team to bring this to life! Read more about it on our blog. https://lnkd.in/gtYF4hJY

Like Comment Share
Baseten reposted this
Benchling

63,757 followers
2d
Report this post
Today we’re announcing Benchling Inference! Together with Baseten, we’re offering scalable, cost-effective inference built for scientific AI. Why? Scientific workloads don’t look like typical AI workloads. Demands come in bursts, with teams needing to run 100,000 predictions for a few hours before going quiet again for days. Most infrastructure wasn’t built for that kind of scale or flexibility. With Benchling Inference, powered by Baseten, R&D teams can: ✔️ Run scientific models without managing infrastructure ✔️ Scale workloads up or down in seconds ✔️ Access cost-effective compute, enabled by aggregating demand We’ve taken everything we’ve learned from running Baseten for Benchling’s Model Hub — the configurations, defaults, and integrations to make inference work out-of-the-box for biotech — so you don’t have to. 👉 Learn more about our end-to-end solution for scalable inference: https://lnkd.in/duuR9Fxx
1 Comment

Like Comment Share
Baseten

27,736 followers
2d
Report this post
Biotech R&D is generating more scientific AI models than ever, from protein structure prediction to molecular docking to sequence analysis. But the infrastructure to run them hasn't kept up. Today we're announcing Benchling Inference, powered by Baseten. Together with Benchling we're delivering on-demand GPU capacity built for the bursty, high-stakes demands of scientific workloads. With Benchling Inference, scientists can: → Deploy models in seconds, not weeks → Keep proprietary models inside their VPC if needed → Benefit from economics that work even at small and mid-size biotech scale Benchling and Baseten decided to team up because we believe that research teams shouldn't have to manage HPC queues, negotiate cloud contracts, or become GPU experts to run frontier models on their own data. Six years of inference expertise are now available where science happens. Read more here: https://lnkd.in/gj2hpC78
7 Comments

Like Comment Share
Baseten

27,736 followers
3d
Report this post
“Intensity plus joy — an aha moment for me at Baseten is that those two things are not on opposite sides of a spectrum. Those can be inextricably linked, and that is the best.” Our President Dannie Herzberg sat down with Conviction to chat GTM, culture, and hiring in an evolving AI market.

1 Comment

Like Comment Share
Baseten reposted this
Faraz Shahsavan
4d Edited
Report this post
Fast image generation has become critical for production AI products, where latency directly affects user experience, throughput, and cost. Proud to share we at Baseten reached a new milestone in optimizing image generation serving for two frontier models, FLUX.2-dev and Qwen-Image, on NVIDIA Blackwell and Hopper GPUs! 🚀 Key results: 1️⃣ FLUX.2-dev: * 2.3× faster on B200 * 1.9× faster on H100 2️⃣ Qwen-Image: * 1.57× faster on B200 with FP4 * 1.18× faster on B200 with FP8 * 1.08× faster on H100 with FP8 These gains come from deep-stack optimizations such as hardware-aware quantization, memory optimizations, optimized attention kernels, specialized element-wise kernels, and runtime-level serving improvements. The same approach can also support workloads for other image generation models, such as Qwen-Image-Layered and Flux.2-klein. Read the full post here: https://lnkd.in/eQCnwmsi
3 Comments

Like Comment Share
Baseten

27,736 followers
1w
Report this post
Last week, we launched Baseten Frontier Gateway. This week, Marylise sits down with Bola to talk about why.

Bola Malek
1w

Excited to share more about why we built the Baseten Frontier Gateway through this conversation with Marylise Tauzia! Feel free to listen to the whole thing on Youtube: https://lnkd.in/giR2XM3x

1 Comment

Like Comment Share
Baseten

27,736 followers
1w
Report this post
We serve Qwen3-TTS on vLLM-Omni at $3 per 1M characters. That's 90% lower in cost than comparable closed-source TTS APIs. Our engineers optimized a single-replica serving stack to get there. Details on the optimized stack and cost per concurrent stream here.

Ian Carrasco
1w

Over the past months we worked deeply with Qwen3-TTS and vLLM-Omni to unlock high-quality voice at 1/10th the cost of closed-source providers. What it took: eliminating voice cloning bottlenecks, balancing time-to-first-audio against high throughput, and integrating features standard in closed-source APIs like word-level timestamps, to name a few. Voice is quickly becoming one of the predominant ways we interact with frontier models and the economics decide what's actually buildable from voice agents to learning apps. Here's how we got there 👇

Cost-Efficient, High-Performance TTS with Qwen3-TTS Ian Carrasco on LinkedIn

Like Comment Share
Baseten reposted this
Mario Claudio Martone
1w
Report this post
On highly specialized tasks, fine-tuned open-source LLMs present a major opportunity: lower latency, dramatically reduced costs, and in many cases performance that rivals or exceeds general-purpose closed-source models. At EliseAI, we handle millions of housing and healthcare conversations every month. The workflows are complex, operationally critical, and deeply domain-specific. To build the most accurate AI possible, we partnered with the research team at Baseten to train custom models using state-of-the-art techniques including iterative SFT and OPSD. The result: we achieved closed-source model–level accuracy with a 4B parameter model while reducing end-user latency by 80%. Learn more here: https://lnkd.in/eutMg3xr
16 Comments

Like Comment Share

Browse jobs

Funding

Baseten 6 total rounds

Last Round

Series D Oct 5, 2025

US$ 150.0M

Investors

Bond + 8 Other investors

See more info on crunchbase

Baseten

Software Development

San Francisco, CA 27,736 followers

Own your inference.

About us

Products

Baseten

Machine Learning Software

Locations

Employees at Baseten

Jason Dupree

Marylise Tauzia

Dharmesh Thakker

Tarun Diwan

Updates

Join now to see what you are missing

Similar pages

Decagon

Fireworks AI

ElevenLabs

Harvey

Together AI

Arize AI

Sierra

Metronome

Parsed

Anthropic

Browse jobs

Engineer jobs

Machine Learning Engineer jobs

Scientist jobs

Software Engineer jobs

Developer jobs

Marketing Manager jobs

Manager jobs

Senior Software Engineer jobs

Intern jobs

Associate jobs

Analyst jobs

Human Resources Specialist jobs

Executive jobs

Full Stack Engineer jobs

Operational Specialist jobs

Junior Software Engineer jobs

Designer jobs

Human Resources Generalist jobs

Human Resources Manager jobs

Account Executive jobs

Funding