
New OLTP: Postgres With Separate Compute and Storage

Is Databricks onto something by focusing on streaming data and treating that data the way developers have treated code?
Jun 21st, 2025 9:00am by
Featured image by Bernd Dittrich from Unsplash.

Are online transaction processing (OLTP) databases stuck in the past?

Well, OLTP databases are tightly coupled, said Databricks co-founder Reynold Xin in his keynote earlier this month at the Databricks Data + AI Summit in San Francisco. Such databases are monolithic, combining compute and storage in big machines, which leads to various problems, including over-provisioning, scaling challenges, performance issues and a range of system complexities.

In the new Lakebase product from Databricks, compute and storage are separated. Lakebase is loosely coupled, which opens up opportunities to rethink transactional databases and the use of agentic AI to accomplish what traditional OLTP databases can’t.

“If you look at the OLTP databases you’re running today, whether it’s commercial, proprietary systems like Oracle or open source databases like MySQL and Postgres, they look more or less the same as they did in the ’90s,” Xin said at the Databricks event.

Viewed as heavyweight infrastructure that requires manual intervention and maintenance, OLTP databases are clunky and difficult to scale.

So what’s a developer to do?

Is Databricks onto something by focusing on streaming data and treating that data the way developers have treated code, thereby making it better suited to AI and agent-based architectures?

I discussed this topic last week at the Databricks conference with Sanjeev Mohan, an independent analyst. We talked about Lakebase and how its approach enables scaling with Postgres. The new service is based in part on technology developed by Neon, a company Databricks has acquired.

Thomas Gauvin, a Cloudflare developer, detailed on his personal blog how Neon uses Postgres to implement “a custom storage system (written in Rust) that intercepts calls to update pages in the block storage and stores these updated pages on a cloud object store instead of the computer’s disk. This decoupling enables independent scaling of compute and storage.”
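
To make Gauvin’s description concrete, here is a minimal sketch, not Neon’s code (its storage layer is written in Rust), of the idea: page updates are intercepted and kept as immutable, versioned objects in something that stands in for a cloud object store, rather than overwriting blocks on the compute node’s local disk. The class, key scheme and data below are assumptions made purely for illustration.

```python
# Toy illustration (not Neon's implementation) of decoupled storage: every
# page update becomes a new immutable object, keyed by page id and a
# monotonically increasing LSN, instead of an in-place write to local disk.

class PageStore:
    def __init__(self):
        self.objects = {}   # (page_id, lsn) -> bytes; stands in for an object store
        self.lsn = 0

    def write_page(self, page_id: int, data: bytes) -> int:
        """Intercept a page update and store a new immutable version."""
        self.lsn += 1
        self.objects[(page_id, self.lsn)] = data
        return self.lsn

    def read_page(self, page_id: int, as_of_lsn: int) -> bytes:
        """Return the newest version of the page at or before as_of_lsn."""
        versions = [l for (p, l) in self.objects if p == page_id and l <= as_of_lsn]
        return self.objects[(page_id, max(versions))]

store = PageStore()
store.write_page(7, b"v1 of page 7")
lsn = store.write_page(7, b"v2 of page 7")
print(store.read_page(7, as_of_lsn=lsn))  # b"v2 of page 7"
print(store.read_page(7, as_of_lsn=1))    # b"v1 of page 7" -- history is retained
```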

The New Stack’s Susan Hall interviewed Nikita Shamgunov, cofounder and CEO of Neon, in early 2024. He said that branching, though commonplace in git repositories, had never been a good fit for databases.

Branching is available with Neon, though it has come about through a lot of hard work, Shamgunov said, and has evolved from an infrastructure feature to a developer workflow tool.

“It takes a next-generation architecture, storage architecture, to enable branching, because the key feature of branching is copy-on-write,” he said. “That’s what git has. For example, when you create a branch, you’re basically moving a few pointers around. And that gives you an isolated, full copy of your data in a separate branch.”
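
Shamgunov’s “moving a few pointers around” can be sketched in a few lines of Python. This is a hypothetical model, not Neon’s API: a branch is just a map from page ids to page versions, so creating a branch copies that map (the pointers), never the pages, and both branches keep sharing the same immutable page versions until one of them writes.

```python
# Hedged sketch of copy-on-write branching: creating a branch copies pointers
# (page id -> version), never the pages themselves.

pages = {("users", 1): b"alice,bob"}       # shared, immutable page versions
branches = {"main": {"users": 1}}          # branch -> {page id -> version}

def create_branch(parent: str, name: str) -> None:
    # The "instant" operation: copy the pointer map, not the data.
    branches[name] = dict(branches[parent])

def write(branch: str, page: str, data: bytes) -> None:
    # Copy-on-write: a change creates a new version visible only to this branch.
    version = branches[branch][page] + 1
    pages[(page, version)] = data
    branches[branch][page] = version

def read(branch: str, page: str) -> bytes:
    return pages[(page, branches[branch][page])]

create_branch("main", "feature")
write("feature", "users", b"alice,bob,carol")
print(read("feature", "users"))  # b"alice,bob,carol"
print(read("main", "users"))     # b"alice,bob" -- main is isolated and unchanged
```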

Databricks sees how, with Neon, it can provide a technically superior way to achieve real-time data streaming, enabling the capability to also transform data in real time. Databricks can leverage its data intelligence platform, based on the Lakehouse architecture, to provide an end-to-end experience.

“Why are they getting into this business?” Mohan asked. “Owning the analytical data is not enough. The keys to the kingdom reside on the operational or the transactional side. The world’s most important data is in Salesforce, it’s in SAP, it’s in a bunch of other ERPs.”

Purpose-Built for AI

The traditional methods for integrating databases are complex and not suited to AI, Xin said. The challenge lies in integrating analytics and AI with transactional workloads.

Consider what developers would do when adding a feature to a code base, Xin said in his keynote address at the Data + AI Summit. They’d create a new branch of the codebase and make changes to the new branch. They’d use that branch to check bugs, perform testing and so on.

Xin said creating a new branch is an instant operation. What’s the equivalent for databases? Your only option is to clone your production database, which might take days. How do you set up secure networking? How do you create ETL pipelines to move data from one to the other?

Lakebase takes the concept of OLTP databases and turns it on its head.

“First and foremost, it’s based on open source Postgres,” Xin said. “And second, it’s built on a novel decoupled storage-from-compute architecture that actually enables the modern-day developer workflow.”

The Databricks explanation: Storage and compute run on separate clusters, so systems can scale to more concurrent users and larger data sizes. The storage formats are open; Parquet, for example, gives tools and engines, including machine learning (ML) and Python/R libraries, direct access to the data.
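
That direct access is the point of open formats. As a quick, self-contained example (the file name and contents are placeholders), ordinary Python data tooling can write and read the same Parquet file that any other engine could scan:

```python
# Open formats in practice: a Parquet file can be read directly by ordinary
# Python data/ML tooling, with no database API in between. The file name and
# contents here are placeholders.
import pandas as pd

pd.DataFrame({"user_id": [1, 2, 3], "spend": [9.5, 40.0, 12.25]}).to_parquet("users.parquet")

df = pd.read_parquet("users.parquet")  # any Parquet-aware library could do the same
print(df.describe())
```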

Elastic scaling allows thousands of workloads to go live on low-cost Postgres instances.

“The separation of storage from compute architecture also has a copy-on-write capability built in, so that we can instantly branch off a database,” Xin said. “It takes less than a second to create a whole clone of the database, and that includes most of the data and the schema of the database.

“And because of the copy-on-write capability, you don’t actually have to pay for extra storage unless you start making changes, and only the changes themselves will incur an extra charge, because under the hood, they all share the same storage.”
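
Xin’s billing point falls out of the same copy-on-write model: the clone adds nothing to shared storage, so there is nothing extra to charge for until a branch actually writes. A minimal, purely illustrative accounting sketch:

```python
# Hypothetical accounting sketch: storage cost tracks the number of unique page
# versions in the shared store, not the number of branches pointing at them.

stored = {("orders", 1): b"...", ("items", 1): b"..."}  # two pages on main
branch_heads = {"main": {"orders": 1, "items": 1}}

branch_heads["dev"] = dict(branch_heads["main"])        # instant clone
print(len(stored))   # 2 -> the branch itself adds no storage

branch_heads["dev"]["orders"] = 2                       # dev changes one page
stored[("orders", 2)] = b"changed"
print(len(stored))   # 3 -> only the changed page incurs extra storage
```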

Streaming Is Changing Enterprise Data Needs

Streaming is now a first-class citizen in the enterprise, Mohan told me. The separation of compute and storage makes a difference. We are approaching an era when applications will scale infinitely, both in terms of the number of instances and their scale-out capabilities. And that leads us to new questions about how we start to think about evaluation, observability and semantics.

Accuracy matters. Language is semantic by nature, meaning there is a need for more capabilities to evaluate the veracity of the AI’s output.

ADP may have the world’s best payroll data, Mohan said, but then that data has to be processed through ETL into an analytics solution like Databricks. Then comes the analytics and the data science work. The customer has to perform a significant amount of data engineering work and preparation.

Databricks and others, like Snowflake, don’t want to be on the receiving end of the data. They want to serve customers who, for instance, need reports and require them to be delivered quickly. It’s hard to do that with the systems they now have in place.

Take, for instance, Securities and Exchange Commission reporting, Mohan said. The customer has a dashboard or some reports. These customers need to know the business lineage of that data.

“I wanna know, where did this data originate?” Mohan said. “How was it transformed? How was it cataloged? How was it integrated with other pieces of data before I ran my dashboard on it?

“So now, if Databricks owns the entire life cycle of data from creation all the way to consumption, then they own the data. It never leaves Databricks’ ecosystem.”

A “Disaggregation of Storage and Compute”

So, how does that work?

The New Stack’s Frederic Lardinois wrote that Lakebase combines the familiarity and extensibility of Postgres, the scalability of a modern serverless architecture and a modern developer experience with the unified data experience of Databricks’ Lakehouse and the operational maturity of the company’s Data Intelligence Platform.

Mohan suggested that Databricks’ purchase of Neon offers Databricks an advantage. “What they’re saying is that using their Neon acquisition, they can now have whatever frontend application sitting on top,” he said. “In the future, that may be an agent; that’s the bet, but agents are new. So that agent is going to write the data into an open standard file format like Parquet with Iceberg or Delta or Hudi on top.”

Parquet, Iceberg, Delta Lake and Hudi are all Apache projects.

“And then you’ll have a compute engine,” Mohan said. “So it’s a complete disaggregation of storage and compute.”
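
A rough sketch of that flow, with invented paths and columns: an application (eventually, in Mohan’s scenario, an agent) lands rows as open Parquet files, and a separate compute engine scans them later. Table formats such as Iceberg, Delta Lake or Hudi would add transactional metadata on top of files like these.

```python
# Sketch of the disaggregated flow: the "write side" lands events as open
# Parquet files in a shared location; the "read side" is any engine that can
# scan the open format. Paths and columns are invented for illustration.
import os
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

os.makedirs("landing", exist_ok=True)

# Write side: the frontend (or agent) appends a batch of events as a Parquet file.
batch = pa.table({"order_id": [1001, 1002], "amount": [49.90, 120.00]})
pq.write_table(batch, "landing/events-0001.parquet")

# Read side: a separate compute engine (here, pyarrow itself) scans the files.
dataset = ds.dataset("landing", format="parquet")
print(dataset.to_table().to_pandas())
```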

Xin said the separation of storage and compute is critical in the age of agentic coding and AI, when an enterprise will have thousands of AI agents, even millions.

“The AI agents are acting as their own individual engineers,” Xin said. “They’re doing experiments on your codebase, maybe adding new features. You might even have multiple AI agents adding new features, adding the same feature, and you have judges to determine which feature is the best to implement. Every AI agent can actually add their own code branch, but also their own databases, at virtually no cost for experimentation.”

The underlying storage layer also makes it super easy to synchronize data at very high throughput from one object store to another object store, Xin said — so from one data lake to another data lake, from Lakehouse to Lakebase.

In conclusion, I asked Mohan: “Where do you see us now?”

He replied, “One thing that’s starting to stick out a little bit is the need for evaluation.”

And this comes down to the semantics of AI — the language nuances — that will require a deeper evaluation.

“The biggest problem customers have is reliability,” Mohan said. “Can you trust these models? Are they going to be accurate? Semantics become really important.”

TNS owner Insight Partners is an investor in: Databricks.