DEV Community

Peter Kim Frank


ELI5: Data Lake vs. Data Warehouse vs. Data Pond etc.

Cover image via Unsplash


These terms can be jargon-heavy and are often used interchangeably. How would you explain the nuances between a Data Lake, a Data Warehouse, a Data Pond, and any other common terms in this area to a 5-year-old?

Top comments (3)

Gregory Booth

Came across this today and saw nobody had replied with an answer. I'll do my best.

Similarities

All 3 are names for collections of data, but they vary in their capabilities and/or implementations.
All 3 can intake data from multiple sources.

Data Warehouse

A data warehouse is used for transformed and structured data. Data is structured according to a specific schema. Any new data added is transformed and structured to the schema when written (Schema-on-Write). Due to the structured nature of the data, queries are quick and returned data is reliable.
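To make Schema-on-Write a bit more concrete, here is a toy sketch in Python. It is not how any real warehouse product works internally; the `OrderRow` schema, the `Warehouse` class, and the raw field names (`id`, `customer`, `amount`, `date`) are all hypothetical. The point is only that each record is transformed to the fixed schema (or rejected) at the moment it is written, so reads can assume a uniform shape.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical warehouse table schema: every stored row has exactly these typed fields.
@dataclass(frozen=True)
class OrderRow:
    order_id: int
    customer: str
    amount_usd: float
    order_date: date


class Warehouse:
    """Toy schema-on-write store: data is transformed and validated before it is saved."""

    def __init__(self) -> None:
        self.rows: list[OrderRow] = []

    def write(self, raw: dict) -> OrderRow:
        # Transform at write time: rename fields, coerce types, trim whitespace.
        # A KeyError or ValueError here means the record does not fit the schema,
        # and because it is raised before append(), the record is never stored.
        row = OrderRow(
            order_id=int(raw["id"]),
            customer=str(raw["customer"]).strip(),
            amount_usd=float(raw["amount"]),
            order_date=date.fromisoformat(raw["date"]),
        )
        self.rows.append(row)
        return row

    def total_for(self, customer: str) -> float:
        # Reads stay simple because every stored row already has the same shape.
        return sum(r.amount_usd for r in self.rows if r.customer == customer)
```

For example, writing `{"id": "1", "customer": " Acme ", "amount": "19.99", "date": "2023-05-01"}` stores a clean `OrderRow`, while a record whose `id` is not a number raises an error and is never saved.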

Data Lake

A data lake is used for raw data, which may be structured, semi-structured, unstructured, or binary items like images or video. The data is stored as-is and is transformed and structured to a given schema only when it is accessed (Schema-on-Read). This allows more flexibility, because the data is not locked into one specific schema.
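Here is the Schema-on-Read counterpart as a toy Python sketch. Again, the `Lake` class and the example schemas are hypothetical and only illustrate the idea: anything can be stored without checks, and each reader supplies its own schema when it pulls data out, so the same raw objects can serve different questions.

```python
import json
from typing import Any, Callable, Iterator


class Lake:
    """Toy schema-on-read store: objects are kept exactly as they arrive."""

    def __init__(self) -> None:
        self.objects: list[tuple[str, bytes]] = []  # (content_type, raw payload)

    def put(self, content_type: str, payload: bytes) -> None:
        # No validation or transformation at write time; anything is accepted.
        self.objects.append((content_type, payload))

    def read(
        self, schema: Callable[[dict[str, Any]], dict[str, Any]]
    ) -> Iterator[dict[str, Any]]:
        # The caller's schema is applied only when the data is accessed.
        for content_type, payload in self.objects:
            if content_type != "application/json":
                continue  # e.g. images or video stay in the lake but this reader ignores them
            record = json.loads(payload)
            try:
                yield schema(record)
            except (KeyError, ValueError, TypeError):
                continue  # records that don't fit this schema are skipped, not deleted
```

With one JSON sales record, one JSON clickstream record, and a PNG image in the lake, a "sales" schema that needs `customer` and `amount` returns only the sales record, while a looser schema that needs only `customer` returns both JSON records. The image is still stored for whoever needs it later.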

Marts and Ponds

Both models also have a term for a smaller, more focused repository. For example, a Data Warehouse typically stores data for an entire organization, while a Data Mart stores data only for a given function or department (finance, HR, etc.). In the lake paradigm, the smaller, more refined dataset for a specific department or function is called a Data Pond.
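The "smaller subset" idea behind marts and ponds can be sketched in a few lines of Python. In practice a mart or pond is usually its own database, schema, or storage area; this hypothetical `build_subset` helper only shows the core move of keeping one department's rows and only the columns that department cares about.

```python
from typing import Any

Record = dict[str, Any]


def build_subset(org_wide: list[Record], department: str, columns: list[str]) -> list[Record]:
    """Carve a department-specific dataset (a 'mart' or 'pond') out of an org-wide store.

    Keeps only rows belonging to `department`, and only the listed columns
    (columns missing from a row are simply left out).
    """
    return [
        {col: row[col] for col in columns if col in row}
        for row in org_wide
        if row.get("department") == department
    ]
```

For instance, a finance subset built with `columns=["invoice", "amount"]` contains just the finance invoices and amounts, without the HR rows or any employee details.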

TL;DR

Warehouse / Mart:

  • Fast
  • Reliable
  • Transformed to particular schema when written

Lake / Pond:

  • Cheap
  • Flexible
  • Transformed to particular schema when accessed

To complicate things even more, a newer term being thrown around is a Data Lakehouse, which combines elements of both.

Peter Kim Frank

Thank you!

Matt Ellen-Tsivintzeli

Not an answer: whoever came up with the lake and pond metaphors for data storage is very bad at coming up with metaphors. You don't throw something into a lake to store it. "Our data lake is like Lake Michigan!" "Dangerous in winter, with a surprising number of shipwrecks?" Data Pond is even worse: "not big enough to store anything useful" is the first thing that comes to mind.

Anyway. I'll bookmark this to find out their real meaning!