Revisions to Why use a database instead of just saving your data to disk?

Commonmark migration

edited Jun 16, 2020 at 10:01

Community Bot

Copy edited (but what are "Google App Data Store" and "Amazon's ECE"? - are they related to Google App Engine and Amazon EC2, respectively?). Added some context.

edit approved Feb 23, 2014 at 19:14

Peter Mortensen

### TLDR

It sounds like you made an essentially valid, short-term technical decision about your application's data store: you chose to write a custom data store management tool.

You're sitting on a continuum, with options to move in either direction.

In the long term, you'll likely (though not certainly) run into trouble, and may be better off switching to an existing data store solution. There are specific, very common, predictable performance problems you will be forced to deal with, and you're better off using existing tools than rolling your own.

It sounds like you've written a (small) custom-purpose database, built into and directly used by your application. I assume you're relying on an OS and file system to manage the actual disk writing and reading, and treating the combination as a data store.

### When to do what you did

You're sitting at a sweet spot for data storage. An OS and file system data store is incredibly convenient, accessible, and portable across platforms. The combination has been around for so long that you're certain to be supported, and to have your application run, on almost any standard deployment configuration.

It's also an easy combination to write code for: the API is fairly straightforward and basic, and it takes relatively few lines of code to get it working.
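To give a sense of how few lines that can be, here is a minimal sketch of a file-backed key-value store. The `FileStore` class and its method names are made up for illustration; I'm assuming a single process and data small enough to fit in one JSON file.

```python
import json
import os
import tempfile


class FileStore:
    """A tiny key-value store that keeps everything in one JSON file (illustrative only)."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        # A missing file simply means the store is empty.
        if not os.path.exists(self.path):
            return {}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def get(self, key, default=None):
        return self._load().get(key, default)

    def put(self, key, value):
        data = self._load()
        data[key] = value
        # Write to a temporary file in the same directory, then rename it
        # over the old file, so a crash mid-write does not leave a
        # half-written store behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp_path, self.path)
```

Note that every `get` re-reads the whole file and every `put` rewrites it. That is fine for a prototype, and it is also the kind of predictable performance problem that shows up once the data grows.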

Generally, it's ideal to do what you've done when:

- Prototyping new ideas
- Building applications which are highly unlikely to need to scale, performance-wise
- Constrained by unusual circumstances, such as a lack of resources for installing a database

### Alternatives

You're on a continuum of options, and there are two 'directions' you can go from here, which I think of as 'down' and 'up':

### Down

This is the least likely option to apply, but it's here for completeness' sake:

You can, if you want, go down; that is, bypass the OS and file system altogether and read and write directly from and to the disk. This choice is usually relevant only in cases where extreme efficiency is required. Think, for example, of a tiny MP3 player device without enough RAM for a fully functional OS, or of something like the Wayback Machine, which requires incredibly efficient mass data write operations (most data stores trade off slower writes for faster reads, since that's the overwhelmingly more common use case for almost all applications).
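As a rough sketch of what "going down" looks like in code: without a file system you decide the on-disk layout yourself, for example fixed-size records at offsets you compute. Everything here is an assumption for illustration: the 64-byte record layout, the function names, and the in-memory buffer standing in for a raw device.

```python
import io
import struct

BLOCK_SIZE = 64  # hypothetical fixed record size, in bytes
# Little-endian layout: 4-byte id, 56-byte payload, 4-byte checksum = 64 bytes.
RECORD = struct.Struct("<I56sI")


def write_record(device, slot, record_id, payload):
    # Pad the payload with zero bytes (or truncate it) to exactly 56 bytes.
    data = payload.ljust(56, b"\0")[:56]
    checksum = sum(data) & 0xFFFFFFFF
    # No file system: the offset is computed directly from the slot number.
    device.seek(slot * BLOCK_SIZE)
    device.write(RECORD.pack(record_id, data, checksum))


def read_record(device, slot):
    device.seek(slot * BLOCK_SIZE)
    record_id, data, checksum = RECORD.unpack(device.read(BLOCK_SIZE))
    if checksum != (sum(data) & 0xFFFFFFFF):
        raise ValueError("corrupt record in slot %d" % slot)
    # Trailing zero bytes are treated as padding in this sketch.
    return record_id, data.rstrip(b"\0")


# An in-memory buffer stands in for the raw device here.
device = io.BytesIO()
write_record(device, 3, 42, b"hello")
print(read_record(device, 3))
```

Free-space tracking, wear, bad blocks and crash recovery all become your problem at this level, which is a large part of why this direction is rarely worth it.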

### Up

There are several sub-categories here, though they aren't mutually exclusive. Some tools span more than one, providing some functionality in each; some can completely switch from working in one mode to working in another; and some can be layered on top of each other, providing different functionality to different parts of your application.

### More powerful data stores

You may find yourself needing to store higher and higher volumes of data, while still relying on your own application to manage the data manipulation complexity. A whole range of key-value stores is available to you, with varying degrees of support for related functions. NoSQL tools fall into this category, as well as others.

This is the obvious path to scale up on when the following describe your application:

- It is unusually read-heavy.
- You're OK with accepting lower (short-term) consistency guarantees in exchange for higher performance (many offer "eventual consistency").
- It "directly" manages most of the data manipulation and the lack of consistency (in practice, you'll probably end up using a third-party tool at first, though eventually you'll bring this into your application or into a custom-written intermediate layer).
- You're looking to massively scale the amount of data you're storing and/or your ability to search through it, with "relatively simple" data manipulation requirements.

There is some wiggle room here: you can force better read consistency at the cost of slower reads. Various tools and options provide data manipulation APIs, indexing and other features, which may be more or less suited to writing your specific application easily. So if the above points almost completely describe your application, you might be "close enough" to work with a more powerful data store solution.
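The trade-off between fast, possibly stale reads and slower, consistent reads can be illustrated with a toy model. This is not how any particular product works; the class, the `consistent` flag and the single replica are all hypothetical simplifications.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy key-value store: writes hit the primary, the replica catches up later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # replication log not yet applied to the replica

    def put(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def sync(self):
        # Apply every outstanding write to the replica, in order.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value

    def get(self, key, consistent=False):
        # consistent=True pays for a sync before reading (the slower read);
        # otherwise the possibly stale replica answers immediately.
        if consistent:
            self.sync()
        return self.replica.get(key)
```

A plain `get` right after a `put` may return the old value (or nothing), while `get(key, consistent=True)` always sees the latest write, at the price of waiting for replication.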

Well-known examples: CouchDB, MongoDB, Redis, cloud storage solutions like Microsoft's Azure, Google App Data Store and Amazon's ECE.

### More complex data manipulation engines

The "SQL" family of data storage applications, as well as a range of others, are better described as data manipulation tools than as pure storage engines. They provide a wide range of additional functionality beyond the storage of data, and often beyond what's available on the key-value store side of things. You'll want to take this path when:

- You absolutely have to have read consistency, even if it means you'll take a performance hit.
- You're looking to efficiently perform highly complex data manipulation: think of very complex JOIN and UPDATE operations, data cubes and slicing, etc.
- You're OK with accepting rigidity in exchange for performance (think forced, fixed data storage formats, such as tables, which cannot easily and/or efficiently be altered).
- You have the resources to deal with an oftentimes more complex set of tools and interfaces.

This is the more "traditional" way of thinking of a database or data store, and it has been around for much longer, so there is a lot that's available here, and there's often a lot of complexity to deal with. It's possible, though it takes some expertise and knowledge, to build simple solutions and avoid much of the complexity. You will most likely end up using third-party tools and libraries to manage most of it for you, though.

Well-known examples are MySQL, SQL Server, Oracle's Database, and DB2.
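For a small taste of what the engine does for you, here is a sketch using SQLite through Python's standard `sqlite3` module (SQLite is my choice for a self-contained example; it is not one of the products listed above). The `customers` and `orders` tables and their contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
""")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(1, 10.0), (1, 5.5), (2, 7.0)])

# Used as a context manager, the connection commits on success and rolls
# back on an exception, so the UPDATE applies fully or not at all.
with conn:
    conn.execute("UPDATE orders SET total = total * 2 WHERE customer_id = ?", (2,))

# The engine performs the join and the aggregation; the application only
# states what it wants.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()
print(rows)
```

Doing the same join, aggregation and all-or-nothing update on top of flat files means writing, and debugging, all of that logic yourself.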

### Outsource the work

There are several modern third-party tools and libraries which interpose themselves between your data storage tools and your application, to help you manage the complexity.

They attempt to initially take away most or all of the work that goes into managing and manipulating data stores and, ideally, allow you to make a smooth transition into complexity only when and if it is required. This is an active area of entrepreneurship and research, with a few recent results that are immediately accessible and usable.

Well-known examples are MVC tools (Django, Yii), Ruby on Rails, and Datomic. It is hard to be fair here as there are literally dozens of tools and libraries which act as wrappers around the APIs of various data stores.
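To illustrate the idea of such a wrapper without pulling in any of these frameworks, here is a hand-rolled sketch. It is not the API of Django, Rails or Datomic; the `Note` and `NoteRepository` names are hypothetical, and SQLite is assumed as the underlying store.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class Note:  # hypothetical application object
    id: int
    title: str
    body: str


class NoteRepository:
    """Hides the SQL behind plain method calls (an illustrative mapper)."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS notes "
            "(id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
        )

    def save(self, note):
        # Insert the note, or overwrite the existing row with the same id.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO notes (id, title, body) VALUES (?, ?, ?)",
                astuple(note),
            )

    def find(self, note_id):
        row = self.conn.execute(
            "SELECT id, title, body FROM notes WHERE id = ?", (note_id,)
        ).fetchone()
        return Note(*row) if row else None
```

The application only ever calls `save` and `find`, so the storage underneath can later be swapped or tuned without touching the calling code. Real tools in this space do far more (migrations, query building, caching), but the layering is the same.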

PS: If you prefer videos to text, you might want to watch some of Rich Hickey's database-related videos; he does a good job of elucidating most of the thinking that goes into choosing, designing and using a data store.

minor formatting

edited Sep 13, 2013 at 13:56

blueberryfields

answered Sep 4, 2013 at 15:33

blueberryfields
