In a blog post titled "Schema Later" Considered Harmful on EDB's site, Michael Stonebraker demonstrates that inserting junk data can be harmful to queries. While this conclusion is evident, it's important to note that all databases have some form of schema and a full "schema later" doesn't exist. Otherwise, indexes couldn't be added, and data couldn't be queried and processed.
Once the schema has been established and is used by the application, it's recommended to declare the schema-on-write part in the database as well, not only in the application code.
A common mistake vendors make is comparing one database, where the author is an expert, to another database they are unfamiliar with and unwilling to learn about. This leads to biased conclusions, as they contrast best practices in one database with a poor design in the other. In the EDB blog post, the schema for PostgreSQL was defined using data types and check constraints, but that work was left out of the MongoDB example.
In MongoDB, you can begin with a flexible schema defined by your application. Once the structure is established, MongoDB schema validation ensures that there are no unintended changes or improper data types, maintaining the integrity of your data. Although this feature has been available since MongoDB 3.6, released in 2017, it remains overlooked due to persistent myths about NoSQL and unstructured data.
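For example, a validator can be attached to an existing collection once its document shape has settled. The following is a minimal sketch using the collMod command; the collection name and the single required field are illustrative, not taken from the EDB post:
db.runCommand({
  collMod: "employee",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name"],
      properties: {
        name: { bsonType: "string" }
      }
    }
  },
  // "moderate": documents that already violate the rules can still be updated
  validationLevel: "moderate",
  // reject new or updated documents that fail validation
  validationAction: "error"
});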
In the EDB blog, they created the PostgreSQL table as:
create table employee (
  name varchar,
  age int4,
  salary int4 check (salary > 0)
);
To compare, they should have created the MongoDB collection as:
db.createCollection("employee", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "age", "salary"],
      properties: {
        name:   { bsonType: "string", description: "VARCHAR equivalent" },
        age:    { bsonType: "int", description: "INT4 equivalent" },
        salary: { bsonType: "int", minimum: 0, description: "CHECK salary > 0 equivalent" }
      }
    }
  },
  validationAction: "error" // Strict validation: reject invalid documents
});
With such schema validation, the incorrect inserts are rejected:
db.employee.insertOne ({name : "Stonebraker", age : 45, salary : -99})
MongoServerError: Document failed validation
Additional information: {
failingDocumentId: ObjectId('6845cfe3f9e37e21a1d4b0c8'),
...
propertyName: 'salary',
description: 'CHECK salary > 0 equivalent',
details: [
{
operatorName: 'minimum',
specifiedAs: { minimum: 0 },
reason: 'comparison failed',
consideredValue: -99
...
db.employee.insertOne ({name : "Codd", age : "old", salary : 40000})
MongoServerError: Document failed validation
Additional information: {
failingDocumentId: ObjectId('6845d041f9e37e21a1d4b0c9'),
...
propertyName: 'age',
description: 'INT4 equivalent',
details: [
{
operatorName: 'bsonType',
specifiedAs: { bsonType: 'int' },
reason: 'type did not match',
consideredValue: 'old',
consideredType: 'string'
}
...
The application receives all information about the violation as JSON that its exception handling can parse. Unlike many SQL databases, which return only a constraint name inside a text message, this approach avoids exposing internal names to the application and improves logical data independence.
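As a minimal sketch of how an application could consume that report: the structured details travel with the error as errInfo. The field names below assume the detailed validation output of MongoDB 5.0 and later, as shown in the messages above:
try {
  db.employee.insertOne({ name: "Codd", age: "old", salary: 40000 });
} catch (e) {
  // the structured validation report is attached to the error as errInfo
  const details = (e.errInfo && e.errInfo.details) || {};
  // list each offending property with the description declared in the schema
  (details.schemaRulesNotSatisfied || []).forEach(rule =>
    (rule.propertiesNotSatisfied || []).forEach(p =>
      print(p.propertyName + ": " + p.description)
    )
  );
}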
Remark: instead of storing age, it's advisable to store the date of birth as a date type ({ bsonType: 'date' } with an acceptable range of values). Additionally, you can use sub-objects to include a currency alongside the salary: { salary: { amount: 40000, currency: "CHF" } }.
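As a sketch of what those two remarks could look like in the validator, here is a fragment to merge into the properties section of the schema above; the dateOfBirth field name and the currency list are illustrative assumptions:
properties: {
  // a birth date instead of an age; bounds could be added with minimum/maximum
  dateOfBirth: { bsonType: "date" },
  // keep the amount and its currency together in a sub-object
  salary: {
    bsonType: "object",
    required: ["amount", "currency"],
    properties: {
      amount:   { bsonType: "int", minimum: 0 },
      currency: { bsonType: "string", enum: ["CHF", "EUR", "USD"] }  // illustrative list
    }
  }
}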
MongoDB schema validation is declarative, and the Atlas UI or Compass can help you get started with an existing collection by generating validation rules from sample documents.
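Before enforcing a validator on a collection that already contains data, one way to see which existing documents would fail is to use $jsonSchema as a query operator; a minimal sketch reusing the schema defined above:
const employeeSchema = {
  bsonType: "object",
  required: ["name", "age", "salary"],
  properties: {
    name:   { bsonType: "string" },
    age:    { bsonType: "int" },
    salary: { bsonType: "int", minimum: 0 }
  }
};
// list the documents that the validator would reject
db.employee.find({ $nor: [ { $jsonSchema: employeeSchema } ] });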