NoSQL databases (commonly referred to as "not only SQL") represent a non-tabular paradigm of database design, storing and retrieving data differently than traditional relational tables. Engineered for flexible schemas, they scale out dynamically to support massive amounts of data and high user loads.
Unlike a traditional RDBMS, which enforces a rigid tabular schema, NoSQL databases come in several types, each defined by its data model. The main types are Document, Key-Value, Wide-Column, and Graph.
They are schema free. You do not need to define structures, attributes, or rules prior to inserting records.
Data structures used are not tabular. They are more flexible, with the capability to adjust dynamically to new data formats.
They are built from the ground up to handle very high concurrent read/write throughput (Big Data).
Most NoSQL systems are open source and have native support for horizontal scaling across commodity servers.
A NoSQL database simply stores data in a format other than relational tables, such as key-value pairs, nested documents, or graphs.
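As a minimal sketch of what schema-free storage can look like in practice, the Python snippet below keeps records of different shapes in one collection. The `collection` list, the `find` helper, and the field names are illustrative assumptions, not the API of any particular database.

```python
# A hypothetical in-memory "collection": a plain list of dict records.
collection = []

# Records need not share the same fields; no schema is declared up front.
collection.append({"_id": 1, "name": "Ada", "email": "ada@example.com"})
collection.append({"_id": 2, "name": "Grace", "languages": ["COBOL", "FORTRAN"]})
collection.append({"_id": 3, "name": "Linus", "address": {"city": "Helsinki"}})

def find(records, field):
    """Return the records that contain the given top-level field."""
    return [r for r in records if field in r]

# Only the first record has an "email" field.
names = [r["name"] for r in find(collection, "email")]
```

A new attribute can be added to later records without touching the earlier ones, which is the flexibility the points above describe.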
The emergence of NoSQL was not arbitrary; it was a structural response to changing storage economics, increasing complexity of web scale applications, and the rise of utility cloud computing.
NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to create a complex, difficult-to-manage data model in order to avoid data duplication. Developers (rather than storage) were becoming the primary cost of software development, so NoSQL databases optimised for developer productivity.
Data was becoming unstructured, making the process of structuring (defining schema in advance) extremely costly. NoSQL databases allowed developers to store huge amounts of unstructured data, giving them a lot of flexibility.
Teams also recognised the need to adapt rapidly to changing requirements in a software system. Developers needed the ability to iterate quickly and make changes throughout their software stack, all the way down to the database. NoSQL databases gave them this flexibility.
Cloud computing rose in popularity, and developers began using public clouds to host their applications and data. They wanted the ability to distribute data across multiple servers and regions to make their applications resilient, to scale out instead of scale up, and to intelligently geo-place their data. Some NoSQL databases like MongoDB provide these capabilities natively.
NoSQL databases provide distinct, high-impact advantages over traditional relational systems in specific production environments:
An RDBMS has a predefined schema, which becomes a problem when not all of the data is known up front or when the schema needs to change. Changing the schema of a live relational database can be a substantial task.
Horizontal scaling, also known as scale-out, means adding nodes to share the load. This is hard with relational databases because related data is difficult to spread across nodes. Non-relational databases make it simpler: collections are self-contained and not coupled relationally, so they can be distributed across nodes without queries having to "join" them together across nodes.
Scaling horizontally is achieved through two main mechanisms:
Sharding: splitting a single logical dataset horizontally across multiple physical servers (shards), where each server handles a fraction of the data.
Replication: keeping synchronized copies of the data on a cluster of database servers that run simultaneously. If the primary server crashes, a replica can take over (failover).
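The first mechanism, splitting data across shards, can be sketched with simple hash-based routing. This is an illustration under stated assumptions: the shard names, the `shard_for`, `put`, and `get` helpers, and the modulo scheme are all hypothetical, and production systems often use consistent hashing or range partitioning instead.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical server names

def shard_for(key: str, shards=SHARDS) -> str:
    """Map a key to a shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

# Each shard holds only its own fraction of the data.
storage = {name: {} for name in SHARDS}

def put(key, value):
    storage[shard_for(key)][key] = value

def get(key):
    return storage[shard_for(key)].get(key)

for user_id in ["user:1", "user:2", "user:3", "user:4"]:
    put(user_id, {"id": user_id})
```

Because the hash is deterministic, a read for a key is routed to the same shard that received the write, so no single server needs to hold the full dataset.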
NoSQL databases are often highly available thanks to automatic replication. Because data is stored on multiple servers, if one server fails the same data can still be read from another, and the failed node can later be restored to the last consistent state from the surviving copies.
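To make the idea of replicated data staying readable after a failure concrete, here is a toy sketch. The `ReplicatedStore` class and node names are hypothetical, and the sketch assumes synchronous replication to every live node, which real systems may or may not use.

```python
class ReplicatedStore:
    """Toy replicated store: every write is copied to all live nodes."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self.alive = set(node_names)

    def write(self, key, value):
        # Assumption: synchronous replication to every live node.
        for name in self.alive:
            self.nodes[name][key] = value

    def fail(self, name):
        """Simulate a server crash."""
        self.alive.discard(name)

    def read(self, key):
        # Serve the read from any node that is still alive.
        live = sorted(self.alive)
        if not live:
            raise RuntimeError("no live nodes")
        return self.nodes[live[0]].get(key)

store = ReplicatedStore(["node-a", "node-b", "node-c"])
store.write("cart:42", ["book", "pen"])
store.fail("node-a")            # one server goes down
value = store.read("cart:42")   # served by a surviving replica
```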
Queries in NoSQL databases can be faster than SQL databases. Why? Data in SQL databases is typically normalised, so queries for a single object or entity require you to join data from multiple tables. As your tables grow in size, the joins can become expensive. However, data in NoSQL databases is typically stored in a way that is optimised for queries.
"Data that is accessed together should be stored together."
Since related values are embedded directly in a single document block instead of spread across tables, queries typically do not require joins, making them exceptionally fast.
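The difference between joining normalised tables and reading one embedded document can be sketched as follows. The data and function names are made up for illustration; plain Python dicts and lists stand in for tables and documents.

```python
# Normalised layout: two "tables", combined at query time.
users = {1: {"name": "Ada"}}
orders = [
    {"order_id": 10, "user_id": 1, "item": "keyboard"},
    {"order_id": 11, "user_id": 1, "item": "monitor"},
]

def user_with_orders_joined(user_id):
    """Assemble the user view by scanning the orders table (a join)."""
    user = dict(users[user_id])
    user["orders"] = [o["item"] for o in orders if o["user_id"] == user_id]
    return user

# Document layout: the orders are embedded in the user document.
user_documents = {
    1: {"name": "Ada", "orders": ["keyboard", "monitor"]},
}

def user_with_orders_embedded(user_id):
    """One key lookup returns the whole entity; no join is needed."""
    return user_documents[user_id]
```

Both functions return the same view of the user, but the embedded version does a single lookup, while the joined version has to scan a second collection that grows with the data.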
While inserts and reads are blistering fast, deleting or updating duplicated/denormalized data fields spread across multiple records can be difficult and costly.
Many NoSQL systems integrate native caching mechanisms directly. By keeping frequently accessed records or configuration sets in physical RAM based on keys, they reduce disk I/O and offer sub-millisecond responses.
NoSQL is a natural fit for cloud applications that run inside globally distributed, containerized, dynamically scalable cloud platforms.
Choosing NoSQL over an RDBMS depends entirely on the design goals of your system. You should choose NoSQL when you encounter the following scenarios:
When applications iterate quickly and require rapid alterations down to the database without running complex schema migrations.
Ideal for heterogeneous data shapes (JSON files, variable attributes) where enforcing a strict tabular shape causes high data-loss or engineering complexity.
When data grows exponentially, exceeding the limits of single-server disk/memory capacities, requiring seamless distributed storage.
When your primary scaling strategy is adding commodity servers (horizontal scale-out) rather than buying expensive single-host database servers.
Highly suited for distributed architectures like micro-services, real-time message streams, event triggers, and high-frequency IoT logging.
"Relationship data is best suited *only* for relational databases."
"NoSQL databases don't support ACID transactions."
There are four distinct models under the NoSQL umbrella, each optimized for different storage paradigms, access patterns, and query complexity:
The simplest type of NoSQL database is a key-value store. Every data element in the database is stored as a key-value pair consisting of an attribute name (or "key") and a value. In a sense, a key-value store is like a relational database with only two columns: the key or attribute name (such as "state") and the value (such as "Alaska").
A key-value database associates a value (which can be anything from a number or simple string to a complex object) with a key, which is used to keep track of the object. In its simplest form, a key-value store is like the dictionary or map object found in most programming languages, but stored persistently and managed by a Database Management System (DBMS).
🚀 Key-value databases use compact, efficient index structures to be able to quickly and reliably locate a value by its key, making them ideal for systems that need to be able to find and retrieve data in constant time.
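The "persistent dictionary" idea can be sketched in a few lines of Python. The `KeyValueStore` class and its JSON-file persistence are hypothetical simplifications: real key-value databases use far more efficient on-disk index structures, and this toy rewrites the whole file on every change.

```python
import json
import os
import tempfile

class KeyValueStore:
    """Toy key-value store that persists a dict to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value
        self._flush()

    def get(self, key, default=None):
        # Python dict lookup by key is a hash lookup (average constant time).
        return self.data.get(key, default)

    def delete(self, key):
        self.data.pop(key, None)
        self._flush()

    def _flush(self):
        # Simplification: rewrite the whole file on every change.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)

path = os.path.join(tempfile.mkdtemp(), "kv.json")
store = KeyValueStore(path)
store.put("state", "Alaska")
reopened = KeyValueStore(path)  # simulates a restart: data is still there
```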
In a column-oriented store, the values of each column are stored next to one another. While a relational database typically stores data in rows and reads it row by row, a column store is organised as a set of columns.
This columnar organization means that when you want to run analytics on a small number of columns, you can read those columns directly without consuming memory with the unwanted data.
Columns are often of the same type and benefit from more efficient compression algorithms, making reads even faster. Columnar databases can quickly aggregate the value of a given column (e.g. adding up total sales for a year in milliseconds).
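The row-versus-column layout can be illustrated with plain Python lists. The sales figures and the `total_sales` helper are invented for this sketch; it only shows why an aggregate needs to touch fewer values when data is organised by column.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"region": "north", "year": 2023, "sales": 120},
    {"region": "south", "year": 2023, "sales": 80},
    {"region": "north", "year": 2024, "sales": 150},
]

# Column-oriented layout: one list per column, values stored contiguously.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def total_sales(columns, year):
    """Aggregate using only the two columns the query needs."""
    return sum(s for y, s in zip(columns["year"], columns["sales"]) if y == year)

total_2023 = total_sales(columns, 2023)
```

The query never reads the `region` column, which mirrors how a column store can skip data that an analytical query does not need.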
Document databases store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types, including strings, numbers, booleans, arrays, or nested objects.
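A JSON-like document with mixed value types can be modelled directly as a Python dict. The document contents and the `get_path` helper below are hypothetical; the dotted-path lookup is only a simplified stand-in for the nested-field queries that document databases offer.

```python
import json

document = {
    "_id": "u1",
    "name": "Ada",                       # string
    "active": True,                      # boolean
    "tags": ["admin", "dev"],            # array
    "address": {                         # nested object
        "city": "London",
        "geo": {"lat": 51.5, "lon": -0.12},
    },
}

def get_path(doc, dotted_path, default=None):
    """Follow a dotted path such as 'address.geo.lat' through nested dicts."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current

city = get_path(document, "address.city")
serialised = json.dumps(document)  # the document serialises to JSON text
```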
A graph database focuses on the relationship between data elements. Each element is stored as a node (such as a person in a social media graph). The connections between elements are called links or relationships. In a graph database, connections are first-class elements of the database, stored directly. In relational databases, links are implied, using data to express the relationships.
A graph database is optimised to capture and search the connections between data elements, overcoming the massive JOIN operations overhead associated with joining multiple tables in SQL.
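Traversing directly stored relationships can be sketched with an adjacency list and a breadth-first search. The `follows` data and the `hops_between` function are illustrative assumptions and do not reflect how any specific graph database stores or queries its data.

```python
from collections import deque

# Relationships stored directly as adjacency lists (node -> neighbours).
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}

def hops_between(graph, start, goal):
    """Breadth-first search: fewest relationship hops from start to goal."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None  # no path exists

hops = hops_between(follows, "alice", "erin")
```

Each step follows a stored link from one node to the next, whereas the equivalent relational query would typically need one self-join per hop.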
⚠️ Note: Very few real-world business systems can survive solely on graph queries. As a result graph databases are usually run alongside other more traditional databases.
Despite their scalability and flexible schema benefits, NoSQL databases carry significant technological trade-offs and disadvantages that engineers must account for:
Since data models in NoSQL databases are typically optimised for queries and not for reducing data duplication, NoSQL databases can be larger than SQL databases. Storage is currently so cheap that most consider this a minor drawback, and some NoSQL databases also support compression to reduce the storage footprint.
Because NoSQL databases duplicate and denormalize data to enable join-free, high-speed read queries, updating or deleting a specific data attribute requires updating it across multiple files/records. This results in heavy processing costs and synchronization lags.
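The cost of updating duplicated data can be seen in a small sketch. The `posts` records and the `rename_author` helper are hypothetical; the point is that one logical change fans out to every record that carries a copy.

```python
# The author's display name is duplicated into every post document.
posts = [
    {"post_id": 1, "author_id": 7, "author_name": "Ada L.", "title": "Intro"},
    {"post_id": 2, "author_id": 7, "author_name": "Ada L.", "title": "Part 2"},
    {"post_id": 3, "author_id": 9, "author_name": "Grace H.", "title": "Notes"},
]

def rename_author(posts, author_id, new_name):
    """Update every duplicated copy; return how many records were touched."""
    touched = 0
    for post in posts:
        if post["author_id"] == author_id:
            post["author_name"] = new_name
            touched += 1
    return touched

touched = rename_author(posts, 7, "Ada Lovelace")
```

In a normalised design the name would live in one row and the rename would be a single write; here the number of writes grows with the number of copies.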
Depending on the NoSQL database type you select, you may not be able to achieve all of your use cases in a single database. For example, graph databases are excellent for analysing relationships in your data but may not provide what you need for everyday retrieval of the data such as range queries. When selecting a NoSQL database, consider what your use cases will be and if a general purpose database like MongoDB would be a better option.
While some databases support ACID transactions, NoSQL databases as a class do not guarantee ACID properties across all types and models. Many systems instead adopt the BASE model (Basically Available, Soft state, Eventually consistent), trading strict consistency for availability and speed.
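Eventual consistency can be illustrated with a toy primary/replica pair. The `EventuallyConsistentStore` class and its explicit `sync` step are assumptions made for this sketch; real systems replicate in the background with their own protocols.

```python
class EventuallyConsistentStore:
    """Toy store: writes hit the primary first and reach the replica later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        return self.replica.get(key)  # may be stale

    def sync(self):
        # Apply queued writes in order; afterwards the copies converge.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("stock:widget", 5)
stale = store.read_replica("stock:widget")  # replica has not caught up yet
store.sync()
fresh = store.read_replica("stock:widget")  # replica now matches the primary
```

The write is accepted immediately (availability), while a read from the replica may briefly return old data until replication catches up.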
Many NoSQL systems do not enforce consistency constraints on data entry. The lack of engine-enforced constraints (such as strict foreign keys, domain bounds, and check limits) shifts much of the responsibility for protecting data integrity onto the application code layer.
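When the engine does not enforce constraints, the application has to check them before writing. The sketch below is one hypothetical way to do that; the `validate_order` function, the field names, and the allowed status values are all invented for illustration.

```python
def validate_order(order, known_customer_ids):
    """Return a list of constraint violations; an empty list means valid."""
    errors = []
    # Stand-in for a foreign key check.
    if order.get("customer_id") not in known_customer_ids:
        errors.append("customer_id does not reference a known customer")
    # Stand-in for a domain bound.
    quantity = order.get("quantity")
    if not isinstance(quantity, int) or quantity <= 0:
        errors.append("quantity must be a positive integer")
    # Stand-in for a check constraint.
    if order.get("status") not in {"new", "paid", "shipped"}:
        errors.append("status must be one of new, paid, shipped")
    return errors

customers = {101, 102}
ok = validate_order({"customer_id": 101, "quantity": 2, "status": "new"}, customers)
bad = validate_order({"customer_id": 999, "quantity": 0, "status": "lost"}, customers)
```

An application would reject or repair any record for which the returned list is not empty before sending it to the database.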
| Feature | SQL Databases | NoSQL Databases |
|---|---|---|
| Data Storage Model | Tables with fixed rows and columns | Document: JSON documents, Key-value: key-value pairs, Wide-column: tables with rows and dynamic columns, Graph: nodes and edges |
| Development History | Developed in the 1970s with a focus on reducing data duplication | Developed in the late 2000s with a focus on scaling and allowing for rapid application change driven by agile and DevOps practices. |
| Examples | Oracle, MySQL, Microsoft SQL Server, and PostgreSQL | Document: MongoDB and CouchDB, Key-value: Redis and DynamoDB, Wide-column: Cassandra and HBase, Graph: Neo4j and Amazon Neptune |
| Primary Purpose | General Purpose | Document: general purpose, Key-value: large amounts of data with simple lookup queries, Wide-column: large amounts of data with predictable query patterns, Graph: analyzing and traversing relationships between connected data |
| Schemas | Fixed | Flexible |
| Scaling | Vertical (scale-up) | Horizontal (scale-out across commodity servers) |
| ACID Properties | Supported | Often not supported; some databases, such as MongoDB, do support ACID transactions |
| JOINS | Typically Required | Typically not required |
| Data to object mapping | Typically requires object-relational mapping (ORM) | Many do not require ORMs. MongoDB documents map directly to data structures in most popular programming languages. |