Lecture 15

NoSQL Databases

NoSQL databases (commonly referred to as "not only SQL") represent a non-tabular paradigm of database design, storing and retrieving data differently than traditional relational tables. Engineered for flexible schemas, they scale out dynamically to support massive amounts of data and high user loads.

Core Characteristics of NoSQL

Unlike traditional RDBMS databases that enforce rigid tabular schemas, NoSQL databases come in a variety of types tailored to their data models. The main types include Document, Key-Value, Wide-Column, and Graph.

Schema-Free Nature

They are schema free. You do not need to define structures, attributes, or rules prior to inserting records.

Flexible & Dynamic Structures

Data structures used are not tabular. They are more flexible, with the capability to adjust dynamically to new data formats.

Handles Big Data

Built from the ground up to handle very large data volumes and high concurrent read/write throughput (Big Data).

Open Source & Scale-Out

Most NoSQL systems are open source and have native support for horizontal scaling across commodity servers.

Non-Relational Formats

They store data in formats other than relational tables, such as key-value pairs, nested documents, or graphs.

History & Evolution

The emergence of NoSQL was not arbitrary; it was a structural response to changing storage economics, the increasing complexity of web-scale applications, and the rise of utility cloud computing.

Late 2000s

Economics of Storage and Developer Productivity

NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to create a complex, difficult-to-manage data model in order to avoid data duplication. Developers (rather than storage) were becoming the primary cost of software development, so NoSQL databases optimised for developer productivity.

Data Shift

Unstructured Data Boom

Data was becoming unstructured, making the process of structuring (defining schema in advance) extremely costly. NoSQL databases allowed developers to store huge amounts of unstructured data, giving them a lot of flexibility.

Development Speed

Agility and Rapid Iteration

Software requirements change rapidly, so developers needed the ability to iterate quickly and make changes throughout their software stack, all the way down to the database. NoSQL databases gave them this flexibility.

Cloud Era

Cloud Computing & Horizontal Scaling

Cloud computing rose in popularity, and developers began using public clouds to host their applications and data. They wanted the ability to distribute data across multiple servers and regions to make their applications resilient, to scale out instead of scale up, and to intelligently geo-place their data. Some NoSQL databases like MongoDB provide these capabilities natively.

NoSQL Databases Advantages

NoSQL databases provide distinct, high-impact advantages over traditional relational systems in specific production environments:

A. Flexible Schema Architecture

An RDBMS has a pre-defined schema, which becomes a problem when we do not have all the data up front or need to change the schema later. Changing a schema on a running system is a huge task.
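To make the contrast concrete, here is a minimal sketch in which a plain Python list of dicts stands in for a schema-free document collection. It is an illustration only, not a real database API: the `users` collection and its fields are hypothetical. Records with different fields are stored side by side, and a new attribute appears without any migration step.

```python
# A plain Python list stands in for a schema-free document collection
# (hypothetical illustration, not a real database API).
users = []

# No schema is declared up front: each record carries its own fields.
users.append({"name": "Asha", "email": "asha@example.com"})
users.append({"name": "Ben", "phone": "+44 20 7946 0000", "tags": ["beta"]})

# A later record introduces a brand-new attribute with no migration step.
users.append({"name": "Chen", "email": "chen@example.com", "loyalty_points": 120})

# The trade-off: readers must tolerate missing fields, e.g. via a default.
emails = [u.get("email", "<none>") for u in users]
print(emails)  # ['asha@example.com', '<none>', 'chen@example.com']
```

The flexibility moves work to read time: code that consumes the collection has to cope with fields that may or may not be present.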

B. Horizontal Scaling (Scale-Out)

Horizontal scaling, also known as scale-out, refers to adding nodes to share the load. This is hard with relational databases because related data is spread across tables that queries join together, and joining across nodes is expensive. With non-relational databases, it is simpler, since collections are self-contained and not coupled relationally. They can therefore be distributed across nodes more easily, as queries do not have to "join" them together across nodes.

Scaling horizontally is achieved through two main mechanisms:

Sharding

Splitting a single logical dataset horizontally across multiple physical servers (shards), where each server holds a fraction of the data.
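A minimal sketch of hash-based sharding, assuming a fixed cluster of four shards modelled as in-memory dicts. `NUM_SHARDS`, `shard_for`, `put` and `get` are hypothetical names for illustration. A stable hash of each key decides which shard owns it, so every read and write is routed to exactly one shard.

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard index using a stable hash."""
    # md5 is used only for an even, deterministic spread, not for security.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each shard (a dict here, a server in practice) holds only part of the data.
shards = {i: {} for i in range(NUM_SHARDS)}


def put(key: str, value: dict) -> None:
    shards[shard_for(key)][key] = value


def get(key: str):
    # Routing is deterministic, so a read goes straight to the owning shard.
    return shards[shard_for(key)].get(key)


for n in range(1000):
    put(f"user:{n}", {"id": n})

print(get("user:42"))  # {'id': 42}
print([len(s) for s in shards.values()])  # per-shard record counts
```

Plain modulo hashing reassigns most keys whenever the shard count changes, which is why production systems typically use range-based or consistent-hashing schemes instead.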

Replica-sets

Synchronized groups of database servers that each hold a copy of the same data. If the primary server crashes, a secondary is automatically promoted to primary, providing failover with little or no downtime.
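The failover idea can be sketched as a toy, single-process model. `Node`, `ReplicaSet` and the node names are hypothetical. Writes go to every live member, and when the primary is found dead the first surviving member is promoted. Real replica sets (MongoDB's, for example) elect the new primary by majority vote and usually replicate asynchronously; this sketch copies synchronously to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alive: bool = True
    data: dict = field(default_factory=dict)


class ReplicaSet:
    """Toy replica set: one primary takes writes and copies them to secondaries."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]

    def write(self, key, value):
        if not self.primary.alive:
            self.failover()
        # Synchronous copy to every live member (a simplification).
        for node in self.nodes:
            if node.alive:
                node.data[key] = value

    def read(self, key):
        if not self.primary.alive:
            self.failover()
        return self.primary.data.get(key)

    def failover(self):
        # Promote the first surviving member; real systems hold an election.
        survivors = [n for n in self.nodes if n.alive]
        if not survivors:
            raise RuntimeError("no live replicas")
        self.primary = survivors[0]


rs = ReplicaSet(["db-a", "db-b", "db-c"])
rs.write("order:1", {"total": 99})
rs.nodes[0].alive = False  # simulate a crash of the primary
print(rs.read("order:1"))  # {'total': 99}, served by the promoted node
print(rs.primary.name)     # db-b
```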

C. High Availability & Auto Replication

NoSQL databases achieve high availability through automatic replication: data is copied to multiple servers, so if one server fails, the same data can still be accessed from another server.

D. Fast Insert & Read Operations

Queries in NoSQL databases can be faster than SQL databases. Why? Data in SQL databases is typically normalised, so queries for a single object or entity require you to join data from multiple tables. As your tables grow in size, the joins can become expensive. However, data in NoSQL databases is typically stored in a way that is optimised for queries.

Core Rule of Thumb (MongoDB)

"Data that is accessed together should be stored together."

Since related values are embedded directly in a single document block instead of spread across tables, queries typically do not require joins, making them exceptionally fast.
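The rule of thumb can be illustrated with in-memory Python structures (all names here are hypothetical). The normalised version must look up an order and its lines separately and stitch them together on `order_id`, which is the work a SQL join does. The embedded version returns the same entity with a single key lookup.

```python
# Normalised (relational-style): the order and its lines live in separate "tables".
orders = [{"order_id": 1, "customer": "Asha"}]
order_lines = [
    {"order_id": 1, "sku": "PEN", "qty": 3},
    {"order_id": 1, "sku": "INK", "qty": 1},
    {"order_id": 2, "sku": "PAD", "qty": 5},
]


def load_order_normalised(order_id):
    # Two lookups plus a "join" on order_id to rebuild one entity.
    order = next(o for o in orders if o["order_id"] == order_id)
    lines = [line for line in order_lines if line["order_id"] == order_id]
    return {**order, "lines": [{"sku": ln["sku"], "qty": ln["qty"]} for ln in lines]}


# Embedded (document-style): data accessed together is stored together.
order_docs = {
    1: {
        "order_id": 1,
        "customer": "Asha",
        "lines": [{"sku": "PEN", "qty": 3}, {"sku": "INK", "qty": 1}],
    },
}


def load_order_embedded(order_id):
    # A single key lookup returns the whole entity; no join needed.
    return order_docs[order_id]


# Both paths produce the same logical entity.
assert load_order_normalised(1) == load_order_embedded(1)
```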

Key Trade-off & Limitation

While inserts and reads are very fast, updating or deleting fields that are duplicated (denormalized) across multiple records can be difficult and costly.
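A small sketch of why that trade-off bites, using hypothetical in-memory order documents that each embed a copy of the customer's name. Renaming the customer means finding and rewriting every copy. In a normalised schema the name would live in one row, so a single update would suffice.

```python
# Denormalised documents: the customer's name is copied into every order.
orders = [
    {"order_id": 1, "customer_id": 7, "customer_name": "Asha Rao", "total": 20},
    {"order_id": 2, "customer_id": 7, "customer_name": "Asha Rao", "total": 35},
    {"order_id": 3, "customer_id": 9, "customer_name": "Ben Li", "total": 12},
]


def rename_customer(customer_id, new_name):
    """Rewrite every duplicated copy; returns how many records were touched."""
    touched = 0
    for doc in orders:  # a full scan unless an index on customer_id exists
        if doc["customer_id"] == customer_id:
            doc["customer_name"] = new_name
            touched += 1
    return touched


print(rename_customer(7, "Asha Rao-Singh"))  # 2
```

If such a multi-record update is not applied atomically, a reader can briefly see some copies with the old value and some with the new one.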

E. Integrated Caching Mechanism

Many NoSQL systems integrate caching directly. By keeping frequently accessed records or configuration data in memory, indexed by key, they reduce disk I/O and can respond in under a millisecond.
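The key-based caching idea can be sketched as a small in-memory cache whose entries expire after a time-to-live, used in a cache-aside pattern. `TTLCache`, `get_config` and `load_config_from_disk` are hypothetical names; the last one stands in for a slow disk or database read. Redis, for example, offers per-key expiry natively, so in practice you would not hand-roll this.

```python
import time


class TTLCache:
    """Key-based in-memory cache whose entries expire after ttl seconds."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict stale entries on read
            return default
        return value


def load_config_from_disk(name):
    # Hypothetical slow source of truth (stands in for a disk or database read).
    return {"name": name, "feature_flag": True}


cache = TTLCache(ttl=30.0)


def get_config(name):
    cached = cache.get(name)
    if cached is not None:
        return cached  # served from memory, no disk I/O
    value = load_config_from_disk(name)
    cache.set(name, value)  # cache-aside: populate on a miss
    return value
```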

F. Cloud-Native Applications Use Case

NoSQL is particularly well suited to cloud applications, which run on globally distributed, containerized, dynamically scalable cloud platforms.

When to use NoSQL?

Choosing NoSQL over an RDBMS depends entirely on the design goals of your system. You should choose NoSQL when you encounter the following scenarios:

Fast-Paced Agile Development

When applications iterate quickly and require rapid alterations down to the database without running complex schema migrations.

Semi-Structured & Unstructured Data Storage

Ideal for heterogeneous data shapes (JSON files, variable attributes), where forcing data into a strict tabular shape would lose information or add engineering complexity.

Huge Volumes of Data

When data grows exponentially, exceeding the limits of single-server disk/memory capacities, requiring seamless distributed storage.

Scale-Out Architecture Requirements

When your primary scaling strategy is adding commodity servers (horizontal scale-out) rather than buying expensive single-host database servers.

Modern Application Paradigms

Highly suited for distributed architectures like micro-services, real-time message streams, event triggers, and high-frequency IoT logging.

NoSQL DB Misconceptions

Myth 1: Relationship Data

"Relationship data is best suited *only* for relational databases."

Reality: NoSQL databases can store relationship data; they just store it differently than relational databases do. Many developers find modelling relationship data easier in NoSQL databases, because related data doesn't have to be split between tables: NoSQL data models allow related data to be nested within a single data structure.

Myth 2: ACID Transactions

"NoSQL databases don't support ACID transactions."

Reality: Some NoSQL databases, such as MongoDB, do support multi-document ACID transactions, providing ACID guarantees for transactional workflows that span several documents.

Types of NoSQL Data Models

There are four distinct models under the NoSQL umbrella, each optimized for different storage paradigms, access patterns, and query complexity:

1. Key-Value Stores

Simplest Model

The simplest type of NoSQL database is a key-value store. Every data element in the database is stored as a key-value pair consisting of an attribute name (or "key") and a value. In a sense, a key-value store is like a relational database with only two columns: the key or attribute name (such as "state") and the value (such as "Alaska").

A key-value database associates a value (which can be anything from a number or simple string to a complex object) with a key, which is used to keep track of the object. In its simplest form, a key-value store is like a dictionary / array / map object as it exists in most programming paradigms, but which is stored in a persistent way and managed by a Database Management System (DBMS).

🚀 Key-value databases use compact, efficient index structures to be able to quickly and reliably locate a value by its key, making them ideal for systems that need to be able to find and retrieve data in constant time.
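The "persistent dictionary" description maps closely onto Python's standard-library `shelve` module, which stores a dict-like mapping in a file. The sketch below uses it as a stand-in. `shelve` is not a DBMS: it has no network access and no support for concurrent writers. The keys `state` and `session:9f2c` are hypothetical examples of a simple value and a complex object value.

```python
import os
import shelve
import tempfile

# shelve persists a dict-like mapping (string keys, picklable values) to disk.
path = os.path.join(tempfile.mkdtemp(), "kvstore")

with shelve.open(path) as db:
    db["state"] = "Alaska"  # simple string value
    db["session:9f2c"] = {"user_id": 42, "cart": ["PEN", "INK"]}  # complex object

# Reopen the file to show the data outlived the first handle.
with shelve.open(path) as db:
    print(db["state"])                 # Alaska
    print(db["session:9f2c"]["cart"])  # ['PEN', 'INK']
    print(db.get("missing"))           # None
```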

Optimal Use Cases

Real-time random data access (e.g. user sessions in online applications like gaming/finance).
Caching mechanisms for frequently accessed data or configurations.
Applications designed purely on simple key-based queries.
Shopping carts, user preferences, and profile storage.

Examples

Redis, Amazon DynamoDB, Oracle NoSQL, MongoDB (supports KV)

2. Column-Oriented / Columnar / C-Store / Wide-Column

Analytics Champion

Data is stored so that the values of a column sit next to each other, rather than next to the other fields of the same row. While a relational database stores data in rows and reads data row by row, a column store is organised as a set of columns.

This columnar organization means that when you want to run analytics on a small number of columns, you can read those columns directly without consuming memory with the unwanted data.

Columns are often of the same type and benefit from more efficient compression algorithms, making reads even faster. Columnar databases can quickly aggregate the value of a given column (e.g. adding up total sales for a year in milliseconds).
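A minimal sketch of the two layouts, using three hypothetical sales records held in Python lists. The aggregate visits whole records in the row layout but only the `year` and `amount` arrays in the column layout. Run-length encoding shows one reason same-typed, repetitive columns compress well. Because everything here is in memory, the sketch shows the access pattern only; the real savings come from skipping unneeded columns on disk.

```python
# The same sales records in row-oriented and column-oriented layouts.
rows = [
    {"id": 1, "region": "EU", "year": 2023, "amount": 120.0},
    {"id": 2, "region": "US", "year": 2023, "amount": 80.0},
    {"id": 3, "region": "EU", "year": 2024, "amount": 50.0},
]

columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "year": [2023, 2023, 2024],
    "amount": [120.0, 80.0, 50.0],
}

# Row layout: every whole record is visited even though only two fields matter.
row_total = sum(r["amount"] for r in rows if r["year"] == 2023)

# Column layout: only the "year" and "amount" arrays are read.
col_total = sum(a for y, a in zip(columns["year"], columns["amount"]) if y == 2023)

print(row_total, col_total)  # 200.0 200.0


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded


print(run_length_encode(columns["year"]))  # [(2023, 2), (2024, 1)]
```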

Optimal Use Cases

High-scale data warehousing.
Complex analytical dashboards.
High volume aggregation calculations.
Time-series logs analytics.

Examples

Apache Cassandra (wide-column), Amazon Redshift, Snowflake (columnar data warehouses)

3. Document Based Stores

General Purpose

These databases store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can be of a variety of types, including strings, numbers, booleans, arrays, or nested objects.
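The document shape described above can be sketched as a Python dict that serialises cleanly to JSON. `get_path` is a hypothetical helper that resolves dotted field paths into nested objects. It loosely mirrors the dot notation MongoDB uses for nested fields but is not its query API.

```python
import json

# One product document mixing the value types listed above.
product = {
    "_id": "sku-1001",
    "name": "Fountain Pen",
    "price": 24.5,
    "in_stock": True,
    "tags": ["writing", "gift"],
    "dimensions": {"length_mm": 140, "weight_g": 18},
}


def get_path(doc, dotted):
    """Hypothetical helper: resolve a dotted path such as 'dimensions.weight_g'."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # missing field anywhere along the path
        value = value[part]
    return value


print(get_path(product, "dimensions.weight_g"))  # 18
print(json.dumps(product))  # the document as JSON text
```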

ACID Suitability: Some document databases, such as MongoDB (multi-document transactions since version 4.0), support ACID transactions, which makes them suitable for many mission-critical transactional workloads.

Optimal Use Cases

E-commerce systems (variable product catalogs).
Financial and trading platforms.
High agility mobile app development across industries.
Content management systems (CMS).

Examples

MongoDB, CouchDB

4. Graph Based Stores

Relationship First

A graph database focuses on the relationship between data elements. Each element is stored as a node (such as a person in a social media graph). The connections between elements are called links or relationships. In a graph database, connections are first-class elements of the database, stored directly. In relational databases, links are implied, using data to express the relationships.

A graph database is optimised to capture and search the connections between data elements, avoiding the overhead of the many JOIN operations needed to connect multiple tables in SQL.
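A small sketch of relationship-first traversal, assuming a hypothetical social graph stored as an adjacency list (a dict of neighbour sets). Following a relationship is a direct lookup on the node, and a breadth-first search finds friend-of-friend candidates without any table joins. This illustrates the access pattern only, not how any particular graph engine stores data internally.

```python
from collections import deque

# Adjacency list: each node stores direct references to its neighbours,
# so following a relationship is a lookup rather than a table join.
friends = {
    "ana": {"ben", "cai"},
    "ben": {"ana", "dev"},
    "cai": {"ana", "dev", "eli"},
    "dev": {"ben", "cai"},
    "eli": {"cai"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: every node reachable in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return {n: d for n, d in seen.items() if n != start}


def suggest_friends(graph, person):
    # Friend-of-friend recommendations: nodes exactly two hops away.
    return sorted(n for n, d in within_hops(graph, person, 2).items() if d == 2)


print(suggest_friends(friends, "ana"))  # ['dev', 'eli']
```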

⚠️ Note: Very few real-world business systems can rely solely on graph queries. As a result, graph databases are usually run alongside other, more traditional databases.

Optimal Use Cases

Fraud detection networks and threat tracking.
Social networking maps (friend recommendations).
Knowledge graphs and semantic search web engines.
Identity access management hierarchies.

Examples

Neo4j, Amazon Neptune, OrientDB

NoSQL Databases Disadvantages

Despite their scalability and flexible schema benefits, NoSQL databases carry significant technological trade-offs and disadvantages that engineers must account for:

Data Redundancy and Increased Storage Footprint

Since data models in NoSQL databases are typically optimised for queries and not for reducing data duplication, NoSQL databases can be larger than SQL databases. Storage is currently so cheap that most consider this a minor drawback, and some NoSQL databases also support compression to reduce the storage footprint.

Highly Costly Update & Delete Operations

Because NoSQL databases duplicate and denormalize data to enable fast, join-free reads, updating or deleting a single attribute may require changing every record that holds a copy of it. This adds processing cost and can leave copies temporarily out of sync.

Model Limitations: A Single Model Doesn't Fulfill All Application Needs

Depending on the NoSQL database type you select, you may not be able to achieve all of your use cases in a single database. For example, graph databases are excellent for analysing relationships in your data but may not provide what you need for everyday retrieval of the data such as range queries. When selecting a NoSQL database, consider what your use cases will be and if a general purpose database like MongoDB would be a better option.

Lack of General ACID Properties Support

While some NoSQL databases support ACID transactions, ACID support is not universal across NoSQL types and products. Many systems instead adopt the BASE model (Basically Available, Soft state, Eventual consistency), trading immediate consistency for availability and speed.
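Eventual consistency can be sketched with two hypothetical replicas that accept writes independently and later reconcile. Each value carries a timestamp, and a reconciliation pass (`anti_entropy`, a hypothetical name) keeps the newest write per key. This is a simplified take on the last-write-wins rule some distributed stores use. Between writes and reconciliation the replicas disagree (soft state), and last-write-wins can silently discard a concurrent update, which is part of the consistency trade-off.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    # key -> (timestamp, value); the timestamp decides which write wins.
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a: Replica, b: Replica):
    """Exchange state so both replicas keep the newest write per key."""
    for key in set(a.data) | set(b.data):
        candidates = [e for e in (a.data.get(key), b.data.get(key)) if e is not None]
        newest = max(candidates, key=lambda e: e[0])
        a.data[key] = newest
        b.data[key] = newest


r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart:7", ["PEN"], ts=1)            # accepted by r1 only
print(r1.read("cart:7"), r2.read("cart:7"))  # ['PEN'] None  (replicas disagree)
r2.write("cart:7", ["PEN", "INK"], ts=2)     # a later write lands on r2
anti_entropy(r1, r2)
print(r1.read("cart:7"), r2.read("cart:7"))  # ['PEN', 'INK'] ['PEN', 'INK']
```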

Absence of Strict Consistency Constraints

Many NoSQL systems enforce few or no consistency constraints when data is written. Without engine-enforced constraints (such as foreign keys, domain bounds, and check constraints), responsibility for data integrity shifts largely onto the application code.
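A minimal sketch of what that application-side responsibility can look like, assuming a hypothetical order shape and a set of known customer IDs. `validate_order` re-creates NOT NULL, CHECK and FOREIGN KEY style checks before a record is stored. Some engines do offer optional checks (MongoDB, for instance, supports schema validation rules on a collection), but referential checks like the one below generally remain the application's job.

```python
class ValidationError(ValueError):
    pass


VALID_STATUSES = {"pending", "paid", "shipped"}  # a "check constraint" in code


def validate_order(order: dict, known_customer_ids: set) -> dict:
    """Application-side stand-ins for NOT NULL, CHECK and FOREIGN KEY constraints."""
    for required in ("order_id", "customer_id", "total", "status"):
        if order.get(required) is None:
            raise ValidationError(f"missing field: {required}")
    total = order["total"]
    if isinstance(total, bool) or not isinstance(total, (int, float)) or total < 0:
        raise ValidationError("total must be a non-negative number")
    if order["status"] not in VALID_STATUSES:
        raise ValidationError(f"invalid status: {order['status']}")
    if order["customer_id"] not in known_customer_ids:
        # Foreign-key-style check: the referenced customer must exist.
        raise ValidationError(f"unknown customer: {order['customer_id']}")
    return order


customers = {7, 9}
orders = []
orders.append(
    validate_order({"order_id": 1, "customer_id": 7, "total": 20, "status": "paid"}, customers)
)

try:
    validate_order({"order_id": 2, "customer_id": 99, "total": 5, "status": "paid"}, customers)
except ValidationError as exc:
    print(exc)  # unknown customer: 99
```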

SQL vs NoSQL Comparison

Feature	SQL Databases	NoSQL Databases
Data Storage Model	Tables with fixed rows and columns	Document: JSON documents, Key-value: key-value pairs, Wide-column: tables with rows and dynamic columns, Graph: nodes and edges
Development History	Developed in the 1970s with a focus on reducing data duplication	Developed in the late 2000s with a focus on scaling and allowing for rapid application change driven by agile and DevOps practices.
Examples	Oracle, MySQL, Microsoft SQL Server, and PostgreSQL	Document: MongoDB and CouchDB, Key-value: Redis and DynamoDB, Wide-column: Cassandra and HBase, Graph: Neo4j and Amazon Neptune
Primary Purpose	General Purpose	Document: general purpose, Key-value: large amounts of data with simple lookup queries, Wide-column: large amounts of data with predictable query patterns, Graph: analyzing and traversing relationships between connected data
Schemas	Fixed	Flexible
Scaling	Typically vertical (scale-up)	Horizontal (scale-out across commodity servers)
ACID Properties	Supported	Varies: some (e.g. MongoDB) support multi-document ACID transactions; many offer BASE-style eventual consistency instead
JOINS	Typically Required	Typically not required
Data to object mapping	Typically requires object-relational mapping (ORM)	Many do not require ORMs. MongoDB documents map directly to data structures in most popular programming languages.