Lecture 17

Clustering in DBMS

As applications scale to millions of concurrent transactions, a single physical database server becomes inadequate. Database Clustering combines multiple database instances so they can coordinate tasks, keep redundant copies of the data, distribute traffic, and keep downtime close to zero.

Understanding Database Clustering

At its core, Database Clustering (the foundation of **Replica-sets**) is the process of combining more than one server or instance connected to a single database.

Why do we need a Data Cluster?

Sometimes a single database server cannot store the enormous volume of data or handle the sheer number of incoming client requests. That is precisely when a Data Cluster is needed.

By clustering databases, we **replicate the same dataset across different physical servers**, so queries and operations can be spread across those servers as the load grows. Terms such as database clustering, SQL Server clustering, and SQL clustering are closely associated with SQL, since SQL is the standardized language used to manage and manipulate the data across these nodes.

Visual Clustered Architecture

Figure: Multi-Node Database Clustering System

Concurrent client requests arrive at a LOAD BALANCER, which distributes read/write queries and monitors node health. Behind it, three nodes are kept in step by an active synchronization loop:

Primary Node: Active Server A (READ / WRITE)
Secondary Node: Server B (REPLICA, READ)
Secondary Node: Server C (REPLICA, READ)

If Node A (the Primary) fails, an automated election promotes Node B or C to Primary, so service continues with little or no interruption.
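The promotion step can be sketched in a few lines. This is a minimal illustration, not how any particular database runs its elections: it assumes each node records the last replication-log position it has applied (`applied_lsn`, a hypothetical field) and that the rule is simply "promote the live replica with the most up-to-date data". Production systems typically add voting and majority (quorum) rules on top of this.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str              # "primary", "replica", or "down"
    alive: bool = True
    applied_lsn: int = 0   # hypothetical: last replication-log position applied


def failover(nodes: list) -> Node:
    """Return the current primary, electing a new one if the old primary died.

    Election rule (an assumption for this sketch): promote the live replica
    that has applied the most of the replication log, so the least data is lost.
    """
    primary = next(n for n in nodes if n.role == "primary")
    if primary.alive:
        return primary                      # healthy primary: no election needed
    primary.role = "down"                   # take the failed node out of rotation
    candidates = [n for n in nodes if n.alive and n.role == "replica"]
    if not candidates:
        raise RuntimeError("no live replica available to promote")
    winner = max(candidates, key=lambda n: n.applied_lsn)
    winner.role = "primary"
    return winner


cluster = [
    Node("A", "primary", applied_lsn=120),
    Node("B", "replica", applied_lsn=118),
    Node("C", "replica", applied_lsn=115),
]
cluster[0].alive = False                    # Node A collapses
print(failover(cluster).name)               # B holds the newest data, so it is promoted
```

Picking the most caught-up replica minimizes the number of writes lost from the failed primary's log.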

Key Advantages of Clustering

Clustering introduces three fundamental qualities to enterprise-grade database management systems:

1. Data Redundancy

Synchronized Data Redundancy

Clustering helps with data redundancy, because the same data is stored on multiple servers. Do not confuse this kind of redundancy with the unwanted repetition of data inside a schema, which can lead to insertion, update, and deletion anomalies.

Redundancy with Consistency: The redundancy that clustering provides is intentional, and the copies stay consistent because the servers are kept synchronized (subject to the replication lag described later in this lecture). If any server fails for any reason, the complete database remains available on the other active servers.
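A toy model makes the idea concrete: every node keeps a full copy of the data, so losing one node loses no data. This is a minimal sketch assuming writes are copied to every live node immediately; `ReplicatedStore` is a hypothetical class, and a plain dict stands in for each server's database.

```python
class ReplicatedStore:
    """Toy cluster in which every node holds a full copy of the data."""

    def __init__(self, node_names):
        self.copies = {name: {} for name in node_names}   # one dict per server
        self.alive = {name: True for name in node_names}

    def write(self, key, value):
        # Apply the write to every live node's copy (synchronous replication).
        for name, copy in self.copies.items():
            if self.alive[name]:
                copy[key] = value

    def fail(self, name):
        self.alive[name] = False

    def read(self, key):
        # Serve the read from the first live node; any copy is complete.
        for name, copy in self.copies.items():
            if self.alive[name]:
                return copy[key]
        raise RuntimeError("all nodes are down")


store = ReplicatedStore(["A", "B", "C"])
store.write("user:1", "Alice")
store.fail("A")                 # one server fails...
print(store.read("user:1"))     # ...but B still has the complete data
```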

2. Load Balancing

Dynamic Load Balancing & Scale-Out

Scalability does not come with a database by default; it has to be introduced, typically through clustering, and how well it works depends heavily on the setup. Load balancing distributes the workload among the different servers that are part of the cluster.

This means more users can be supported, and if a sudden spike in traffic occurs, the system is more likely to absorb it, because no single machine receives all of the requests.

Direct link to high availability: Clustering lets capacity grow with demand by adding machines. Without load balancing, a single machine could become overloaded and slow down, and in the worst case stop serving traffic altogether.
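Round-robin is one simple way a load balancer can spread requests; it is used here only as an illustration (others, such as least-connections, are also common). The sketch assigns each request to the next server in turn and counts how many each server received.

```python
from collections import Counter
from itertools import cycle


def round_robin(servers, requests):
    """Assign each request to the next server in a repeating rotation."""
    rotation = cycle(servers)
    return [(req, next(rotation)) for req in requests]


assignments = round_robin(["A", "B", "C"], range(9))
load = Counter(server for _, server in assignments)
print(dict(load))   # each server handles an equal share of the 9 requests
```

With nine requests and three servers, each server handles three, so no single machine gets all of the hits.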

3. High Availability

High Availability (System Uptime)

A database is available when clients can access it. Availability refers to the proportion of time a database is accessible, and high availability means keeping that proportion very close to 100%.

Analytics & Transactions Readiness: The level of availability you need depends largely on how many transactions you run on your database and how often you run analytics on your data. With database clustering, we can reach very high levels of availability thanks to load balancing and spare machines: if one server shuts down, the database remains available.
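Availability targets are usually quoted as percentages ("nines"), and the arithmetic shows why spare machines help. The sketch below converts a target into the downtime it allows per year, and estimates cluster availability as "at least one node is up". That second formula, 1 − (1 − a)^n, rests on optimistic assumptions stated in the code: node failures are independent and any single surviving node can serve the whole database.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def allowed_downtime_hours(availability_percent: float) -> float:
    """Maximum downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` nodes is up.

    Assumes failures are independent and that any single surviving
    node can serve the whole database (optimistic simplifications).
    """
    return 1 - (1 - node_availability) ** nodes


for target in (99.0, 99.9, 99.99):
    print(f"{target}% available -> {allowed_downtime_hours(target):.2f} h downtime/year")

print(f"3 nodes at 99% each -> {cluster_availability(0.99, 3):.6f}")
```

For example, a 99.9% target still allows about 8.76 hours of downtime per year, while 99.99% allows under one hour.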

How does Clustering Work?

Request Splitting and Node Isolation

In a cluster architecture, incoming requests are split across many computers, so the work of serving users is carried out by a number of machines rather than a single one.

🛡️ Clustering stays serviceable thanks to load balancing and high availability. If one node fails, its requests are handled by another node right away. Consequently, a complete system failure becomes very unlikely.
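Redirecting requests away from a failed node can be sketched as a router that skips nodes marked unhealthy. `HealthAwareRouter` is a hypothetical class for illustration; it assumes some separate health check calls `mark_down` and `mark_up`, and it rotates through the healthy nodes in round-robin order.

```python
class HealthAwareRouter:
    """Round-robin router that skips nodes currently marked unhealthy."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0                      # index of the next node to try

    def mark_down(self, node):
        self.healthy.discard(node)          # e.g. after a failed health check

    def mark_up(self, node):
        self.healthy.add(node)

    def route(self):
        # Try each node at most once, starting from the rotation pointer.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


router = HealthAwareRouter(["A", "B", "C"])
router.mark_down("B")                        # node B collapses
print([router.route() for _ in range(4)])    # traffic flows only to A and C
```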

Master-Slave Replication & Data Propagation

To keep multiple servers synchronized, databases utilize a Master-Slave (Primary-Replica) replication topology. Understanding how data updates propagate between these nodes is critical for designing scalable systems.

Writes Executed on Master Node First

All insert, update, and delete statements (write traffic) are sent directly to the Master Node. The Master acts as the sole coordinator: it writes the changes to its local tables and appends them to a transaction log (e.g., MySQL's binary log or PostgreSQL's write-ahead log, WAL).
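Log-based replication works because replaying the Master's log, in order, reproduces the Master's tables. The sketch below is a simplified stand-in for a binary log or WAL, not either real format: each write gets a log sequence number (LSN), and `replay` (a hypothetical helper) rebuilds a table the way a replica would, optionally stopping at a given position.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    lsn: int              # log sequence number: position in the log
    op: str               # "insert", "update", or "delete"
    key: str
    value: object = None


def apply_entry(table: dict, entry: LogEntry) -> None:
    """Apply one logged change to a table (shared by master and replicas)."""
    if entry.op in ("insert", "update"):
        table[entry.key] = entry.value
    elif entry.op == "delete":
        table.pop(entry.key, None)
    else:
        raise ValueError(f"unknown operation {entry.op!r}")


@dataclass
class Master:
    table: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def execute(self, op: str, key: str, value=None) -> int:
        """Handle a write: apply it locally, append it to the log, return its LSN."""
        entry = LogEntry(len(self.log) + 1, op, key, value)
        apply_entry(self.table, entry)   # raises on an unknown op before logging
        self.log.append(entry)
        return entry.lsn


def replay(log: list, upto=None) -> dict:
    """Rebuild a table by applying log entries in order, as a replica would."""
    table = {}
    for entry in log:
        if upto is not None and entry.lsn > upto:
            break
        apply_entry(table, entry)
    return table


m = Master()
m.execute("insert", "u1", "Alice")
m.execute("update", "u1", "Alicia")
m.execute("insert", "u2", "Bob")
m.execute("delete", "u2")
print(replay(m.log) == m.table)   # a full replay reproduces the master's table
```

Replaying only up to an earlier LSN yields an earlier state of the table, which is exactly what a lagging replica holds.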

Asynchronous Propagation to Slave Nodes

To maintain low write response times for the client, replication to Slave Nodes is typically asynchronous. The Master confirms the write to the client immediately and sends log updates to the slaves in the background.
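The key property of asynchronous replication is that the client's acknowledgement does not wait for any replica. This minimal sketch assumes a single replica and models the background shipping as an explicit `ship_one` call; `AsyncPrimary`, `Replica`, `ship_one`, and `lag` are hypothetical names. Lag is measured here as the number of log entries the replica has not yet applied.

```python
from collections import deque


class AsyncPrimary:
    def __init__(self):
        self.data = {}
        self.lsn = 0
        self.outbox = deque()      # log entries not yet shipped to the replica

    def write(self, key, value):
        self.lsn += 1
        self.data[key] = value
        self.outbox.append((self.lsn, key, value))
        return "OK"                # acknowledged before the replica has the change


class Replica:
    def __init__(self):
        self.data = {}
        self.applied_lsn = 0

    def apply(self, entry):
        lsn, key, value = entry
        self.data[key] = value
        self.applied_lsn = lsn


def ship_one(primary, replica):
    """Background step: send the oldest pending log entry to the replica."""
    if primary.outbox:
        replica.apply(primary.outbox.popleft())


def lag(primary, replica):
    """Replication lag, counted in log entries not yet applied."""
    return primary.lsn - replica.applied_lsn


p, r = AsyncPrimary(), Replica()
p.write("a", 1)
p.write("b", 2)
print(lag(p, r))   # both writes acknowledged, neither replicated yet
ship_one(p, r)
print(lag(p, r))   # one entry still in flight
```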

⚠️ The Replication Lag & Temporary Inconsistency

Because the logs travel across the network asynchronously, there is a delay known as Replication Lag. Until the slaves catch up, which is often quick but can stretch to seconds or even minutes under heavy network traffic or server load, the Slave nodes do not yet have the latest records.

This results in temporary read-write inconsistency. If a client uploads a picture (updating the Master) and is immediately redirected to their feed (reading from a Slave), the feed might display outdated information because the change has not propagated yet!
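The picture-upload scenario can be reproduced with a toy model in which writes go to the Master, feed reads go to a Slave, and propagation happens only when `propagate` is called. `PhotoCluster` is a hypothetical class, and the explicit `propagate` call stands in for the background replication that would normally run on its own.

```python
class PhotoCluster:
    """Toy master/slave pair illustrating the replication-lag window."""

    def __init__(self):
        self.master = []     # photo IDs stored on the Master
        self.slave = []      # photo IDs stored on the Slave
        self.pending = []    # changes written to the Master but not yet shipped

    def upload_photo(self, photo_id):
        # Writes always go to the Master.
        self.master.append(photo_id)
        self.pending.append(photo_id)

    def read_feed(self):
        # Reads are served by the Slave.
        return list(self.slave)

    def propagate(self):
        # Replication catches up: the Slave applies all pending changes.
        self.slave.extend(self.pending)
        self.pending.clear()


c = PhotoCluster()
c.upload_photo("beach.jpg")
print(c.read_feed())   # stale: the upload has not reached the Slave yet
c.propagate()
print(c.read_feed())   # consistent again after propagation
```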

Asynchronous Master-Slave Replication & Consistency Lag Window