Engineering Notes | Unified Knowledge Base

Back to DBMS Topics

Normalisation

Normalisation is a critical step towards DB optimisation.

It is used to minimise redundancy from relations. It also eliminates undesirable characteristics like Insertion, Update, and Deletion Anomalies.
Normalisation divides composite attributes into individual attributes OR larger tables into smaller ones and links them using relationships.
The normal form is used to ensure a structured, non-redundant database architecture.

1Functional Dependency (FD)

It's a relationship between the primary key attribute (usually) of the relation to that of the other attribute of the relation.
X -> Y, the left side of FD is known as a Determinant, the right side of the production is known as a Dependent.

Types of FD

Trivial FD

A -> B has trivial functional dependency if B is a subset of A.
A -> A, B -> B are also Trivial FD.

Example

{user_id, user_name} -> user_name

Since user_name is a subset of {user_id, user_name}, this is trivial.

Non-trivial FD

A -> B has a non-trivial functional dependency if B is not a subset of A. [A ∩ B = ∅].

Example

user_id -> user_address

Here, user_address is not a subset of user_id. Finding an address based on an ID is exactly what non-trivial FD is about.

Rules of FD (Armstrong's axioms)

1. Reflexive

If 'A' is a set of attributes and 'B' is a subset of 'A'. Then, A -> B holds.

2. Augmentation

If B can be determined from A, then adding an attribute X to both sides won't change the validity (AX -> BX).

3. Transitivity

If A -> B and B -> C, then we can conclude that A -> C.

2Why Normalisation? & Anomalies

To avoid redundancy in the DB, not to store redundant data.

What happen if we have redundant data? Insertion, deletion, and updation anomalies arise. Due to these anomalies, DB size increases and DB performance becomes very slow. To rectify these anomalies and the effect of these of DB, we use Database optimisation technique called NORMALISATION.

The Problem: Anomalies

Anomalies are abnormalities introduced by data redundancy. There are three primary types:

1. Insertion

Unable to add data without the presence of unrelated mandatory attributes.

2. Deletion

Unintended loss of mandatory data when deleting an unrelated piece of information.

3. Updation

Updating one value requires multiple row changes, leading to data inconsistency.

3Types of Normal Forms

1NF (First Normal Form)

Every relation cell must have an atomic value.
Relation must not have multi-valued attributes.

Example A user table with a multi-valued Phone column.

Unnormalized Table (Not 1NF):

Emp_ID	Name	Phone
1	John	123-456, 987-654
2	Alice	555-123

Converted to 1NF:

Rule: Create a new tuple (row) for each multi-valued item.

Emp_ID	Name	Phone
1	John	123-456
1	John	987-654
2	Alice	555-123

2NF (Second Normal Form)

Relation must be in 1NF.
There should not be any partial dependency.

All non-prime attributes must be fully dependent on PK.
Non prime attribute can not depend on the part of the PK.

Example Consider a relation R(A, B, C, D)

Let the Primary Key (PK) be {A, B}.
A, B are prime attributes.
C, D are non-prime attributes.

Functional Dependencies (FD):

B -> C (Partial Dependency: C depends only on part of PK)
AB -> C (Valid)
AB -> D (Valid)

2NF Conversion (Decomposition):

To remove the partial dependency B -> C, split the table into two:

R1 ( A, B, D ) // AB -> D
R2 ( B, C ) // B -> C

3NF (Third Normal Form)

Relation must be in 2NF.
No transitivity dependency exists.

Non-prime attribute should not find a non-prime attribute.

Example A student/department relationship where Dept_Name depends on Dept_ID, which depends on Student_ID.

FD: Student_ID -> Dept_ID & Dept_ID -> Dept_Name

Not in 3NF (Transitive Dependency exists):

Student_ID	Name	Dept_ID	Dept_Name
101	John	D1	CS
102	Alice	D2	IT

Converted to 3NF:

TABLE 1: STUDENT

Student_ID	Name	Dept_ID
101	John	D1
102	Alice	D2

TABLE 2: DEPARTMENT

Dept_ID	Dept_Name
D1	CS
D2	IT

BCNF (Boyce-Codd Normal Form)

Relation must be in 3NF.
FD: A -> B, A must be a super key.

We must not derive prime attribute from any prime or non-prime attribute.

Example Student Enrollment Table

Stud_ID	Subject	Professor
101	Java	PJ
101	CPP	PC
102	Java	PJ2
103	C#	PC#
104	Java	PJ

Rules in this Table:

One student can enroll in multiple subjects.
For each subject, a professor is assigned to a student.
Multiple professors can teach a single subject (e.g., Java is taught by PJ & PJ2).
One professor can teach only one subject.

Identifying Keys & Dependencies:

Primary Key: {Stud_ID, Subject}

{Stud_ID, Subject} -> Professor (Valid PK dependency)

Professor -> Subject (Violation of BCNF!)

Why it violates BCNF: The attribute Professor determines Subject (a prime attribute). However, Professor is not a Super Key on its own. To fix this, decompose the table into R1(Stud_ID, Professor) and R2(Professor, Subject).

4Advantages of Normalisation

Minimises data redundancy
Greater overall database organisation
Data consistency is maintained in DB