>_
EngineeringNotes
Back to DBMS Topics

Normalisation

Normalisation is a critical step towards DB optimisation.

  • It is used to minimise redundancy from relations. It also eliminates undesirable characteristics like Insertion, Update, and Deletion Anomalies.
  • Normalisation divides composite attributes into individual attributes OR larger tables into smaller ones and links them using relationships.
  • The normal form is used to ensure a structured, non-redundant database architecture.

1Functional Dependency (FD)

  • It's a relationship between the primary key attribute (usually) of the relation to that of the other attribute of the relation.
  • X -> Y, the left side of FD is known as a Determinant, the right side of the production is known as a Dependent.

Types of FD

Trivial FD
  • A -> B has trivial functional dependency if B is a subset of A.
  • A -> A, B -> B are also Trivial FD.

Example

{user_id, user_name} -> user_name

Since user_name is a subset of {user_id, user_name}, this is trivial.

Non-trivial FD
  • A -> B has a non-trivial functional dependency if B is not a subset of A. [A ∩ B = ∅].

Example

user_id -> user_address

Here, user_address is not a subset of user_id. Finding an address based on an ID is exactly what non-trivial FD is about.

Rules of FD (Armstrong's axioms)

1. Reflexive

If 'A' is a set of attributes and 'B' is a subset of 'A'. Then, A -> B holds.

2. Augmentation

If B can be determined from A, then adding an attribute X to both sides won't change the validity (AX -> BX).

3. Transitivity

If A -> B and B -> C, then we can conclude that A -> C.

2Why Normalisation? & Anomalies

To avoid redundancy in the DB, not to store redundant data.

What happen if we have redundant data? Insertion, deletion, and updation anomalies arise. Due to these anomalies, DB size increases and DB performance becomes very slow. To rectify these anomalies and the effect of these of DB, we use Database optimisation technique called NORMALISATION.

The Problem: Anomalies

Anomalies are abnormalities introduced by data redundancy. There are three primary types:

1. Insertion

Unable to add data without the presence of unrelated mandatory attributes.

2. Deletion

Unintended loss of mandatory data when deleting an unrelated piece of information.

3. Updation

Updating one value requires multiple row changes, leading to data inconsistency.

3Types of Normal Forms

1NF (First Normal Form)

  • Every relation cell must have an atomic value.
  • Relation must not have multi-valued attributes.

Example A user table with a multi-valued Phone column.

Unnormalized Table (Not 1NF):

Emp_IDNamePhone
1John123-456, 987-654
2Alice555-123

Converted to 1NF:

Rule: Create a new tuple (row) for each multi-valued item.

Emp_IDNamePhone
1John123-456
1John987-654
2Alice555-123

2NF (Second Normal Form)

  • Relation must be in 1NF.
  • There should not be any partial dependency.
    • All non-prime attributes must be fully dependent on PK.
    • Non prime attribute can not depend on the part of the PK.

Example Consider a relation R(A, B, C, D)

  • Let the Primary Key (PK) be {A, B}.
  • A, B are prime attributes.
  • C, D are non-prime attributes.

Functional Dependencies (FD):

  • B -> C (Partial Dependency: C depends only on part of PK)
  • AB -> C (Valid)
  • AB -> D (Valid)

2NF Conversion (Decomposition):

To remove the partial dependency B -> C, split the table into two:

  • R1 ( A, B, D ) // AB -> D
  • R2 ( B, C ) // B -> C

3NF (Third Normal Form)

  • Relation must be in 2NF.
  • No transitivity dependency exists.
    • Non-prime attribute should not find a non-prime attribute.

Example A student/department relationship where Dept_Name depends on Dept_ID, which depends on Student_ID.

FD: Student_ID -> Dept_ID & Dept_ID -> Dept_Name

Not in 3NF (Transitive Dependency exists):

Student_IDNameDept_IDDept_Name
101JohnD1CS
102AliceD2IT

Converted to 3NF:

TABLE 1: STUDENT

Student_IDNameDept_ID
101JohnD1
102AliceD2

TABLE 2: DEPARTMENT

Dept_IDDept_Name
D1CS
D2IT

BCNF (Boyce-Codd Normal Form)

  • Relation must be in 3NF.
  • FD: A -> B, A must be a super key.
    • We must not derive prime attribute from any prime or non-prime attribute.

Example Student Enrollment Table

Stud_IDSubjectProfessor
101JavaPJ
101CPPPC
102JavaPJ2
103C#PC#
104JavaPJ

Rules in this Table:

  • One student can enroll in multiple subjects.
  • For each subject, a professor is assigned to a student.
  • Multiple professors can teach a single subject (e.g., Java is taught by PJ & PJ2).
  • One professor can teach only one subject.

Identifying Keys & Dependencies:

Primary Key: {Stud_ID, Subject}

{Stud_ID, Subject} -> Professor (Valid PK dependency)
Professor -> Subject (Violation of BCNF!)

Why it violates BCNF: The attribute Professor determines Subject (a prime attribute). However, Professor is not a Super Key on its own. To fix this, decompose the table into R1(Stud_ID, Professor) and R2(Professor, Subject).

4Advantages of Normalisation

  • Minimises data redundancy
  • Greater overall database organisation
  • Data consistency is maintained in DB