>_
EngineeringNotes
Back to DBMS Topics

Normalisation

Normalisation is a step towards DB optimisation.

  • It is used to minimise the redundancy from a relations. It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
  • Normalisation divides the composite attributes into individual attributes OR larger table into smaller and links them using relationships.
  • The normal form is used to reduce redundancy from the database table.

1Functional Dependency (FD)

  • It's a relationship between the primary key attribute (usually) of the relation to that of the other attribute of the relation.
  • X -> Y, the left side of FD is known as a Determinant, the right side of the production is known as a Dependent.

Types of FD

Trivial FD
  • A -> B has trivial functional dependency if B is a subset of A.
  • A -> A, B -> B are also Trivial FD.

Example: {user_id, user_name} -> user_name

Since user_name is a subset of {user_id, user_name}, this dependency is trivial.

Non-trivial FD
  • A -> B has a non-trivial functional dependency if B is not a subset of A. [A intersection B is NULL].

Example: user_id -> user_address

Here, user_address is not a subset of user_id. Finding an address based on an ID is exactly what non-trivial FD is about.

Rules of FD (Armstrong's axioms)

1. Reflexive
  • If 'A' is a set of attributes and 'B' is a subset of 'A'. Then, A -> B holds.
  • If A ⊇ B then A -> B.
2. Augmentation
  • If B can be determined from A, then adding an attribute to this functional dependency won't change anything.
  • If A -> B holds, then AX -> BX holds too. 'X' being a set of attributes.
3. Transitivity
  • If A determines B and B determines C, we can say that A determines C.
  • If A -> B and B -> C then A -> C.

2Why Normalisation? & Anomalies

To avoid redundancy in the DB, not to store redundant data.

What happen if we have redundant data? Insertion, deletion, and updation anomalies arise. Due to these anomalies, DB size increases and DB performance becomes very slow. To rectify these anomalies and the effect of these of DB, we use Database optimisation technique called NORMALISATION.

Anomalies

Anomalies means abnormalities, there are three types of anomalies introduced by data redundancy.

Insertion anomaly

When certain data (attribute) cannot be inserted into the DB without the presence of other data.

Deletion anomaly

The delete anomaly refers to the situation where the deletion of data results in the unintended loss of some other important data.

Updation anomaly (or modification anomaly)
  • The update anomaly is when an update of a single data value requires multiple rows of data to be updated.
  • Due to updation to many places, may be Data inconsistency arises, if one forgets to update the data at all the intended places.

3Types of Normal Forms

1NF (First Normal Form)

  • Every relation cell must have an atomic value.
  • Relation must not have multi-valued attributes.

Example: A user table with a multi-valued Phone column.

Unnormalized Table (Not 1NF):

Emp_IDNamePhone
1John123-456, 987-654
2Alice555-123

Converted to 1NF:

Rule: Create a new tuple (row) for each multi-valued item.

Emp_IDNamePhone
1John123-456
1John987-654
2Alice555-123

2NF (Second Normal Form)

  • Relation must be in 1NF.
  • There should not be any partial dependency.
    • All non-prime attributes must be fully dependent on PK.
    • Non prime attribute can not depend on the part of the PK.

Example: Consider a relation R(A, B, C, D)

  • Let the Primary Key (PK) be {A, B}.
  • A, B are prime attributes.
  • C, D are non-prime attributes.

Functional Dependencies (FD):

  • B -> C (This is a Partial Dependency because C depends only on part of the PK, which is B)
  • AB -> C (Valid)
  • AB -> D (Valid)

2NF Conversion (Decomposition):

To remove the partial dependency B -> C, split the table into two:

  • R1 ( A, B, D ) // AB -> D
  • R2 ( B, C ) // B -> C

3NF (Third Normal Form)

  • Relation must be in 2NF.
  • No transitivity dependency exists.
    • Non-prime attribute should not find a non-prime attribute.

Example: A student/department relationship where Dept_Name depends on Dept_ID, which depends on Student_ID.

FD: Student_ID -> Dept_ID & Dept_ID -> Dept_Name

Not in 3NF (Transitive Dependency exists):

Student_IDNameDept_IDDept_Name
101JohnD1CS
102AliceD2IT

Converted to 3NF:

Table 1: Student

Student_IDNameDept_ID
101JohnD1
102AliceD2

Table 2: Department

Dept_IDDept_Name
D1CS
D2IT

BCNF (Boyce-Codd Normal Form)

  • Relation must be in 3NF.
  • FD: A -> B, A must be a super key.
    • We must not derive prime attribute from any prime or non-prime attribute.

Example: Student Enrollment Table

Stud_IDSubjectProfessor
101JavaPJ
101CPPPC
102JavaPJ2
103C#PC#
104JavaPJ

Rules in this Table:

  • One student can enroll in multiple subjects.
  • For each subject, a professor is assigned to a student.
  • Multiple professors can teach a single subject (e.g., Java is taught by PJ & PJ2).
  • One professor can teach only one subject.

Identifying Keys & Dependencies:

Primary Key: {Stud_ID, Subject}

  1. {Stud_ID, Subject} -> Professor (Valid PK dependency)
  2. Professor -> Subject (Violation of BCNF!)

Why it violates BCNF: The attribute Professor determines Subject (a prime attribute). However, Professor is not a Super Key on its own. To fix this, decompose the table into R1(Stud_ID, Professor) and R2(Professor, Subject).

4Advantages of Normalisation

  • Normalisation helps to minimise data redundancy.
  • Greater overall database organisation.
  • Data consistency is maintained in DB.