Normalisation is a critical step towards DB optimisation.
X -> Y, the left side of FD is known as a Determinant, the right side of the production is known as a Dependent.Example
{user_id, user_name} -> user_name
Since user_name is a subset of {user_id, user_name}, this is trivial.
Example
user_id -> user_address
Here, user_address is not a subset of user_id. Finding an address based on an ID is exactly what non-trivial FD is about.
If 'A' is a set of attributes and 'B' is a subset of 'A'. Then, A -> B holds.
If B can be determined from A, then adding an attribute X to both sides won't change the validity (AX -> BX).
If A -> B and B -> C, then we can conclude that A -> C.
To avoid redundancy in the DB, not to store redundant data.
What happen if we have redundant data? Insertion, deletion, and updation anomalies arise. Due to these anomalies, DB size increases and DB performance becomes very slow. To rectify these anomalies and the effect of these of DB, we use Database optimisation technique called NORMALISATION.
Anomalies are abnormalities introduced by data redundancy. There are three primary types:
Unable to add data without the presence of unrelated mandatory attributes.
Unintended loss of mandatory data when deleting an unrelated piece of information.
Updating one value requires multiple row changes, leading to data inconsistency.
Example A user table with a multi-valued Phone column.
Unnormalized Table (Not 1NF):
| Emp_ID | Name | Phone |
|---|---|---|
| 1 | John | 123-456, 987-654 |
| 2 | Alice | 555-123 |
Converted to 1NF:
Rule: Create a new tuple (row) for each multi-valued item.
| Emp_ID | Name | Phone |
|---|---|---|
| 1 | John | 123-456 |
| 1 | John | 987-654 |
| 2 | Alice | 555-123 |
Example Consider a relation R(A, B, C, D)
{A, B}.A, B are prime attributes.C, D are non-prime attributes.Functional Dependencies (FD):
B -> C (Partial Dependency: C depends only on part of PK)AB -> C (Valid)AB -> D (Valid)2NF Conversion (Decomposition):
To remove the partial dependency B -> C, split the table into two:
Example A student/department relationship where Dept_Name depends on Dept_ID, which depends on Student_ID.
FD: Student_ID -> Dept_ID & Dept_ID -> Dept_Name
Not in 3NF (Transitive Dependency exists):
| Student_ID | Name | Dept_ID | Dept_Name |
|---|---|---|---|
| 101 | John | D1 | CS |
| 102 | Alice | D2 | IT |
Converted to 3NF:
TABLE 1: STUDENT
| Student_ID | Name | Dept_ID |
|---|---|---|
| 101 | John | D1 |
| 102 | Alice | D2 |
TABLE 2: DEPARTMENT
| Dept_ID | Dept_Name |
|---|---|
| D1 | CS |
| D2 | IT |
Example Student Enrollment Table
| Stud_ID | Subject | Professor |
|---|---|---|
| 101 | Java | PJ |
| 101 | CPP | PC |
| 102 | Java | PJ2 |
| 103 | C# | PC# |
| 104 | Java | PJ |
Rules in this Table:
Identifying Keys & Dependencies:
Primary Key: {Stud_ID, Subject}
{Stud_ID, Subject} -> Professor (Valid PK dependency)Professor -> Subject (Violation of BCNF!)Why it violates BCNF: The attribute Professor determines Subject (a prime attribute). However, Professor is not a Super Key on its own. To fix this, decompose the table into R1(Stud_ID, Professor) and R2(Professor, Subject).