>_
EngineeringNotes
Back to DBMS Topics

Indexing in DBMS

1What is Indexing?

Indexing is used to optimise the performance of a database by minimising the number of disk accesses required when a query is processed.

The index is a type of data structure. It is used to locate and access the data in a database table quickly, speeding up operations with read operations like SELECT queries, WHERE clause etc.

Search Key

Contains copy of primary key or candidate key of the table or something else.

Data Reference

Pointer holding the address of disk block where the value of the corresponding key is stored.

Indexing is optional, but increases access speed. It is not the primary mean to access the tuple, it is the secondary mean.

Index file is always sorted.

2Indexing Methods

Primary Index (Clustering Index)

Note: The term primary index is sometimes used to mean an index on a primary key. However, such usage is nonstandard and should be avoided.

A file may have several indices, on different search keys. If the data file containing the records is sequentially ordered, a Primary Index is an index whose search key also defines the sequential order of the file.

4. Dense And Sparse Indices
1. Dense IndexSpace Disadvantage
  1. The dense index contains an index record for every search key value in the data file.
  2. The index record contains the search-key value and a pointer to the first data record with that search-key value.
  3. The rest of the records with the same search-key value would be stored sequentially after the first record.
  4. It needs more space to store index record itself. The index records have the search key and a pointer to the actual record on the disk.
Index File (Sorted)
KeyPointer
101→ ptr_1
201→ ptr_2
301→ ptr_3
401→ ptr_4
501→ ptr_5
... every record indexed ...
Data File (Sequential)
#1 Record with ID 101
#2 Record with ID 201
#3 Record with ID 301
#4 Record with ID 401
#5 Record with ID 501
Why & Where to use it? (Use Case)

Dense Indexing is used when Search Performance is the highest priority. Since every key is present, the database can find exact locations without any further sequential scanning in the data blocks.

  • Used when memory is abundant but IO latency must be minimal.
  • Ideal for Primary Key Lookups on highly clustered data.
2. Sparse IndexSpace Efficient
  1. An index record appears for only some of the search-key values.
  2. Sparse Index helps you to resolve the issues of dense Indexing in DBMS. In this method of indexing technique, a range of index columns stores the same data block address, and when data needs to be retrieved, the block address will be fetched.
Index File (Sparse)
Roll NoBP (Pointer)
1ptr_B1
11ptr_B2
21ptr_B3
9991ptr_B1000
Data Blocks (Disk)
Block B1

Start Key: 1

Block B2

Start Key: 11

Block B3

Start Key: 21

Efficiency Note: As seen in the diagram (based on the roll number mapping), we only store the entry for the start of each block. If we look for Roll No 15, we jump to B2 and scan locally.

5. Primary Indexing can be based on Data file is sorted w.r.t Primary Key attribute or non-key attributes.

6. Based on Key Attribute
  • Data file is sorted w.r.t primary key attribute.
  • PK will be used as search-key in Index.
  • Sparse Index will be formed i.e., no. of entries in the index file = no. of blocks in datafile.
7. Based on Non-Key Attribute
  • Data file is sorted w.r.t non-key attribute.
  • No. of entries in the index = unique non-key attribute value in the data file.
  • This is dense index as, all the unique values have an entry in the index file.
  • Used to group records with same attribute (e.g. Dept).

Multi-level Index

  • Index with two or more levels.
  • If the single level index become enough large that the binary search it self would take much time, we can break down indexing into multiple levels.

Secondary Index (Non-Clustering Index)

  • Datafile is unsorted. Hence, Primary Indexing is not possible.
  • Can be done on key or non-key attribute.
  • Called secondary indexing because normally one indexing is already applied.
  • No. of entries in index file = no. of records in the data file.
  • It's an example of Dense index.

3Coding Simulation: The Power of Indexing

Access count: 0 disk blocks

Implementation in JavaScript (Map)

javascript
javascript
// Without Indexing (Linear Search)
function findUser(users, id) {
  for (let user of users) {
    if (user.id === id) return user; // O(N) complexity
  }
}

// With Indexing (Hash Map)
const userIndex = new Map(users.map(u => [u.id, u]));

function findUserFast(id) {
  return userIndex.get(id); // O(1) complexity
}

4Advantages & Limitations

Advantages of Indexing

  • Faster access and retrieval of data.
  • IO is less.

Limitations of Indexing

  • Additional space to store index table.
  • Indexing Decrease performance in INSERT, DELETE, and UPDATE query.