database_index

Database Index (SQL)

TL;DR

A database index is a data structure that improves the speed of data retrieval operations on a table at the cost of additional space and slower writes. It works similarly to the index of a book — letting the database quickly locate the rows you're querying for.

What is an Index?

An index in SQL databases is a performance optimization feature used to speed up the retrieval of rows. It’s created on one or more columns of a table and works like a lookup table that the database can use instead of scanning the entire dataset.

For example, you can create an index on the email column of a users table to speed up queries that search by email:

CREATE INDEX idx_users_email ON users(email);

Without an index, the database must scan the entire users table (a full table scan) to find the requested data.

How an Index Works

Under the hood, most relational databases use B-trees or variants like B+ trees for indexing. These structures allow logarithmic time complexity (O(log n)) for reads, compared to linear time complexity (O(n)) for full scans.

Imagine a Python example using a dictionary as a basic analogy:

# Unindexed: O(n) search
users = [{'id': 1, 'email': 'a@example.com'}, {'id': 2, 'email': 'b@example.com'}]
def find_user_by_email(email):
    for user in users:
        if user['email'] == email:
            return user

# Indexed: O(1) lookup using hash (simplified)
user_index = {'a@example.com': {'id': 1, 'email': 'a@example.com'},
              'b@example.com': {'id': 2, 'email': 'b@example.com'}}
def find_user_by_email_indexed(email):
    return user_index.get(email)

In reality, SQL databases use tree-based structures (not hash maps) to support range queries and ordering.

Pros and Cons

✅ Pros

Faster queries: Especially for large datasets.
Improved performance: On SELECTs, JOINs, WHERE, ORDER BY, and GROUP BY.
Better scalability: Indexes make complex queries more performant.

❌ Cons

Slower inserts/updates/deletes: Because the index must also be updated.
Additional storage: Indexes consume disk space.
Maintenance overhead: Too many indexes can degrade performance.

When to Use Indexes

You often query by a column (e.g., email, username and most of the times timestamp/datetime).
You frequently join on a column.
You sort (ORDER BY) or filter (WHERE) by a column.

Avoid indexing:

Columns with low cardinality (e.g., boolean fields).
Frequently updated columns, unless necessary.
Every column — more indexes ≠ better.