A database index is a data structure that improves the speed of data retrieval operations on a table at the cost of additional space and slower writes. It works similarly to the index of a book — letting the database quickly locate the rows you're querying for.
An index in SQL databases is a performance optimization feature used to speed up the retrieval of rows. It’s created on one or more columns of a table and works like a lookup table that the database can use instead of scanning the entire dataset.
For example, you can create an index on the email
column of a users
table to speed up queries that search by email:
CREATE INDEX idx_users_email ON users(email);
Without an index, the database must scan the entire users
table (a full table scan) to find the requested data.
Under the hood, most relational databases use B-trees or variants like B+ trees for indexing. These structures allow logarithmic time complexity (O(log n)
) for reads, compared to linear time complexity (O(n)
) for full scans.
Imagine a Python example using a dictionary as a basic analogy:
# Unindexed: O(n) search
users = [{'id': 1, 'email': 'a@example.com'}, {'id': 2, 'email': 'b@example.com'}]
def find_user_by_email(email):
for user in users:
if user['email'] == email:
return user
# Indexed: O(1) lookup using hash (simplified)
user_index = {'a@example.com': {'id': 1, 'email': 'a@example.com'},
'b@example.com': {'id': 2, 'email': 'b@example.com'}}
def find_user_by_email_indexed(email):
return user_index.get(email)
In reality, SQL databases use tree-based structures (not hash maps) to support range queries and ordering.
email
, username
and most of the times timestamp/datetime).ORDER BY
) or filter (WHERE
) by a column.