[Explained] How to Create a Database Index in SQL


Want to speed up database queries? Learn how to create a database index using SQL to optimize query performance and speed up data retrieval.

When you're retrieving data from a database table, you'll often need to filter based on specific columns.

Suppose you write an SQL query to retrieve data based on specific conditions. By default, the database performs a full-table scan: it examines every row to find the records that satisfy the condition and then returns the results.

This can be extremely inefficient when you have to query a large database table with several million rows. You can speed up such queries by creating a database index.

What is a Database Index?


When you want to find a specific term in a book, do you scan the whole book, one page after another, looking for that term? Of course not.

Instead, you look up the term in the index to find out which pages reference it, and you jump straight to those pages. An index in a database works much like the index in a book.

A database index is a set of pointers or references to the actual data, sorted in a way that makes data retrieval faster. Internally, a database index can be implemented using data structures such as B+ trees and hash tables. As a result, a database index improves the speed and efficiency of data retrieval operations.
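To make the "sorted pointers" idea concrete, here is a toy model in plain Python. It is not how SQLite stores indexes; SQLite uses on-disk B-trees. The rows and city names below are made up for illustration. The "index" is a sorted list of (key, row_id) pairs. A lookup binary-searches the keys instead of inspecting every row.

```python
import bisect

# Hypothetical sample rows: (row_id, city) pairs standing in for a table
rows = [(1, "Lake Amy"), (2, "New Wesley"), (3, "Port Ann"),
        (4, "New Wesley"), (5, "East John")]

def full_scan(rows, city):
    # Without an index: inspect every row, O(n)
    return [row_id for row_id, c in rows if c == city]

# Build the toy "index": (key, row_id) pairs sorted by key, i.e. pointers to rows
index = sorted((city, row_id) for row_id, city in rows)
keys = [key for key, _ in index]

def index_lookup(city):
    # With an index: binary-search the sorted keys, O(log n + number of matches)
    lo = bisect.bisect_left(keys, city)
    hi = bisect.bisect_right(keys, city)
    return [row_id for _, row_id in index[lo:hi]]

print(full_scan(rows, "New Wesley"))  # [2, 4]
print(index_lookup("New Wesley"))     # [2, 4]
```

Both functions return the same row IDs. Only the lookup strategy differs, and that difference is what matters once a table holds millions of rows.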

Creating a Database Index in SQL

Now that we know what a database index is and how it can speed up data retrieval, let's learn how to create a database index in SQL.

When you perform filtering operations by specifying a retrieval condition in a WHERE clause, you may query a particular column more frequently than others. To create an index on such a column, use the following syntax:

CREATE INDEX index_name ON table (column)


  • index_name is the name of the index to be created.
  • table refers to the table in the relational database.
  • column refers to the name of the column in the database table on which we want to create the index.
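Here is a minimal sketch of that syntax using Python's built-in sqlite3 module. It assumes an in-memory database, so it runs without customer_db.db. The table mirrors the customers table used later in this article. After creating the index, the sketch reads SQLite's sqlite_master schema table to confirm the index exists.

```python
import sqlite3

# In-memory database with the customers table used later in this article
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
                    id INTEGER PRIMARY KEY,
                    first_name TEXT,
                    last_name TEXT,
                    city TEXT,
                    num_orders INTEGER)""")

# CREATE INDEX index_name ON table (column)
conn.execute("CREATE INDEX city_idx ON customers (city)")

# SQLite records every index in its sqlite_master schema table
indexes = conn.execute(
    "SELECT name, tbl_name FROM sqlite_master WHERE type = 'index'"
).fetchall()
print(indexes)  # [('city_idx', 'customers')]
```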

You can also create an index on multiple columns (a multi-column index), depending on your requirements. Here's the syntax to do so:

CREATE INDEX index_name ON table (column_1, column_2,...,column_k)
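In a multi-column index, the order of the columns in the definition is the order in which entries are sorted. The following sketch creates a multi-column index on the customers table. The index name name_idx and the choice of columns are hypothetical. SQLite's PRAGMA index_info then reports the indexed columns in that order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
                    id INTEGER PRIMARY KEY,
                    first_name TEXT,
                    last_name TEXT,
                    city TEXT,
                    num_orders INTEGER)""")

# Hypothetical multi-column index: entries are sorted by last_name first,
# then by first_name within each last_name
conn.execute("CREATE INDEX name_idx ON customers (last_name, first_name)")

# PRAGMA index_info returns one row per indexed column: (seqno, cid, name)
columns = [row[2] for row in conn.execute("PRAGMA index_info(name_idx)")]
print(columns)  # ['last_name', 'first_name']
```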

Now let's move on to a practical example.

Understanding the Performance Gains of a Database Index

To understand the advantage of creating an index, we need a database table with a large number of records. The code examples are for SQLite, but you can also use another RDBMS of your choice, such as PostgreSQL or MySQL.

Populating a Database Table With Records

You could also use Python's built-in random module to generate and insert records into the database. However, we'll use the third-party Faker library (install it with pip install Faker) to populate the database table with a million rows.

The following Python script:

  • Creates and connects to the customer_db database.
  • Creates a customers table with the fields first_name, last_name, city, and num_orders.
  • Generates synthetic data and inserts one million records into the customers table.

You can also find the code on GitHub.

# main.py
# imports
import sqlite3
from faker import Faker
import random

# connect to the db (the file is created if it doesn't exist)
db_conn = sqlite3.connect('customer_db.db')
db_cursor = db_conn.cursor()

# create table
db_cursor.execute('''CREATE TABLE customers (
                  id INTEGER PRIMARY KEY,
                  first_name TEXT,
                  last_name TEXT,
                  city TEXT,
                  num_orders INTEGER)''')

# create a Faker object
fake = Faker()

# create and insert 1 million records
num_records = 1_000_000

for _ in range(num_records):
    first_name = fake.first_name()
    last_name = fake.last_name()
    city = fake.city()
    num_orders = random.randint(0, 100)
    db_cursor.execute('INSERT INTO customers (first_name, last_name, city, num_orders) VALUES (?,?,?,?)',
                      (first_name, last_name, city, num_orders))

# commit the transaction and close the cursor and connection
db_conn.commit()
db_cursor.close()
db_conn.close()

Now we can start querying.

Creating an Index on the City Column

Suppose you want to retrieve customer information by filtering on the city column. Your SELECT query will look like this:

SELECT column(s) FROM customers
WHERE condition;

So let's create the city_idx index on the city column in the customers table:

CREATE INDEX city_idx ON customers (city);

Creating an index takes a non-negligible amount of time, but it is a one-time operation. The performance benefits will be significant when you need to run a large number of queries that filter on the city column.
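Before timing anything, it can help to confirm that SQLite actually uses the new index. The sketch below asks SQLite's EXPLAIN QUERY PLAN for the plan of an equality query on city, once before and once after creating city_idx. It assumes an empty in-memory copy of the customers table. The exact wording of the plan text varies between SQLite versions: newer ones print "SCAN customers", older ones "SCAN TABLE customers". For that reason, the comments only describe what the plan mentions.

```python
import sqlite3

def query_plan(conn, sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
                    id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                    city TEXT, num_orders INTEGER)""")

query = "SELECT * FROM customers WHERE city = 'New Wesley'"
before = query_plan(conn, query)  # mentions SCAN: a full-table scan

conn.execute("CREATE INDEX IF NOT EXISTS city_idx ON customers (city)")
after = query_plan(conn, query)   # mentions SEARCH ... USING INDEX city_idx

print(before)
print(after)
```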

Deleting a Database Index

To delete an index, you can use the DROP INDEX statement like so:

DROP INDEX index_name;
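Here is a small sketch of dropping city_idx from Python and checking that it is gone. It uses a trimmed-down, hypothetical customers table with only id and city, in an in-memory database. The IF EXISTS clause makes the statement a no-op rather than an error when the index has already been dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down table for illustration; only the indexed column matters here
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("CREATE INDEX city_idx ON customers (city)")

# IF EXISTS avoids an error if the index has already been dropped
conn.execute("DROP INDEX IF EXISTS city_idx")

remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"
).fetchall()
print(remaining)  # []
```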

Comparing Query Times With and Without Index

If you want to run the queries from a Python script, you can use a timer such as timeit's default_timer to measure their execution times.
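A minimal timing sketch with timeit.default_timer follows. To keep it self-contained, it fills an in-memory table with 100,000 synthetic rows spread over 1,000 hypothetical city names instead of using customer_db.db and Faker. It times the same equality query before and after creating city_idx. Absolute numbers depend on your machine, so only the row counts are guaranteed to match between the two runs.

```python
import random
import sqlite3
from timeit import default_timer

def time_query(conn, sql, params=()):
    # Wall-clock time to execute the query and fetch all matching rows
    start = default_timer()
    rows = conn.execute(sql, params).fetchall()
    return default_timer() - start, len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, city TEXT)")
# Hypothetical synthetic data: 100,000 rows spread over 1,000 city names
conn.executemany(
    "INSERT INTO customers (city) VALUES (?)",
    ((f"City {random.randrange(1000)}",) for _ in range(100_000)),
)

sql = "SELECT * FROM customers WHERE city = ?"
no_index_time, no_index_rows = time_query(conn, sql, ("City 42",))
conn.execute("CREATE INDEX city_idx ON customers (city)")
index_time, index_rows = time_query(conn, sql, ("City 42",))

print(f"without index: {no_index_time:.4f} s ({no_index_rows} rows)")
print(f"with index:    {index_time:.4f} s ({index_rows} rows)")
```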

Alternatively, you can run the queries using the sqlite3 command-line client. To work with customer_db.db using the command-line client, run the following command in the terminal:

$ sqlite3 customer_db.db

To get approximate execution times, you can use the .timer command built into sqlite3 like so:

sqlite> .timer on
sqlite> <query here>

Once the index on the city column exists, queries that filter on the city column in the WHERE clause should be much faster.

To see the difference, first run the queries without the index. Then create the index and re-run the same queries, noting the execution times in both cases. Here are some examples:

Query | Time without Index | Time with Index
SELECT * FROM customers WHERE city LIKE 'New%'; | 0.100 s | 0.001 s
SELECT * FROM customers WHERE city = 'New Wesley'; | 0.148 s | 0.001 s
SELECT * FROM customers WHERE city IN ('New Wesley', 'New Steven', 'New Carmenmouth'); | 0.247 s | 0.003 s

The retrieval times with the index on the city column are roughly two orders of magnitude faster than those without it.
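For LIKE queries such as the first row of the table, it's worth confirming with EXPLAIN QUERY PLAN that an index is actually used. SQLite applies its LIKE optimization only under certain conditions. With the default case-insensitive LIKE, the column must be indexed with the NOCASE collation; a plain index like city_idx uses BINARY. The sketch below checks both cases on an empty in-memory copy of the customers table. The index name city_nocase_idx is hypothetical. As before, the plan wording differs between SQLite versions.

```python
import sqlite3

def query_plan(conn, sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
                    id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
                    city TEXT, num_orders INTEGER)""")
conn.execute("CREATE INDEX city_idx ON customers (city)")

like_query = "SELECT * FROM customers WHERE city LIKE 'New%'"

# LIKE is case-insensitive by default, so the BINARY-collated city_idx
# cannot serve the prefix match and SQLite falls back to a full scan
binary_plan = query_plan(conn, like_query)  # mentions SCAN

# An index declared with COLLATE NOCASE lets SQLite turn the prefix match
# into a range search on that index
conn.execute("CREATE INDEX city_nocase_idx ON customers (city COLLATE NOCASE)")
nocase_plan = query_plan(conn, like_query)  # mentions city_nocase_idx

print(binary_plan)
print(nocase_plan)
```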

Best Practices to Create and Use Database Indexes


You should always check whether the performance gains outweigh the overhead of creating a database index. Here are some best practices to keep in mind:

  • Choose the right columns to index. Avoid creating too many indexes because each one adds substantial overhead.
  • Every time an indexed column is updated, the corresponding index must be updated, too. So although a database index speeds up retrieval, it can noticeably slow down insert and update operations. Therefore, you should create indexes on columns that are frequently queried but rarely updated.
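To observe the write-side cost yourself, the following sketch times the same batch of inserts into an in-memory table with no indexes and then with four indexes. The generated names and values are hypothetical stand-ins for the Faker data. The helper indexes one column per entry in index_columns. Timings vary by machine, but the indexed run typically takes longer because every insert must also update each index.

```python
import sqlite3
from timeit import default_timer

def time_inserts(index_columns, num_rows=50_000):
    # Fresh in-memory table, one single-column index per entry in index_columns
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, "
                 "first_name TEXT, last_name TEXT, city TEXT, num_orders INTEGER)")
    for col in index_columns:
        conn.execute(f"CREATE INDEX {col}_idx ON customers ({col})")
    # Hypothetical synthetic rows standing in for the Faker-generated data
    rows = [(f"first{i}", f"last{i % 997}", f"city{i % 101}", i % 100)
            for i in range(num_rows)]
    start = default_timer()
    conn.executemany("INSERT INTO customers (first_name, last_name, city, num_orders) "
                     "VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    elapsed = default_timer() - start
    conn.close()
    return elapsed

print(f"no indexes:   {time_inserts([]):.3f} s")
print(f"four indexes: {time_inserts(['first_name', 'last_name', 'city', 'num_orders']):.3f} s")
```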

When Should You Not Create an Index?

By now you should have an idea of when and how to create an index. But let's also look at when a database index might not be necessary:

  • When the database table is small and doesn't contain a large number of rows, a full-table scan to retrieve data is not that expensive.
  • Don't create indexes on columns that are rarely used for retrieval. For columns that aren't frequently queried, the cost of creating and maintaining an index outweighs the performance gains.

Summing Up

Let's review what we've learned:

  • When querying a database to retrieve data, you may need to filter on certain columns more frequently than others. A database index on such frequently queried columns can improve performance.
  • To create an index on a single column, use the syntax: CREATE INDEX index_name ON table (column). To create a multi-column index, use: CREATE INDEX index_name ON table (column_1, column_2,...,column_k)
  • Whenever an indexed column is modified, the corresponding index must be updated, too. Therefore, index columns that are frequently queried and much less frequently updated.
  • If the database table is relatively small, the cost of creating, maintaining, and updating an index can outweigh the performance gains.

Most modern database management systems have a query optimizer that decides whether using an index on a particular column will make a query run faster. Next, let's learn the best practices for database design.


