We’re going to take some time and examine exactly how indexes work and when we should use them. We’re going to start with the clustered index. It’s the most useful index in SQL. Every table we make should usually have a clustered index. We’ll talk about how to put them in place and how to maintain them.

Imagine that every house on your street had a random number that was chosen by the builder. This number has nothing to do with the numbers around it. Would it be hard to find a house if you did not know the neighborhood? It might be almost impossible!

Fortunately, almost every municipality on the planet has an organized system of numbering lots and houses. Thanks to this, even a GPS unit that doesn’t really know where your house is can make a rough estimate that will get you to the right block. SQL server faces similar challenges every time we tell it to go get a record.

When we don’t use any indexes, we basically create a neighborhood with random address numbers. SQL has no choice but to drive down the street, looking at each house number until it finds ours. It might find it in the third house it passes or it might find it at the 5043rd row! If you asked SQL Server for something like every single record from the month of June, it has to keep coming back around the block and looking for the next matching address.

For this reason, we want to have, at bare minimum, a clustered index on each table. Now we’re going to learn how to create them. It’s a straight-forward process that pays off in a big way.

Creating a Clustered Index

There are two ways to add clustered indexes in SQL server. We’ll look at a way to add one after you’ve created a table and also one that builds it at the moment of creation.

Adding a Clustered Index To An Existing Table

Syntax:

CREATE CLUSTERED INDEX [index name] ON [table name] ( [existing column] )

Example:

CREATE CLUSTERED INDEX (ClassID_Index) ON Class (ClassID)

This tells SQL server that the ClassID is now the “address” for the table Class. When we run this statement, our random rows of data are physically reorganized into a nice clean “neighborhood” with addresses and street signs for SQL server to use to find its way around.

Creating a Better Primary Key

Since primary keys and clustered indexes go hand in hand, we’re now going to look at how to create a better primary key. This will create a table that is ready to populate its data in an organized way that makes it easier for SQL Server to find it. It’s all part of a single CREATE SQL statement. We’re just going to add a few more things to it this time around.

Here’s an example. We’ll explain the new bold sections afterwards:

CREATE TABLE Class (
ClassID int IDENTITY(1,1) NOT NULL,
ClassName varchar(50),
ClassSize int,

CONSTRAINT PK_Class PRIMARY KEY CLUSTERED (ClassID ASC)

IDENTITY(1,1) NOT NULL 

This statement tells SQL that we want ClassID to automatically fill in starting at one and going up by one each time. The NOT NULL clause says we have to provide a value for this field when adding a new record. Basically, we’re telling SQL server, ClassID is the address for this row and the numbers should go up in order, just like a neighborhood’s addresses do. We’re also saying you cannot have a house with no address sign.

CONSTRAINT PK_Class PRIMARY KEY CLUSTERED (ClassID ASC)

Now, we’re saying that their address number physically orders all the “houses” in our “neighborhood”. Row one is next to row two and so on. We can basically follow this format any time we make a table.

What Happens When Rows are Inserted?

Clustered indexes speed up SQL’s ability to find records, but they actually slow down the process of adding and deleting records. When we insert a new row, SQL has to take a look at its neighborhood and get the new address number. It also decides, at certain predetermined point, that the new house will go on a new block. Most of the time, this work takes nanoseconds.

What Happens When Rows Are Deleted?

Deleting rows disrupts the neighborhood. If you erase row three, you now have a vacant lot between row two and four. SQL leaves this row blank for the time being. Eventually, though, if too many rows have been deleted, SQL will spend a lot of time driving past vacant lots looking for valid addresses. SQL also decides at predetermined points when to completely reorganize the neighborhood to fill up the vacant lots. It’s the same thing as defragging a hard drive. It makes the most of the space available by getting rid of empty spaces.

All of this happens behind the scenes in SQL server. We won’t get into the complex details of how, when and why SQL decides to do this work. It won’t effect your queries in any large way. The takeaway here is that clustered indexes usually lead to better performance and help SQL stay organized. Start adding them to any tables you make going forward!

Categories : SQL
Tags : Basic SQL, Clustered Index


author's photo

Author : SQL Tutor

Tainyan staff who teach SQL.

COMMENTS
blog comments powered by Disqus
Contributors
Categories