This lesson will focus on a basic understanding of how the SQL server engine works.
Today, we're going to take a break from writing queries and learning commands. As we get into advanced SQL topics, we're going to need to understand exactly what's happening under the hood of SQL server. In this lesson, we'll take our first look at Indexes and how they impact our database. This lesson will prime us for future lessons about indexing and other functionality.
DISCLAIMER: The actual processes are a little bit more complicated than what is presented here. This is meant to give new users a basic understanding without overwhelming them with details.
Like any computer program, SQL is ultimately storing data in a file on a hard drive or SAN space. In your local install, the data is physically stored in files in SQL's program directories. When we write a select statement and ask SQL for a specific row, it has to go find the row or set of rows and display them or send them back to the calling process.
This is no big deal when we ask for two rows on our local hard drive. However, some queries return thousands of rows. It's very important that SQL goes and gets the data in the most efficient manner possible. This is all accomplished by indexes.
Indexes are exactly what the sound like: a reference that points to somewhere else. There are two kinds of indexes that we can create in SQL server. We'll use the example of a book to illustrate the difference between the two.
A clustered index defines a way data is physically stored and organized. Think of it as the binding of a book. Chapters or page numbers physically define the layout of a book. The clustered index in SQL does the same thing. The data is actually stored in the order defined by the index. Just as a book can only have one binding order of pages, only one clustered index can be defined per table.
A non-clustered index is like an index in the back of a book. It can present the same data that is in the book in different orders. Its main purpose is to provide quick reference to help you quickly find a section or theme that might be somewhat buried in the physical order of pages.
In a book, you can have as many indexes in the back as you'd like. An index could be created that focus around a certain subject matter. A separate index could be organized with completely different criteria. They'd all still take you to the same pages in the book. You can have as many non-clustered indexes as you'd like in SQL server.
We'll get into created indexes in SQL in a future lesson. First, we're going to learn how indexes help us retrieve data.
Here is a sample table of data. It's fairly simple. It only has students and their age. We'll take a look at how SQL goes about retrieving different queries on this table and how indexes impact the performance.
Let's see we want to retrieve the row for Mike. We run the following statement:
SELECT * FROM STUDENT WHERE StudentID = 12
When there are no indexes available, SQL server just starts at the top of the list and starts looking around. The blue rows shown below show how many rows SQL had to “touch” before it found what it was looking for.
Since the data is just stored in a disorganized format, SQL had to flip through nine rows before it found Mike's row. It would have looked through 5,000 rows if it had to. This is obviously not very efficient and could make some queries take forever!
A clustered index puts everything in order. It also creates a sort of “road map” to help SQL server get to the right section of the data quickly. Let's assume we have our table indexed by the StudentID. This is a realistic scenario since the primary key often makes a great clustered index.
Here's a visualization of SQL's road map. When we ask it for row 12, it looks at this chart. The page levels tells SQL what area of the physical storage each record is in. In this case, record 12 resides in the DataD section. Here's the search process with this index in place.
As you can see, this worked better than when we had no indexes. SQL only touched three rows to find the one it needed. First, it knew to look in page section B since rows seven to twelve exist in that section. Then it knew that row twelve starts in data section D since it contained rows ten through twelve. Almost any record we asked for in this table could be retrieved in only with touching one to three rows! That's the power of an index!
Sometimes, there are certain queries that can benefit from an alternative index. For example, is the clustered index we looked at above going to help us if we're searching on student age? It won't help much since SQL will have to reorder the data to know things about who fits in what age group.
A non-clustered index gives us a way to virtually reorder data in certain criteria. In a book, you can look in the back and find a specific subject in the index. The page numbers in the index entry probably won't be right next to each other. That, however, is what an index is for. SQL's non-clustered index does the same thing.
SELECT * FROM STUDENTS WHERE AGE > 22
While this data is still organized physically by the clustered index that we saw above, SQL now has a helpful little index table at its disposal to help it grab ranges. When we ask for students over 22, SQL looks at the records in the following way:
Since age 22 falls within the 21-23 range of Index B, SQL started looking at that point. It had to “touch” Jim's row to check if it falls in the criteria. What happens next shows us just how great a non-clustered index is! Note the rows shown in green. SQL does not even have to touch them! It knows that anything in section C or D surely has an age over 22. While it costs a little bit of time and processing power to organize the rows in this way, it sure beats having SQL touch every row to see if it applies to the criteria!
Hopefully, this lesson has given you a whole new appreciation for how much is going on behind the scenes in the SQL server software. Now that we have this in order, we'll take a look at how we can create indexes on our existing tables and when it makes sense to do so.
Tainyan staff who teach SQL.