SoftOve

Database Tools for SQL Server

Introduction to indexes in SQL Server

The text below is a part of a presentation I held about Indexes in SQL Server. Hopefully this can be useful for someone who wants a short introduction or repetition of indexes.

Indexes help out to find the data needed without having to search through all data. In short it does this by keeping different ordered lists of the data and references to related data.

There are two types of indexes, clustered and non-clustered. The clustered index is on the table itself. The columns in the index decide the order of the rows stored on disk. You can think of a clustered index as a dictionary, just a long list of ordered data.

A non-clustered index is instead like an index in a book. Every row in the non-clustered index has a reference to the clustered index. Just as the keywords in the index of a book have a reference to which pages it can be found on.

How to choose indexes
There are a lot of rules and hints for choosing good indexes. I will mention some of them here:

o Analyse WHERE clauses and JOIN criteria used on the table. The columns referenced are your best candidates to be added to indexes.

o Avoid columns with wide types and keep the number of columns in the indexes low. A wide index will require more disk space, which also means more work for SQL Server to read the index. The chance that the index is available in the database cache will also decrease by its bigger size.

o An index on a column with few possible values will not benefit performance. However, it could still be useful in an index with serveral columns in it.

o Consider the order of the columns in an index. The selected order should be equivalent to how you would order the conditions in a WHERE clause. The columns/conditions with the highest uniqueness first, and secondly the column with the smallest data type.

o Consider the pros and cons when choosing between a clustered and a non-clustered index.

• You can only have one clustered index per table. A good rule is to create the clustered index first.
• Keep the clustered index as small as possible. This is because every non-clustered index on the table contains the clustered index, to use as a reference to the main table.
• Do not create a clustered index on a frequently updated column. When it is updated every non-clustered index need to be updated as well.
• If you have a query that references columns both within and not within a non-clustered index, SQL Server need to lookup these data via the clustered index. This lookup costs a lot in performance, so a general rule is to make sure a non-clustered index is highly selective. I.e. the number of rows returned must not be too many.
• As default SQL Server creates a clustered index on an incremental field.

More indexing techniques
There are a couple of more indexing techniques that can be used to increase the effectiveness of your indexes. I will mention three of them which I find most important to know about. They are all related to the issue with looking up data not available in a non-clustered index.

Covering Indexes
A non-clustered index having all columns needed by a query is called a covering index. By adding all columns used in both the SELECT list and the WHERE/JOIN conditions you can avoid the need for accessing the main table. Obviously using this technique can lead to huge indexes that are expensive to maintain.

Covering Index with INCLUDE
Another way to make a covering index is to use the INCLUDE operator introduced in SQL2005. Instead of adding all columns to the index you include them unsorted together with the reference to the main table.

Using this technique the columns in the WHERE/JOIN conditions are still good candidates for the non-clustered index. The columns in the SELECT list could instead be added with the INCLUDE operator.

Index Intersection
Instead of creating wide covering indexes one can create several non-clustered indexes on different columns. SQL Server can then merge the result of all indexes to avoid a lookup of data in the main table.

You can look at the execution plan of your query to find out which indexes SQL Server uses and if a lookup of data in the main table was done.

Anders Svensson

Thanks for sharing your notes. I really enjoyed your presentation btw.

Stephen Jones

Thanks, this made my day!

Introduction to Window Functions in T-SQL

With Window Functions you can do additional queries on the result of your main query, or even a part of the result. In one way they are very similar to aggregate functions but with the main difference that you make the calculation on the rows from your main query.

Using the same example and values as in my previous article Structured and recursive SQL using WITH I will show you a first example.

 /* Add some employees to use with the example. */
 DECLARE @emps TABLE (emp_number INT, emp_name NVARCHAR(100), emp_department VARCHAR(50), emp_salary INT)
 INSERT INTO @emps VALUES(1, 'Anders Andersson', 'Sales', 20000)
 INSERT INTO @emps VALUES(2, 'Bengt Bengtsson', 'Sales', 20000)
 INSERT INTO @emps VALUES(3, 'Claes Claesson', 'Sales', 21000)
 INSERT INTO @emps VALUES(4, 'David Davidsson', 'Development', 21000)
 INSERT INTO @emps VALUES(5, 'Erik Eriksson', 'Development', 23000)
 INSERT INTO @emps VALUES(6, 'Fredrik Fredriksson', 'Development', 24000)
 INSERT INTO @emps VALUES(7, 'Hans Hansson', 'Administration', 21000)
 INSERT INTO @emps VALUES(8, 'Gustav Gustavsson', 'Administration', 22000)
 INSERT INTO @emps VALUES(9, 'Ingvar Ingvarsson', 'Administration', 24000)
 
 
 /* Get each employee and their salaries compared to the averages. */
 SELECT 
 	emp_name
 	,emp_department
 	,emp_salary
 	,emp_salary - AVG(emp_salary) OVER (PARTITION BY emp_department) AS emp_salary_compared_to_department_average
 	,emp_salary - AVG(emp_salary) OVER () AS emp_salary_compared_to_company_average
 FROM
 	@emps


In the exemple above we calculated the average salary per department and for all rows in the result, in this case the whole company.

AVG(emp_salary) OVER (PARTITION BY emp_department)
AVG(emp_salary) OVER ()

Apart from the ordinary aggregate functions AVG(), SUM(), COUNT(), MIN() and MAX() there are a couple more. So far I have only used two of them, RANK() and ROW_NUMBER(). In the example below I show you the difference between these two.

 /* Get each employee and rank their salaries within their department and in the company. */
 SELECT 
 	emp_name
 	,emp_department
 	,emp_salary
 	,RANK() OVER (ORDER BY emp_salary DESC) AS emp_salary_company_rank
 	,ROW_NUMBER() OVER (ORDER BY emp_salary DESC) AS emp_salary_company_row_number
 	,RANK() OVER (PARTITION BY emp_department ORDER BY emp_salary DESC) AS emp_salary_department_rank
 	,ROW_NUMBER() OVER (PARTITION BY emp_department ORDER BY emp_salary DESC) AS emp_salary_department_row_number
 FROM
 	@emps
ORDER BY
	emp_salary_company_row_number


emp_nameemp_departmentemp_salaryemp_salary_company_rankemp_salary_company_row_numberemp_salary_department_rankemp_salary_department_row_number
Fredrik FredrikssonDevelopment240001111
Ingvar IngvarssonAdministration240001211
Erik ErikssonDevelopment230003322
Gustav GustavssonAdministration220004422
Hans HanssonAdministration210005533
Claes ClaessonSales210005611
David DavidssonDevelopment210005733
Anders AnderssonSales200008822
Bengt BengtssonSales200008923

In a future article I will show you some other examples of how you can use the Window Functions to query preceding or following rows. If you cannot wait until then I can recommend the following book by Itzik Ben-Gan, Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions.

David Jackson

Thanks! This is something every serious T-SQL developer should learn and use in their daily work.

Structured and recursive SQL using WITH

As until recently I had never seen or heard about the WITH clause. Since I believe there are many as me I will give some examples of how to use it. The WITH clause can be used for giving subqueries names and to reuse them later in your query. An easy example is given below where we have two subqueries. One for getting the average salary for each department and one for getting the average salary at the company.

/* Add some employees to use with the example. */
DECLARE @emps TABLE (emp_number INT, emp_name NVARCHAR(100), emp_department VARCHAR(50), emp_salary INT)
INSERT INTO @emps VALUES(1, 'Anders Andersson', 'Sales', 20000)
INSERT INTO @emps VALUES(2, 'Bengt Bengtsson', 'Sales', 20000)
INSERT INTO @emps VALUES(3, 'Claes Claesson', 'Sales', 21000)
INSERT INTO @emps VALUES(4, 'David Davidsson', 'Development', 21000)
INSERT INTO @emps VALUES(5, 'Erik Eriksson', 'Development', 23000)
INSERT INTO @emps VALUES(6, 'Fredrik Fredriksson', 'Development', 24000)
INSERT INTO @emps VALUES(7, 'Hans Hansson', 'Administration', 21000)
INSERT INTO @emps VALUES(8, 'Gustav Gustavsson', 'Administration', 22000)
INSERT INTO @emps VALUES(9, 'Ingvar Ingvarsson', 'Administration', 24000)

/* Define a subquery for getting salary average per department. */
;WITH department AS
(
	SELECT emp_department AS department_name, AVG(emp_salary) AS department_average 
	FROM @emps GROUP BY emp_department
)

/* Define a subquery for getting salary average in company. */
, company AS
(
	SELECT AVG(emp_salary) AS company_average 
	FROM @emps 
)		

/* Get each employee and their salaries compared to the averages. */
SELECT 
	emp_name
	,emp_department
	,emp_salary
	,emp_salary - department.department_average AS emp_salary_compared_to_department_average
	,emp_salary - company.company_average AS emp_salary_compared_to_company_average
FROM
	@emps

	INNER JOIN department ON emp_department = department_name
	CROSS JOIN company


Another usage of the WITH clause is to make recursive queries. In the first examples below we get each department above the department "London" in a company organization. In the second example we get each department below "London".

/* Add some department to use with the example. */
DECLARE @deps TABLE (dep_id INT, dep_name NVARCHAR(100), dep_parent INT)
INSERT INTO @deps VALUES(1, 'Europe', NULL)
INSERT INTO @deps VALUES(2, 'Asia', NULL)
INSERT INTO @deps VALUES(3, 'UK', 1)
INSERT INTO @deps VALUES(4, 'Germany', 1)
INSERT INTO @deps VALUES(5, 'China', 2)
INSERT INTO @deps VALUES(6, 'India', 2)
INSERT INTO @deps VALUES(7, 'London', 3)
INSERT INTO @deps VALUES(8, 'Newham', 7)
INSERT INTO @deps VALUES(9, 'Hackney', 7)
INSERT INTO @deps VALUES(10, 'Abbey Street', 8)

/* Define a recursive subquery to get all departments above London. */
;WITH above AS
(
	SELECT dep_id, dep_name, dep_parent 
	FROM @deps WHERE dep_name = 'London'
	
	UNION ALL
	
	SELECT p.dep_id, p.dep_name, p.dep_parent 
	FROM above AS c INNER JOIN @deps AS p ON c.dep_parent = p.dep_id
)
SELECT dep_id, dep_name, dep_parent FROM above

/* Define a recursive subquery to get all departments below London. */
;WITH below AS
(
	SELECT dep_id, dep_name, dep_parent 
	FROM @deps WHERE dep_name = 'London'
	
	UNION ALL
	
	SELECT c.dep_id, c.dep_name, c.dep_parent 
	FROM below AS p INNER JOIN @deps AS c ON c.dep_parent = p.dep_id
)
SELECT dep_id, dep_name, dep_parent FROM below

I should mention that there is an alternative, and maybe more recommended way of working with hierarchical structures in SQL Server. Read more about this in the article Model Your Data Hierarchies With SQL Server 2008.

Name Comment Send