Indexes

Posted: March 10, 2007 in Uncategorized

What are Indexes?
Indexes are database objects that help for fast retrieval of data and its integrity. The database finds the shortest path to the requested data by the help of indexes.
Why are indexes created?

Indexes are created to fetch information fast. They are pointers to the information.

Indexes should be created keeping in mind the most appropriate or frequently fetched column.

Create index IndexName
On Table (column name)

SQL Server stores database information in three files that are visible even from the windows explorer. These files are called the : primary data file (the extension is “.*.mdf”), secondary data file (the extension is “ndf”), and log file (the extension is “.ldf”).
SQL Server stores data in the base units called pages, each of which can store 8K of data. There is a maximum of 8060 bytes can be used for storing data in each page. This accounts to the maximum row size constraint. The maximum row size available in SQL Server is 8060 bytes, because that is the maximum that can be stored in a page. The pages are then combined in to extents, which are a combination of 8 pages. There are special pages, called Index Allocation Maps, which stores information about which extents are used for a particular table.
The information about SQL Server indexes is stored in sysindexes system table. It also contains information about how to start reading actual data. If you have defined a clustered index in a table, the whole data is stored in ordered format, and the Indexes can be used to exactly identify the storage location. This means SQL Server doesn’t have to do a complete table scan to find the exact data that was requested for. This is an optimized path, which helps to access data and fetch it to the user. Since there is a pointer that actually points to the data, queries can be executed faster.
Types of indexes:
Indexes are of two types. Clustered indexes and non-clustered indexes.
Clustered indexes:
When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key. So, there can be only one clustered index per table.
The pages in the data chain and the rows in them are ordered on the value of the clustered index key. All inserts are made at the point the key value in the inserted row fits in the ordering sequence.
Indexes are organized as B-trees. Each page in an index holds a page header followed by index rows. Each index row contains a key value and a pointer to either a lower-level page or a data row. Each page in an index is called an index node. The top node of the B-tree is called the root node. The bottom layer of nodes in the index is called the leaf nodes. The pages in each level of the index are linked together in a doubly-linked list. In a clustered index, the data pages make up the leaf nodes. Any index levels between the root and the leaves are collectively known as intermediate levels.
SQL Server navigates down the index to find the row corresponding to a clustered index key. To find a range of keys, SQL Server navigates through the index to find the starting key value in the range, and then scans through the data pages using the previous or next pointers.
The average size of a clustered index is about five percent of the table size. However clustered index size varies depending on the size of the indexed column.
During index creation, the SQL server temporarily uses disk space from the current database. A clustered index requires 1.2 times the table size for working space when the index is created. The disk space that is used during index creation is reclaimed automatically after the index is created.
Non-clustered Indexes
Nonclustered indexes have the same B-tree structure as clustered indexes, with two significant differences:
The data rows are not sorted and stored in order based on their nonclustered keys.
The leaf layer of a nonclustered index does not consist of the data pages.
Instead, the leaf nodes contain index rows. Each index row contains the nonclustered key value and one or more row locators that point to the data row (or rows if the index is not unique) having the key value.
If the table is a heap (does not have a clustered index), the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The entire pointer is known as a Row ID.
If the table does have a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2000 makes duplicate keys unique by adding an internally generated value. This value is not visible to users; it is used to make the key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.
Because nonclustered indexes store clustered index keys as their row locators, it is important to keep clustered index keys as small as possible. Do not choose large columns as the keys to clustered indexes if a table also has nonclustered indexes.
If the table does not have a clustered index, the row locator will be combination of fileno, pageno and the no of rows in a page.
If the table does have clustered index, the row location will be clustered index key value.
Non-clustered indexes are particularly handy when we want to return a single row from a table.
The order of the leaf pages of a non clustered index differs from the physical order of the table. The leaf level pages are sorted in ascending order. There are up to 249 non-clustered indexes.
Disadvantages:

If you create an index on each column of a table, it improves the query performance, as the query optimizer can choose from all the existing indexes to come up with an efficient execution plan. At the same time, data modification operations (such as INSERT, UPDATE, DELETE) will become slow, as every time data changes in the table, all the indexes need to be updated. Another disadvantage is that, indexes need disk space, the more indexes you have, more disk space is used.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s