Insight into the database structure internals

 Jan 16, 2017

In today’s blog I want to write about what actually happens internally to a database once you start working with the data and how it is stored behind the scenes. The main focus will be on how data is stored and the page allocations alongside it. What we see when we use a database is multiple files that are saved on the file system and that store s the data for our database, but what happens internally is way more than just a simple file or files on disk.

Firstly, you have to understand that SQL stores data by using tables and views or that is how it stores it logically. Data is physically stored in one or more data files, within these data files we get extents and then within an extent we get pages.

By default, a database only has one data file (.mdf extension) and one transaction log (.ldf extension) associated with it. This data file contains information about all the files it needs to use and metadata needed by SQL, as well as any user defined data (tables and views) that you want to add. You don’t have to rely just on the one data file, you can additionally create more data files (.ndf extension). The creation of more data files is a common requirement to improve overall performance.

Within these data files we get extents, which acts like containers for your pages. Extents consists of eight physically continuous pages of data, for a total of 64KB per extent. These extents have two types:

  • Mixed Extents: These are created when a new object is defined, the first data page of an object will always be stored in a mixed extent. Mixed extents store data pages that belong to different objects. Pages are stored in mixed extents until it grows large enough that it takes more than 8 pages, it then switches to use a uniform extent.
  • Uniform Extents: These extents are owned by a single object. For example, if you create an index on a table and it has enough information to require more than eight pages it will be belong to a uniform extent.

As mentioned previously, each extent contains multiple pages. These pages are the most basic unit of storage that SQL uses, each page is only 8KB in size and consists of the following parts:

  • Page Header: The header of a page is always 96 bytes in size and contains the metadata of the page. This metadata includes Page ID, Page types, number of records, free space and so forth.
  • Data Storage Space: This section contains the actual data and is written directly after the header.
  • Row Offset Table: This part contains a two-byte pointer for each row stored on the page. This is needed because the rows stored on a page isn’t always in logical sequence, by having the row offset table we can keep a logical order for the rows of data stored on the page.

We also have to be aware that there are multiple different types of pages not just data pages. Some of the different page types we get is as follow (not all of them are listed):

  • Index Pages: Stores index information for clustered and non-clustered indexes
  • GAM: Global Allocation Map records allocation for extents
  • SGAM: Shared Global Allocation Map records allocation for mixed extents
  • IAM: Index Allocation Map stores information about what extents are used per table or index

Let’s go a little deeper into what happens when data is added to an object. When we have a table, we perceive that as one object in our database. However, a table consists of more components that work internally. When a table is created, a partition is created alongside it and initially this represents the full table. Later on, you can add additional partitioning to your table if you want to split the data of the table due to size or performance.

This partition or partitions will then contain the use the pages mentioned earlier in the blog. If we start adding columns to our table, we will see that specific allocation units are used depending on the data types assigned to the column. These allocation units will group pages together within a table to manage the data based on different data types, we have the following types available:

  • IN_ROW_DATA: This contains all data types except for large objects such as images and xml. Every table partition will have one in_row_data allocation page assigned to it. When the size limit of 8KB is reached, the column on the data page is moved to a row_overflow_data allocation unit and the original page is updated with a pointer.
  • ROW_OVERFLOW_DATA: This allocation unit contains pages that contains rows with variable length columns that exceeded the 8KB row size limit within an in_row_data unit.
  • LOB_DATA: This contains a collection of large objects such as images and xml that uses a lot more than just 8KB.

To conclude this blog, you should know that all these allocation units are mapped to extents. The pages that keeps track of the mapping is called an IAM page. We will have an IAM page for each partition in the table and also for every different type of allocation unit assigned to the table. I hope this has given you a bit more insight into the internal working of data storage in SQL.

For more information, take a look at our SQL Server training courses.


How do your Excel skills stack up?   

Test Now  

About the Author:

Auret Swanepoel  

As a recent addition to the New Horizons team, Auret is a highly skilled and qualified IT Technical trainer. He has been a Microsoft Certified Trainer (MCT) since 2008 and has since then, also become a Microsoft Certified Professional (MCP), a Microsoft Certified Technology Specialist (MCTS) and a Microsoft Certified Information Technology Professional (MCITP). With his international experience as a trainer in South Africa, Auret is able to adapt his teaching style to different audiences in the classroom and ensure that students are learning in a positive and collaborative environment.

Read full bio
top