Jul 21, 2017
The foundation for a Business Intelligence (BI) solution for your company is a data warehouse. In today’s blog post we will go through some high-level considerations that you should take into account when you are implementing a data warehousing project. This is a short summary of what we cover in detail in our course, Implementing a Data Warehouse with Microsoft SQL Server.
The data warehouse is a centralised repository of business data, which can be used for reporting and analytics, and that will support key business decisions. The very first step in the planning of a data warehouse is to find out what are the business questions that need to be answered. You should organise meetings with the key decision makers and arrive at a deeper understanding of the key business objectives and metrics used to drive the decision-making process.
Once you know the requirements for the reporting infrastructure and the analytics, you can make a list of the data that is required to support them. The business data is usually spread across many different systems throughout the company and you need to know how to obtain the data. You should assess the importance of each business question against the ability to answer it from existing data, and prioritise the business questions you will address in the data warehousing solution. You could classify the questions into four quadrants of a feasibility vs importance matrix:
| High importance, low feasibility
|| High importance, high feasibility
| Low importance, low feasibility
|| Low importance, high feasbility
The ones with high importance and high feasibility should be addressed first, and if there are a large number of them, you may consider breaking down the project into subprojects. The last quadrant to be addressed, if ever, should be the low importance, low feasibility.
Each subproject tackles the problem of implementing:
- A schema for the data warehouse database
- An Extract, Transform and Load (ETL) solution
- Data quality procedures
For the schema, you typically implement a star schema with one central Fact table surrounded by several Dimension tables. The fact table contains all the metrics required to answer the business questions such as sales amount, price, cost, discount, etc. It also contains keys to each one of the dimension tables representing business entities such as Customer, Employee, Product, Supplier, etc. The relationships between the Fact and the Dimension tables makes it possible to aggregate the metrics by attributes of each business entity. Therefore, we can generate sales amount by city of the customer, or average price per type of product, etc. Besides the business entities there will also be present a Time dimension with a suitable granularity such as the day or the hour, allowing us to aggregate by year, quarter, month, etc., down to the most granular measure.
The dimensional model will be constantly refined as the requirements gathering process evolves. Once we get some stability with the Dimensional model we start implementing the ETL components that will connect with the several data sources where the data is located in existing systems. The ETL will typically pass this data through a pipeline of transformations and finish by inserting the modified or cleansed data into the final destination that is the data warehouse.
If you would like to know more, click here to find out about our course, Implementing a Data Warehouse with Microsoft SQL Server, or contact us to speak with one of our Learning Solutions Advisors today.