Power BI Relationship Modelling – Star Schema
In our blog post called: Relationship Modelling – Part 1 we spoke about relationships in Power BI and what it entails. We also spoke about the importance thereof to really understand how to create relationships. This blog post will look at relationship modelling, the means of building your *data model* so that you can effectively slice and dice your data. This forms the base from where all visuals and filters are being built from. When relationship modelling is done poorly the result will be visuals that don’t perform as you would like, duplicate data, large datasets and poor performance of your data model.
This blog post will touch on data modelling and how the star schema is a great blueprint to follow when building relationships between tables. It ensures that there is a structure to your relationships and eliminates ambiguity. A well-designed star schema is essential to develop Power BI data models that are optimized for performance and usability.
Dimensions and facts tables
The star schema is made up of dimensions and facts tables. A well designed Star Schema will have on fact table that is linked to multiple dimensions tables. Let’s go into a bit more detail of what your dimension and facts table entails.
Your dimensions table describe the business entities of your model. Dimension provides the context surrounding a business process event. To put it simply, they give the who, where and what of a fact. Let’s take the sales business process as an example. The dimensions for quarterly sales would be:
- Who – Customer Names
- Where – Location
- What – Product Name
Here we can see that a dimension is a window to view the information of the facts.
Your facts tables give more information to the measurements/metrics or facts from your business process. From our sales business process, the measurement would be sales orders, stock balances etc. Thus the fact tables store observations or events. The fact table is the primary table in a dimensional model and is made up of:
- dimension key columns that relate to dimension tables, and
- numeric measure columns
The dimension key columns refer to the dimensionality of a fact table, and in turn, the dimension key values indicate the granularity of a fact table. Your dimensions tables will have a small number of rows where your facts tables will contain large datasets.
Star schema relevance to Power BI models
Dimensions and facts make up a star schema. Your facts table is in the middle and you dimension tables ‘explains’ the facts table. The star schema has no loops and no relationships that complete a loop between three or more tables. If you have a loop it will create cardinality within your relationships. The star schema ensures that all your tables can function as one big table.
The accuracy in creating your Dimensional modelling determines the success of your data warehouse implementation. Here are the steps to create Dimension Model
- Identify Business Process
- Identify Grain (level of detail)
- Identify Dimensions
- Identify Facts
- Build Star schema
Queries import data into Power BI and your data model is used to filter, group, and summarize your data. When a data model is designed well it provides tables for filtering and grouping and summarization. In the star schema, the dimension tables support filtering and grouping and the fact tables support summarization.
Step by step guide to setup a Star Schema
Our Star schema is made up of our facts and dimensions tables. What is essential here is how we manage the relationships between these table through good practice data modelling in both Facts and Dimensions tables. Let’s dive a bit deeper into this now that we have a good understanding of what a Star schema is and why it is important.
Let’s see our below table where we have one big table where our dimensions and facts are still mixed in one table. The table contains all of our sales data and the information around the sales data. We have broken down the table into four colours. The grey columns will form our facts table as it contains data around our sales transaction. The data in yellow refers to our customer data. This will be our first dimension table. In the facts table that we are developing now, we would only like to see the ‘CUSTOMERCODE’ column. The ‘CUSTOMERNAME’ and ‘CUSTOMERGROUPNAME’ describes our customer. What you will note here is that if we keep the ‘CUSTOMERNAME’ and ‘CUSTOMERGROUPNAME’ it will make our facts table larger than necessary. Remember as stated earlier in the blog post, our facts table is a large data set containing all our event data and our dimensions tables are small containing information surrounding the context of a business process event. If we don’t exclude these two columns the data set becomes unnecessarily large. You rather want to keep unique records of your customers in a separate dimension table and link the facts and dimensions table together.
Create Dimensions Table
Let’s go ahead now and create our three dimensions tables called ‘Customers’, ‘Products’ and ‘SalesPerson’. We have imported this data into Power BI and called it ‘Sales Data’. To create our dimensions tables we will simply duplicate the ‘Sales Data’ table. We will explain the Customers Dimensions table in detail.
Rename the duplicated table to ‘Customers’. Select the three columns called ‘CUSTOMERCODE’, ‘CUSTOMERNAME’ ‘CUSTOMERGROUPNAME’ and select remove other columns.
What remains is a table with our three columns. We need to have a unique record of our customers. Let’s remove all duplicates from our ‘CUSTOMERCODE’ column.
The data that remains is a unique set of our customers.
We followed the same process as above for the ‘Products’ and ‘SalesPerson’ table.
Each dimension table should contain a key column. This column should be unique per each row in the dimension table. This column will be the primary key of this table and will be used in the fact table as a relationship.
We are now moving on to our Facts Table.
We need to remove the unwanted columns in our facts table. These are the columns that are contained in our dimensions table. We do , however, need to keep a Unique Key for each dimensions table in the facts table. As mentioned when developing the dimensions table, this column is used to link the facts and dimensions table. The three columns that we are keeping is ‘CUSTOMERCODE’, ‘ITEMCODE’ and ‘SALESPERSONID’.
What remains is the table as shown below. We have our sales data and the three unique columns that will form the link between our facts and dimensions tables.
Let’s have a look at our data model.
What we notice here is that our Star Schema was set up perfectly. We have one-to-many relationships between each dimension table and our facts table. The filtering direction is also from our dimension to our facts table. The filtering direction shows us that we will filter our facts table from our dimensions tables.
Top tips for our Star Schema
Dimensions tables are unique
Your dimensions table will be a set of unique records for each row. No duplicates are allowed in these tables.
Facts tables should contain all records
The facts table should contain all of your original data. In contrast to the dimensions table, the facts table may contain multiple records.
The facts and dimensions tables are connected through unique keys. Make sure that these keys are logical and as far as possible numeric values.
When the tables are set up correctly the relationship between your facts and dimension table will be a one-to-many relationship. The one side will be the dimension table and the many side will be the facts table.
The relationship direction will flow from the dimension to the facts table.
Multiple Facts tables
The dimensions and facts table is not set by a property in Power BI, rather determined by the data modeller and how the data relationships. The relationships of your model set the filter propagation path between two tables as well as the Cardinality property of the relationship which determines your table type. There can be a “one-to-many” or a many-to-one” relationship between your tables. The dimensions table is always on one side of the relationship and your facts table is always on the many side of your relationship.