June 15, 2020
5 best practices for data modeling
For Power BI Desktop version 2.76.5678.782 (January 2020).
Data model is the core of every Power BI project. Building a good data model is crucial. In this article, we present 5 best practices to make sure your data model runs smoothly.
Mark as date table
Calendar table is the most important table in a data model. There are many ways to build a calendar table. Our choice is using an M function in Power Query, presented in Creating Dynamic Calendar in Power Query Using an M Function.
After we create a calendar table, we need to mark it as such. We select the table and then Table Tools > Mark as date table. We get asked for the date field. In our case, we choose a field named Date.
Power BI checks if the field contains all the dates between the start and end date and confirms the validation.
Sort by other colum
Month field of the calendar table is just a text type data, which will by default be sorted according to alphabetical order. This is usually not helpful, we want months sorted chronologically in our reports.
We create another field with month number, for example Month No. We can do this in few different ways.
- We open Power Query with Transform Data, choose the Date field of Calendar table and Add Column > Date > Month. We rename field to Month No. We close Power Query with Close&Apply.
- We choose Calendar table in Power BI and click on Table Tools > New column. We enter the formula = MONTH(Calendar[Date]), where Calendar is the name of our calendar table and Date is the name of the date field.
We sort months chronologically by selecting the field with month names Month and then selecting Column Tools > Sort By Column > Month No. Similarly, we can sort days of the week.
Connecting tables to a star schema
When building a data model, we always pursue a Star schema, meaning one fact table and multiple dimension tables. Lets reiterate:
Fact table is a table, containing all the records we observe, for example a table of all the sales transactions. Fact tables are large and are growing daily.
Dimension tables are tables describing attributes like when, who, where and similar. For example, Products is a dimension table containing a all the products sold in a store. Dimension tables are small and rarely change.
Example: image below shows the fSales table, which contains thousands of rows. Each rows represents one transaction. Smaller dimension tables are connected to the fact table.
Best practice is to use an ID column in all dimension tables. For Product table, we would use Product ID. Same field Product ID would also be included in the fact table. We can then connect the fields: we select Product ID from the fact table and drag it to the same field in the dimension table. All we need to ensure is that data types of field match. The relationship is formed. Most relationships are of type Many to One. We usually want to avoid other relationship types.
We connect all other dimension tables in the same way. We get a shape of a star or a Star schema.
Hide connected fields
We hide the connected fields: right click on the field and Hide in report view. We hide the field in both fact and dimension tables. These fields usually contain numbers, which don’t carry any other information other than an index of an attribute. We’ll be using names in our filters, for example Bike BMX-C203 or California. When we select a product in report we consequently filter the product and the fact table. That’s the idea of a Star schema.
We usually go a step further and hide all the fields we don’t use. All that’s left visible in the fact table are measures.
Q&A is a visual that allows us to display data using field names and filters.
We can use synonyms for field names: we select a field name in our data model and enter synonyms in the Synonyms field. We can then call the same data using set synonyms.