Power BI Merging and Appending Queries
Let’s start off by explaining what merging and appending queries in Power BI entails and the difference between the two. In short, when you have one or more columns that you’d like to add to another query, you merge the queries. When you have additional rows of data that you’d like to add to an existing query, you append the query.
We will use practical examples to explain merging and appending queries. Let’s move on to another data sets where our Customers data is stored. Here we have two data sets, one for Customers and one for Customer Location.
Our customer table is as follow:
Our customer location table is as follow:
We need to combine these two datasets so that we have each customer’s location (i.e. postal code) together with the remainder of the customer-related data in one place. In order to do this, we’ll use the Power BI Desktop Merge function. Let’s see how this is done.
To perform merging we require a unique ID in each table by which we can merge the two tables. Power BI will use these unique ID’s to match the data in one table with the corresponding data in another table. The result one table containing the data from our to tables.
Inspecting the two tables, we can see that both tables have a unique ID in columns CUSTOMERCODE or CustomerCode, which we will use to merge these two tables together.
With the Customers table selected, go to the Home tab, then the Combine Group, and click on the downward arrow to the right of the Merge Queries button. A drop down containing two options will appear, Merge Queries and Merge Queries as New. For this exercise let’s click on the Merge Queries as New button, which will bring up the Merge function pop-up box.
From the Merge function, we can see Customers table on the top but no table at the bottom. Select Customer Location from the drop-down list of the bottom table. Select the CUSTOMERCODE column of the Customers table and the CustomerCode column of the Customer Location table that will serve as the necessary link to merge these two tables together.
Notice in the figure above the information circled in blue provide information on how many of the rows from the Customers table have resulted in an accurate match to the records from the Customer Location table. In this case, all 14 rows from the Customers table have a match from the Customer Location table. You can also select other join-types from the drop-down list, but for this exercise, we will use the Left Outer join type. Click the yellow Ok button to complete the merge.
Once the merge function is complete you will see the Customer Location table shows as a column within the Customers table with row items called Table.
Let’s expand this merged dataset by clicking on the outward pointing arrows to give us the data we need. We now see all the columns contained in the Customer Location table, but we only require the Postcode, so uncheck all other columns except for PostCode. Uncheck ‘Use original column names as prefix’ to ensure we have a short and clean column name. Click the yellow Ok button to complete the expansion of the table data.
We will see the result of the table below. As you can see the data highlighted in red is the data from our ‘Customers’ table and the data highlighted in blue was merged from our ‘Customer Locations’ table.
Let’s rename our merged query to Customer. Right-click on the Customers table and click on Enable load to disable the table from loading to the model. Repeat the process for the Customer Location table. The reason for this is to reduce the load on our data model in order to perform optimally. We need to keep the memory being used as small as possible. Every query that is loaded into your Power BI model consumes memory.
When we disable the query load it does not mean that our query will not refresh. The diable load functionality disables loading the query into memory. In the above example, the Customers and Customer Location table will still refresh when a manual or scheduled refresh occurs although it is marked as disable load. The Customer table that is dependant on these two queries will use the data as an intermediate source to load the data into the Customer table. This step becomes all the more important when your Power BI data model expands.
It is good to understand this dependancy. The outline of your data model is found under the View tab when you select Query Dependencies. Here you see that the Customers and Customer Location query is still in your data model and the Customer query is dependant on those two queries.
By appending queries two or more datasets will be combined as one. After appending two queries, rows will appear after another from the appended datasets. For instance, if one table has 30 rows and another 20, the appended results would have 50 rows.
Let’s see our two data sets that we will use the explain Append. Firstly, we have our ‘Sales (2016 – 2017)’ table:
Secondly, we have our ‘2018 – Monthly’ table:
We would like to have all our sales data from the ‘2018 – Monthly’ and the ‘Sales (2016 – 2017)’ tables in one combined table called ‘Sales Data’. To achieve this we will append these two tables. Select ‘Sales (2016-2017)’ from the query list. From the Home tab and the Combine group in the ribbon, select the drop-down arrow next to Append Queries and select Append Queries as New. We select ‘Sales (2016-2017)’ as the primary table to append our data. Because ‘Sales (2016-2017)’ was selected in step 1, this field will by default already contain the ‘Sales (2016-2017)’ data. From the drop-down list, select the ‘2018 – Monthly’ data as the table to append to the primary table. Select OK to append the tables.
In the query list, we see the newly appended data called Append1.
As stated in our first blog post, the naming convention is essential to maintain. Let’s rename our query to Sales_Transactions by selecting the newly appended data and changing the name to Sales_Transactions.
Let’s keep to good practice and disable the two tables that won’t be used from loading to memory. Right-click on the ‘Sales (2016 – 2017)’ table and uncheck the Enable load option to disable the table from loading to the model. Repeat the process for the ‘2018 – Monthly’ table. These tables will still refresh from the data source and will update the appended table Sales_Transactions whenever a refresh occurs.
Formatting Appended Table Columns
With the S_Transactions appended, take note of the format for each column as it may not have been formatted correctly by the query editor.
Whenever importing data it is essential to review the formatting as explained in our previous blog post. For example, columns which are in a date format needs to be formatted to date values, columns which are meant to be decimal values, needs to be formatted to decimal values.
Duplication of data when appending tables
When appending two or more tables, the append function will not remove duplicates this needs to be checked and done as outlined in our previous blog post.
How the append function handle columns
When two tables have the same number of columns with the same name, append will simply match these columns and add the rows of these columns underneath each other. However, when columns are not the same append handles it a bit different. Let’s say table one has columns called col1, col2, col3.
Table two has columns called col4 and col5.
When these tables are appended the result will be a table with col1 to col5. If one of your original tables does not have a column from another table the append query will simply fill the values as null.
merging and appending queries