Power BI Cleaning and Transforming Data
Pivot and unpivot columns in Power BI is very useful functionality for structuring data that cannot be used for reporting as-is. The first blog post on data preparation covered the basics of cleaning and transforming your datasets. This blog post, will go into more advanced cleaning and transformation steps focusing on transforming columns through the pivot and unpivot functions.
In short, to unpivot columns means turning multiple column headers to a single column where the column headers are captured in rows. Columns values are stored in another column. The Pivot column function does the inverse of Unpivoting columns. It turns row values into columns.
Pivot and Unpivot Columns
We will use a dataset called ‘SalesPersons‘ as our sample dataset to explain pivot and unpivot. Looking at the ‘SalesPersons’ data we can see the row and column values are not structured in a manner useful for modelling and analysis. In this example, each Sales Person is allocated a grade for each year. Each year is captured in a separate column. The current structure of the data does not allow for the data to be sliced by grade. The data table needs to be transformed so that there is only one column for the sales person’s grade and one column for the year.
Let’s start structuring the data. Select the little menu to the left of the first column header and promote the first row as headers.
From the data, we will keep the first two columns as this is our ID and Sales Person columns. We would like to unpivot our other columns so that we have a column for year and a column for the grade. Select the first two columns and right-click to choose ‘Unpivot Other Columns’.
The result is our original two columns and two additional columns. These columns are called Attribute and Value. The Attribute column refers to the headers (Year) of our columns before the data was Unpivoted and the Values column refers to the values of our columns before the data was Unpivoted.
Let’s rename the Unpivoted columns to Year and Grade.
We also want to use this data to create a table where we keep a unique record of our Sales People. Rename the query to ‘SalesPersonsGrade’ and duplicate the query. Rename the duplicated query to ‘SalesPerson’. This will be the unique Sales Person dimensions tables with no duplicates in the IDs.
For this table, we would like to have only our first two columns IS and Sales Person. Select the first two columns and right-click to select ‘Remove Other Columns’.
The result will be as shown below. A table that has unique records of each salesperson.
Here we saw that unpivoting columns turns multiple column headers to a single column where the column headers are captured in rows. Columns values are then stored in another column. The Pivot column function does the inverse of Unpivoting columns.
Let’s select the two columns that we would like to Pivot, Year and Grade. Select the Pivot function from the transform tab.
Once selected a screen will popup. Here you need to select the values columns (in blue) and the Aggregate Value Function (in red). The Values column is the column that we would like to make up our values, in this case, Grade. The Aggregate Value Function determines how your values will be aggregated. In this case, we would not like to aggregate our values.
The result is a table with the column ID, Sales Person and all the year columns that were pivoted.
The Pivot and Unpivot column function is a very simple yet powerful ability to transform your data to be ready for creating visualizations. Caution must always be given in using these functions. It is important to make sure that the right columns are used and that the data is correctly structured to be pivoted or unpivoted.