In the September update of Power BI, Microsoft released the preview version of a forecasting feature for a line chart.
Before we dive into Forecasting in Power BI, let’s quickly cover some points about around forecasting:
What is Forecasting? Simply put, is predicting or estimating (a future event or trend).
Seasonality: In any time series data, if we have any yearly predictable change or pattern then we have seasonality in the data set. In Power BI we can indicate seasonality in our data, 12 for yearly, 6 for half yearly and 4 for quarterly seasonality.
Confidence interval: It’s a probability defined in such way that actual values will lies with in this range, for example if we provide a 95% confidence interval, then we are saying that 95% is the probability of the actual value lying within the range.
Ignore Last: Certain data sets can have incomplete data for the last x months. We can that the last 1 or more months has incomplete data and should be ignored for prediction purposes.
Forecasting in Power BI
Data set requirement: To use Power BI forecasting, we need to have time series data, therefore we need a date column in the data set. For our example, I will be using a basic ticket counting data set. It consists of the count of ticket creation over the last 3 years.
Here’s the file:
Note: As Power BI forecasting is under preview, we first need to enable the forecasting feature. File > options and setting > options > preview features > tick Forecasting > click ok > restart Power BI desktop
The Forecasting in Power BI feature is only available for the line chart.
Plot a line chart with date column in the ‘Axis’ and the ticket count field in ‘Values’ section as illustrated below:
Steps for enabling the forecast: click Analytic tab > Forecasting > Add > Provide a forecast length > Provide a confidence interval > Provide Seasonality.
Testing and validating the Power BI forecast data
As illustrated above, we have enabled forecasting for the next six months. How do we check if Power BI is forecasting accurately? To do so, we split the ticket count data into two data sets:
data set 1: data up to Feb 2016
data set 2: data up to September 2016
Now we will do forecasting on data set 1 and validate the forecast data with data set 2.
Below is the line chart on data set 1 which has actual data up to Feb 2016. Taking the line chart up to September 2016, we will perform forecasting for the next 7 months.
On comparing this forecast with data set 2, the actual data.
From the above, we can see that for the month of June 2016 we have an actual ticket count of 3,455 and we have a forecast count of ticket 2,800 with an upper limit of 3,623 and a lower limit of 1,977.
Although this was a basic test, from this, it would be fair to deduce that the forecasting feature in Power BI is relatively accurate.
Below is a superimposed image of above two charts:
We can see that the actual trend is within the foresting limits.
Here’s a copy of the Power BI Desktop file:
Look out for more upcoming posts in our Power BI Tips series.
For more great posts see our blog page. Learn more about Power BI by attending our training program.
Thanks for the contribution! it’s really helpful.
Good article, thanks. Can you explain why PowerBi’s forecasting engine will produce a lower-bound forecast of a number that’s less than zero?
My time series data has never exhibited a <0 condition. I've set the min-Y axis to 0 and it still thinks that's some probability of a negative number event occurring in the future, which given my data, is impossible.
I adore assembling useful information, this post has got me even more info!
i’ve downloaded your .pbix file but i’ve got one question, in your unique line chart visual, count of ticket blue line indicator value is based on Data_feb16 dataset which is ending on February 2016 (csv exported file of embedded dataset downloaded from Export Data menu selected on visual menu confirm it).
But in this visual count of ticket blue line indicator value is finishing on September 2016 and not of February 2016 as expected considering initial source dataset filed value is Count of Ticket from Data_feb16 dataset.
As a matter of fact count of ticket values in line chart is also displayed for dataset ranging from March to September 2016 and are retrieved from data_sep16 dataset.
But in data model pane there is no relationships implemented between Data_feb16 & data_sep16 datasets, so how is it possible that visual blue line count of ticket indicator is gathering Count of Ticket indicator values from both datasets ?
Many thanks for your feedback on this matter.
In the sheet ‘Forecast_Validation’ of the .pbix file provided, you will see two charts. As expected, these two charts are reading from two separate datasets. The top chart ‘Count of Ticket by Month’ is reading the from the dataset called ‘data_sep16’. On the last tab, ‘Both charts Overlaid’, those same two charts from the ‘Forecast_Validation’ sheet are being displayed, one on top of the other. It might seem that one chart is reading from both disconnected datasets, but in fact, it is simply two charts overlayed appearing as one chart.
Hope that helps… (sorry if I misunderstood your question)