December 18, 2020
R and Power BI – Introduction and Examples
For Power BI version 2.76.5678.782 (December 2019).
R is a programming language for statistical analysis of data. It’s open source and supported with over 15 000 packages, covering specific areas. Alongside powerful packages it provides print quality visualization of data. Power BI supports R, opening door to unlimited data manipulation and visualization.
To use R in Power BI desktop you need a local installation of R. RStudio, a scripting environment is freely available and comes with latest version of R. R in Power BI is available in two instances: in Power Query queries and as a visualization in Power BI.
R in Power Query
We can use R inside a query. Incoming and resulting data is always a dataframe. Let’s look at a simple example.
We have a simple table.
We want to add a C column, sum of columns A and B. We take the following steps.
Select Transform > Run R script.
R editor titled Run R script is opened. Comment in first row states that incoming data is stored in a variable named dataset.
Enter the following rows.
newname = dataset
newname[‘C’] = newname[‘A’] + newname[‘B’]
First row creates a copy of dataset table named newname. We can’t change the original dataset variable, so we always make a duplicate.
Second row adds column C to newname table, sum of columns A and B.
Confirm with OK.
A list of all created tables is returned. In our case it’s only one table, newname.
Click on Table link. New table newname is opened.
R and Power BI
R can be used for visualization of data in Power BI report. That’s a big advantage, since R is strongly supported with graphical packages, providing building blocks for beautiful custom visualizations.
Let’s look at a simple example. We have time series of currency value that we want to visualize. Table consists of many currencies (AUS, EUR, USD, …), but we’ll only draw a plot for GBR.
We take the following steps.
We select R visualization in Power BI report..
R script editor is opened.
We check the fields we wish to plot. Our table consists of many columns, but we’re only interested in Year and GBR.
Script editor now gets input data and prints out the following comments.
Comment states that incoming data is stored in a dataframe called dataset and consists of columns Year and GBR. Duplicate rows were automatically removed.
We enter the following commands into script window.
main = “GBR curency”,
xlab = “Year”,
ylab = “GBR”,
type = “l”,
col = “blue”)
We used plot function with the following arguments:
- x axis, column Year from dataframe dataset
- y axis, column GBR from dataframe dataset
- Title »GBR currency«
- x axis name, »Year«
- y axis name, »GBR«
- line style »l«, meaning line
- color »blue«, meaning blue
We run the script by clicking Run Script icon.
We get the following plot.