Stepwise Guide to Personalize Visuals in Microsoft Power BI (Preview)

Introduction

The Personalize Visuals (preview) feature was released with the April 2020 update of Microsoft Power BI. In this blog post, I will walk you through the feature along with a stepwise guide to implementing personalize visuals in your Power BI reports.

Purpose of Personalize Visuals

Power BI end users often have different requirements for the same report. Previously, these requirements were met either by granting end users edit permissions or by end users keeping their own copy of the report as a separate version. With personalize visuals, end users can adapt Power BI reports to their specific requirements without edit permissions and without maintaining separate versions of existing reports.

Stepwise Guide to Personalize Visuals

For demonstration purposes, I created a simple sales report in Power BI so that the process involved in using the personalize visuals feature is easier to follow.

The data used for this report can be found here:

https://docs.microsoft.com/en-us/power-bi/sample-financial-download

Note: Dimensional modeling concepts were not followed when creating the report, since it is focused solely on demonstrating the ‘personalize visuals’ feature.

personalize_visuals_1

Enabling the Personalize Visuals Feature

Firstly, it is necessary to enable personalize visuals as a preview feature in Power BI Desktop. Open Power BI Desktop and navigate to Options and Settings > Options > Preview Features. The user will be directed to the window shown below.

personalize_visuals_2

Tick the ‘personalize visuals’ checkbox shown below to enable personalize visuals as a preview feature in Power BI. Click ‘OK’ to proceed further.

personalize_visuals_30

It is necessary to enable the personalize visuals feature at the Power BI report (file) level as well. For this, open the specific Power BI report (i.e. the Annual Sales Report file in my case) and navigate to Options and Settings > Options > Report Settings under Current File. The user will be directed to the window shown below.

personalize_visuals_4

Tick the checkbox ‘allow report readers to personalize visuals to suit their needs’ under Personalize visuals. This enables the personalize visuals feature for the specific Power BI file. Click ‘OK’ to proceed further.

personalize_visuals_5

When the personalize visuals feature is enabled for the specific Power BI file, the icon shown below should be visible on each visual in the report.

personalize_visuals_6

Disabling Personalize Visuals Feature in Visuals

When there is a requirement to disable the personalize visuals feature for a specific visual, select the visual and navigate to ‘visual header’ under the Format options in the Visualizations pane. Turn off the ‘personalize visual’ toggle at the bottom of the visual header section. This disables the personalize visuals feature for that specific visual, and the personalize visuals icon will no longer be visible on it.

personalize_visuals_7

Personalize Visuals in Action

It is necessary to publish the specific Power BI report (file) with the personalize visuals feature enabled to a workspace in the Power BI service. In my case, I published the Annual Sales report to my workspace in the Power BI service.

personalize_visuals_8

Hover over a specific visual and click on the personalize visual icon in the visual header of that visual as shown below.

personalize_visuals_9

When clicked, the visual enters focus mode and a side panel pops up for personalizing it, as shown below. Personalization covers the visualization type, axis, legend, value and tooltips.

personalize_visuals_10

Clicking on the visualization type allows the visualization type of the specific visual to be changed.

personalize_visuals_11

In my case, the visualization type I had used for the sales breakdown by product visual is a clustered bar chart. I will change it to a stacked column chart as shown below.

personalize_visuals_25

Clicking on the axis allows the axis to be added or changed to any field in the data model.

personalize_visuals_26

In my case, the axis I had used for the sales breakdown by product visual is product. I will change it to country as shown below. A restriction I noticed here is that the title of the visual cannot be changed along with its content.

personalize_visuals_27

Clicking on the legend allows the legend field to be added or changed to any field in the data model.

personalize_visuals_28

In my case, I had not used a legend for this Sales Breakdown by Country visual. I will add the field segment as the legend.

personalize_visuals_16

Clicking on the value allows the value field to be added or changed to any field in the data model.

personalize_visuals_17

The aggregation of the value field can be changed by clicking on the ellipsis in the value field.

personalize_visuals_29

In my case, the value field I had used is Sales. I will leave the value field unchanged.

Clicking on the tooltips allows tooltip fields to be added or changed to any field in the data model.

personalize_visuals_18

In my case, I had not used any tooltip field for this Sales Breakdown by Country visual. I will add the field Sales as the tooltip.

personalize_visuals_19

Reverting Changes with Personalize Visuals

Click on the eraser icon in the personalize visuals side panel to revert changes made with the personalize visuals feature.

personalize_visuals_21

Saving Personalized Visuals

To save the changes, it is necessary to add the personalized view of the report as a personal bookmark.

Click on the ‘Bookmarks’ option in the top right-hand corner of the report opened in Power BI service and click on ‘Add a personal bookmark’.

personalize_visuals_22

When ‘Add a personal bookmark’ is clicked, the window below will appear in the top right-hand corner. Provide an appropriate name for the bookmark, and tick the ‘Make default view’ checkbox if the personalized view should be shown as the default view. Click ‘Save’ to save the bookmark.

personalize_visuals_23

The bookmark will be saved under ‘Bookmarks’ in the Power BI service as shown below.

personalize_visuals_24

Conclusion

The personalize visuals feature is especially useful for Power BI end users. The only aspect that needs improvement is the visual title: end users should be given the option of editing the visual title as part of personalization. Other than that, it is a brilliant feature to have.

Analytics with Azure Data Lake Gen 2 in Microsoft Power BI

Introduction

A few months ago, there was a requirement to connect Azure Data Lake Gen 2 as a data source to Microsoft Power BI. Since there was no Power BI connector for Azure Data Lake Gen 2 at that stage, I spent hours on research and development to accomplish this task. Unfortunately, no appropriate solution was discovered. As a result, the data source was migrated from Azure Data Lake Gen 2 to Azure Data Lake Gen 1, and the Azure Data Lake Gen 1 Power BI connector was used for analytics. Recently, in the September 2019 Microsoft Power BI update, a beta version of the Azure Data Lake Gen 2 connector was released. In this blog post I will cover the basics of a data lake and how to connect Power BI with this Azure Data Lake Gen 2 connector.

What is a Data Lake?

It is important to understand the definition of a data lake before proceeding any further. A data lake is a central storage location for structured, semi-structured and unstructured data from different data sources. For example, structured data may be loaded from relational databases, semi-structured data such as JSON files from IoT devices, and unstructured data such as videos from video publishing platforms. The workload can be based on batch processing or stream processing. The simplest way to describe a data lake is by comparison with a natural lake: water flows in and out of a natural lake through rivers in different locations, and in the same way, data flows in and out of a data lake from sources in different locations.

Story of Azure Data Lakes

At first, Azure released an object storage solution called Azure Blob Storage, highly optimized for storing large amounts of unstructured data. The data is stored as blobs inside containers in the Blob storage account. Key features of Azure Blob Storage are tiered storage and disaster recovery capabilities. Azure Blob Storage was used for streaming video/audio, storing files to be accessed by several users, and much more.

Later, Azure released a storage service optimized for big data analytics called Azure Data Lake Storage Gen 1. Unlike with Azure Blob Storage, any type of data, whether structured, semi-structured or unstructured, can be stored there, as files inside folders in a hierarchical file structure. Key features of Azure Data Lake Storage Gen 1 are file system semantics and file-level security. Azure Data Lake Gen 1 was used for batch analytics, interactive analytics, streaming analytics and much more.

Recently, Azure released Azure Data Lake Gen 2, which is a combination of Azure Blob Storage and Azure Data Lake Gen 1. In other words, Azure Data Lake Gen 2 includes the features of both. Currently, Microsoft recommends Azure Data Lake Gen 2 for data lake and modern data warehousing solutions.

Purpose of Connecting Azure Data Lake Gen 2 with Microsoft Power BI

Structured, semi-structured and unstructured data from any source can be stored in Azure Data Lake Gen 2. With the Azure Data Lake Gen 2 connector in Power BI, this data can be loaded into Power BI, and a data model can be developed through data cleansing and transformations. The developed data model can then cater to various business requirements with reports and dashboards. This can be a less costly implementation than investing in a long-term Azure modern data warehouse architecture with high-end cloud data platform components such as Azure Databricks. However, such an implementation cannot fully cater to big data requirements; under those circumstances a long-term modern data warehouse architecture will be required.

Data Scientists access Azure Data Lake Gen 2 and extract the necessary raw data for advanced analytics scenarios. With the Azure Data Lake Gen 2 connector in Power BI, Data Scientists can connect Azure Data Lake Gen 2 with Microsoft Power BI, load the required raw data and work on tasks related to advanced analytics such as exploratory data analysis in Power BI efficiently.

Data Analysts access Azure Data Lake Gen 2 and extract the necessary raw data to generate meaningful business insights in different aspects. Under such circumstances, a business intelligence tool such as Microsoft Power BI is critically required for Data Analysts. As with Data Scientists, Data Analysts can connect Azure Data Lake Gen 2 with Microsoft Power BI, load the required raw data and proceed with their day to day work much more efficiently.

Furthermore, it is worth mentioning that the Azure Data Lake Gen 2 connector in Microsoft Power BI caters to particular business requirements and to the majority of Azure Data Lake Gen 2 users.

Stepwise Implementation of Connecting Azure Data Lake Gen 2 with Power BI

Click on the ‘Get Data’ option in Microsoft Power BI and select ‘More’ to browse the list of available Power BI connectors.

azure_data_lake_power_bi_1

A window will appear listing all the available connectors in Microsoft Power BI.

azure_data_lake_power_bi_2

Navigate to the ‘Azure’ section and select ‘Azure Data Lake Gen 2 (Beta)’, which is the Power BI connector for Azure Data Lake Gen 2. Click on ‘Connect’ to configure the connection.

azure_data_lake_power_bi_3

When ‘Connect’ is clicked, the user will be directed to the window below, warning that the Azure Data Lake Gen 2 connector is a beta version. Click on ‘Continue’ to proceed further.

azure_data_lake_power_bi_4

A window will appear requesting the endpoint URL of the Azure Data Lake Gen 2 folder or file to which the user should connect.

azure_data_lake_power_bi_15
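
For reference, an Azure Data Lake Storage Gen 2 endpoint URL generally follows the pattern below, where the storage account, file system (container) and path are placeholders to be replaced with your own values:

https://<storage-account>.dfs.core.windows.net/<file-system>/<folder-or-file-path>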

Navigate to the location of the required folder or file in Azure Data Lake Gen 2 with Azure Storage Explorer. Right-click on the folder or file and select ‘Properties’. In this demonstration, a single file inside a single folder is used as the data source in Azure Data Lake Gen 2.

azure_data_lake_power_bi_5

The user will be directed to the properties window displaying the endpoint URL of the folder or file.

Copy the URL displayed in the window and paste it into the Azure Data Lake Gen 2 connection configuration window in Microsoft Power BI. Click on ‘OK’ to proceed further.

azure_data_lake_power_bi_6

A preview of the files available in the Azure Data Lake folder is shown, since the URL points to a folder. At the bottom of the window there are three options: Load, Transform Data and Cancel. The Load option loads the data directly into Power BI without any cleansing or transformations. The Transform Data option allows data cleansing and transformations before loading the data for reporting purposes. The Cancel option cancels the process of loading data from Azure Data Lake Gen 2. At this stage the ‘Transform Data’ option will be used, since the data should be cleansed and transformed into a data model.

azure_data_lake_power_bi_7

When the ‘Transform Data’ option is clicked, the user will be directed to the Power Query window.

azure_data_lake_power_bi_8

At this stage the ‘Content’ column holds all the necessary data from the folder’s files; the rest of the columns are unnecessary. Right-click on the ‘Content’ column and select ‘Remove Other Columns’ to remove them.

azure_data_lake_power_bi_9

Click on the ‘Combine Files’ icon on the right side of the ‘Content’ column to combine the content of all the folder’s files into a single dataset as a table. Note that this step can only be accomplished successfully if all the files in the folder follow the same data structure.

azure_data_lake_power_bi_10
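
For intuition, the same combine operation can be sketched in pandas: every CSV file in a folder is concatenated into one table, which only works cleanly when all the files share the same columns, just like Power BI’s Combine Files. The folder path and file pattern below are hypothetical:

import glob
import pandas as pd

# Collect every CSV exported from the data lake folder (hypothetical path)
files = glob.glob('datalake_export/*.csv')

# Concatenate the files into a single table; this assumes an identical
# column structure across all files
combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)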

When the ‘Combine Files’ icon is clicked, a window will appear with the settings related to the process of combining each file. Click on ‘OK’ to proceed further, since no modifications to the settings are required.

azure_data_lake_power_bi_11

The single dataset with all the combined files will be visible as a table in the Power Query window.

azure_data_lake_power_bi_12

Rename the table to an appropriate user-friendly name in the Properties section of the Query Settings.

azure_data_lake_power_bi_13

Click on the ‘Close & Apply’ option in the top left-hand corner of the window. This will load the data model into the standard Power BI views, which are the report, data and model views.

azure_data_lake_power_bi_14

Convert the table into a Power BI tabular data model by referring to my previous blog post, ‘Converting Flat Files to Microsoft Power BI Tabular Data Models’.

Below is the link to the post:

https://thimantha.wordpress.com/2019/09/03/converting-flat-files-to-microsoft-power-bi-tabular-data-models/

Conclusion

Azure Data Lake Gen 2 will be one of the most commonly used data sources in Microsoft Power BI once the final version of the Power BI connector is released. The purpose of this blog post was to walk you through the basics of a data lake and the new Power BI connector for Azure Data Lake Gen 2.

Implementing Junk Dimensions in Microsoft Power BI Tabular Data Models: Part Two

Introduction

Part one of this blog post series was about understanding what junk dimensions are and how they can be implemented in Power BI tabular data models with a database as the data source. This blog post is part two, the final part in the series, and covers the key difference between implementing junk dimensions with different data sources and how junk dimensions can be implemented in Power BI tabular data models with a flat file as the data source.

If you have not already read part one of this series, please refer to it for a better understanding of the concept of junk dimensions by following the link below:

https://thimantha.wordpress.com/2019/09/13/implementing-junk-dimensions-in-microsoft-power-bi-tabular-data-models-part-one/

Key Difference in Implementing Junk Dimensions with Data Sources

The key difference between implementing junk dimensions with a database as the data source and with a flat file as the data source is the technique by which the mappable unique key is created to map the junk dimension table to the fact table. As mentioned in part one of this series, when the data source is a database, the unique key is created in the SQL script. When the data source is a flat file, the unique key should be created with DAX. It is also possible to use DAX rather than the SQL script when the data source is a database, but this is not the most optimized practice, so the DAX technique will only be used when the data source is a flat file.

Specifically, there are two functions for creating a unique key with DAX: the CONCATENATE function and the SWITCH function. These will be discussed further during implementation.

Stepwise Implementation

For demonstration purposes I will be using a CSV file called ‘ProductionTransaction’, extracted from the view ‘vProductionTransaction’ used in part one of this series.

Below is the link to download the CSV file:

https://drive.google.com/open?id=1y8ZvRIKFfowHjbxi2Jr69-Iob9OxZKih

Stepwise Implementation: Junk Dimension

Creating Junk Dimensions

Click on the ‘Get Data’ option in the ‘Home’ tab and select ‘Text/CSV’ to import a flat file into Microsoft Power BI.

implementing_junk_dimensions_part_two_1

Browse to the file path of the CSV file so that Power BI can build a connection with the file, and click on ‘Open’ to proceed further.

implementing_junk_dimensions_part_two_2

A preview of the flat file is shown after Microsoft Power BI builds a connection with the CSV file. At the bottom of the window there are three options: Load, Transform Data and Cancel. The Load option loads the data directly into Power BI for reporting purposes without any cleansing or transformations. The Transform Data option allows data cleansing and transformations before loading the data for reporting purposes. The Cancel option cancels the process of importing the flat file. At this stage the ‘Transform Data’ option will be used, since the data should be cleansed and transformed.

implementing_junk_dimensions_part_two_3

When the Transform Data option is selected, the user will be directed to the Power Query window. The junk dimension will be created here.

implementing_junk_dimensions_part_two_4

Rename the dataset as Junk in the Properties section of the Query Settings.

implementing_junk_dimensions_part_two_5

The columns related to the junk dimension are ‘MakeFlag’ and ‘FinishedGoodsFlag’; the rest of the columns are unrelated. Select the related columns, right-click, and choose the Remove Other Columns option.

implementing_junk_dimensions_part_two_6

When the Remove Other Columns option is clicked, the columns unrelated to the junk dimension will be removed.

Select the two columns, right-click, and choose the Remove Duplicates option.

implementing_junk_dimensions_part_two_7

The duplicates will be removed from the junk dimension and only the records with a unique combination of the columns ‘MakeFlag’ and ‘FinishedGoodsFlag’ will remain.

implementing_junk_dimensions_part_two_8

Click on the ‘Close & Apply’ option in the top left-hand corner of the window. This will load the data model into the standard Power BI views, which are the report, data and model views.

implementing_junk_dimensions_part_two_9

Creating Unique Keys (Primary Keys)

The unique key will be a calculated column that will be created in the junk dimension with the use of DAX.

Click on the ‘New Column’ option in the modeling tab to create the new column.

implementing_junk_dimensions_part_two_10

A formula bar will appear showing the default column name, ‘Column’.

implementing_junk_dimensions_part_two_11

There are two ways in which the unique key can be created with DAX.

Firstly, the unique key will be created with the CONCATENATE function. At this stage the values in ‘MakeFlag’ and ‘FinishedGoodsFlag’ are concatenated together to form a unique key. The ‘MakeFlag’ and ‘FinishedGoodsFlag’ columns should be converted to the text (string) data type to proceed.

Below is the DAX used in the concatenation:

JunkKey = CONCATENATE(Junk[MakeFlag],Junk[FinishedGoodsFlag])

implementing_junk_dimensions_part_two_12

Secondly, the unique key will be created with a SWITCH statement. At this stage a unique key can be specified for each combination of ‘MakeFlag’ and ‘FinishedGoodsFlag’. The ‘MakeFlag’ and ‘FinishedGoodsFlag’ columns should be converted to the whole number (integer) data type to proceed.

Below is the DAX used in the SWITCH statement:

JunkKey = SWITCH(TRUE(),
Junk[MakeFlag]=1 && Junk[FinishedGoodsFlag]=1,1,
Junk[MakeFlag]=1 && Junk[FinishedGoodsFlag]=0,2,
Junk[MakeFlag]=0 && Junk[FinishedGoodsFlag]=1,3,
Junk[MakeFlag]=0 && Junk[FinishedGoodsFlag]=0,4)

implementing_junk_dimensions_part_two_18

With either of these two DAX functions the unique key can be created, and the junk dimension is ready to be used in the model.

Stepwise Implementation: Fact Table

Creating Fact Tables

Click on the ‘Get Data’ option in the ‘Home’ tab and select ‘Text/CSV’ to import a flat file into Microsoft Power BI.

implementing_junk_dimensions_part_two_1

Browse to the file path of the CSV file so that Power BI can build a connection with the file, and click on ‘Open’ to proceed further.

implementing_junk_dimensions_part_two_2

A preview of the flat file is shown after Microsoft Power BI builds a connection with the CSV file.

implementing_junk_dimensions_part_two_3

When the Transform Data option is selected, the user will be directed to the Power Query window. The fact table will be cleansed here.

implementing_junk_dimensions_part_two_23

The ‘TransactionID’ column is unrelated to the fact table. Select the column, right-click, and choose the Remove option.

implementing_junk_dimensions_part_two_22

When the Remove option is clicked, the column ‘TransactionID’ will be removed from the fact table.

Click on the ‘Close & Apply’ option in the top left-hand corner of the window. This will load the data model into the standard Power BI views, which are the report, data and model views.

implementing_junk_dimensions_part_two_9

The columns ‘MakeFlag’ and ‘FinishedGoodsFlag’ are unwanted in the fact table; they are only required to create the mappable key that maps the fact table to the unique key in the junk dimension. Moreover, the ‘TransactionDate’ column is considered an unwanted column as well, since it is only required to map the fact table to the calendar table.

Right click on these unwanted columns and select the ‘Hide in Report View’ option to hide the columns.

implementing_junk_dimensions_part_two_24

Creating Mappable Keys (Foreign Keys)

The mappable key will be a calculated column created in the fact table to map it to the junk dimension. It will be created in the same way as the unique key in the junk dimension.

Click on the ‘New Column’ option in the modeling tab to create the new column.

implementing_junk_dimensions_part_two_25

A formula bar will appear showing the default column name, ‘Column’.

implementing_junk_dimensions_part_two_26

Firstly, the mappable key will be created with the CONCATENATE function. At this stage the values in ‘MakeFlag’ and ‘FinishedGoodsFlag’ are concatenated together to form the mappable key.

Below is the DAX used in the concatenation:

JunkKey = CONCATENATE(ProductionTransaction[MakeFlag],ProductionTransaction[FinishedGoodsFlag])

implementing_junk_dimensions_part_two_27

Secondly, the mappable key will be created with a SWITCH statement. At this stage the keys specified for the combinations of ‘MakeFlag’ and ‘FinishedGoodsFlag’ must match those used in the junk dimension.

Below is the DAX used in the SWITCH statement; it mirrors the expression created for the junk dimension, with the column references swapped to the fact table:
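
JunkKey = SWITCH(TRUE(),
ProductionTransaction[MakeFlag]=1 && ProductionTransaction[FinishedGoodsFlag]=1,1,
ProductionTransaction[MakeFlag]=1 && ProductionTransaction[FinishedGoodsFlag]=0,2,
ProductionTransaction[MakeFlag]=0 && ProductionTransaction[FinishedGoodsFlag]=1,3,
ProductionTransaction[MakeFlag]=0 && ProductionTransaction[FinishedGoodsFlag]=0,4)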

implementing_junk_dimensions_part_two_28

Finalised Tabular Data Model

Demonstrated below is the result after the fact table ‘ProductionTransaction’ and the dimensions ‘Junk’ and ‘Calendar’ are created. The fact table ‘ProductionTransaction’ maps appropriately to the junk dimension ‘Junk’ through a one-to-many relationship.

implementing_junk_dimensions_part_two_29

In the meantime, if there are any data modeling issues related to creating the fact table and the calendar dimension, please refer to the stepwise data modeling implementation in my earlier blog post ‘Converting Flat Files to Microsoft Power BI Tabular Data Models’.

Below is the link:

https://thimantha.wordpress.com/2019/09/03/converting-flat-files-to-microsoft-power-bi-tabular-data-models/

Conclusion

There is a segment of business users who use flat files for their day-to-day data analysis and analytics purposes, and the process of creating a junk dimension in this blog post will be useful for them. This concludes the blog post series on implementing junk dimensions in Microsoft Power BI tabular models. Across the two posts, the basic concept of junk dimensions was covered along with implementations for different data sources.

Implementing Junk Dimensions in Microsoft Power BI Tabular Data Models: Part One

Introduction

Junk dimensions are rarely implemented in Power BI tabular data models and remain an untouched area for most Power BI users. This is a two-part blog post series. Part one, covered in this post, is about understanding what junk dimensions are and how they can be implemented in Power BI tabular data models with a database as the data source.

Understanding Junk Dimensions

A junk dimension is a dimension dedicated to storing miscellaneous indicators/flags that are low in cardinality. There are several other ways to handle such low-cardinality values in a data model. One common way is to have a dedicated dimension for each indicator/flag; this consumes memory unnecessarily and negatively affects the performance of the data model. Another common way is to maintain the indicators/flags in the fact table itself; this does not align with data modeling best practices, according to which fact tables should hold only numerical measurements and metrics along with dimension keys. With all these considerations, implementing a junk dimension is the most appropriate solution for such a requirement.

Ralph Kimball and Margy Ross describe a junk dimension as being like a ‘junk drawer in your kitchen’ in their book ‘The Data Warehouse Toolkit’. The junk drawer is a dumping ground for several different household items such as rubber bands, paper clips, batteries and tape. Maintaining a dedicated drawer for each of these items would be a waste of space, since none of them exists in a quantity significant enough to occupy its own drawer and none is required frequently. This description is perhaps the most intuitive way to understand the purpose of junk dimensions.

Stepwise Implementation

For demonstration purposes I will be using a view called ‘vProductionTransaction’, created from the TransactionHistory and Product tables in the Production schema of AdventureWorks2017.

Below is the SQL script used to create the view:

CREATE VIEW
[Production].[vProductionTransaction]
AS

SELECT
TH.TransactionID,
TH.TransactionDate,
P.MakeFlag,
P.FinishedGoodsFlag,
SUM(Quantity) Quantity
FROM Production.TransactionHistory TH
INNER JOIN
Production.Product P
ON TH.ProductID=P.ProductID
GROUP BY
TH.TransactionID,
TH.TransactionDate,
P.MakeFlag,
P.FinishedGoodsFlag

Once the view is created, query it to inspect the data and get a better understanding before proceeding with the development.

Implementing_Junk_Dimensions_4

At this stage two dimensions are identified along with the fact table. The junk dimension will be the dimension with the ‘MakeFlag’ and ‘FinishedGoodsFlag’ columns, which are declared as boolean (bit) data types. A calendar dimension is also identified, relating to the ‘TransactionDate’ column. Furthermore, the ‘Quantity’ column will be maintained in the fact table along with the mappings to the related dimensions.

Firstly, the junk dimension will be created and loaded into Microsoft Power BI with the SQL script below. In this script, a CASE statement is used to create a unique key column called ‘JunkID’ to map the rows in the fact table with the junk dimension table.

SELECT DISTINCT
CASE
WHEN MakeFlag='1' AND FinishedGoodsFlag='1' THEN 1
WHEN MakeFlag='0' AND FinishedGoodsFlag='0' THEN 2
WHEN MakeFlag='1' AND FinishedGoodsFlag='0' THEN 3
WHEN MakeFlag='0' AND FinishedGoodsFlag='1' THEN 4
END AS JunkID,
MakeFlag,
FinishedGoodsFlag
FROM
Production.vProductionTransaction

Implementing_Junk_Dimensions_5

Click on the ‘Get Data’ option in the ‘Home’ tab and select ‘SQL Server’ to import data from a SQL Server database into Microsoft Power BI.

Implementing_Junk_Dimensions_1

The user will be directed to the SQL Server data source connection configuration window. Enter the server name, the database name and the data connectivity mode. Since the junk dimension does not exist in the database and needs to be created within Microsoft Power BI, expand the advanced options and paste the SQL script suggested above.

Implementing_Junk_Dimensions_2

Click on ‘OK’ and the user will be directed to a window showing a preview of the junk dimension.

Implementing_Junk_Dimensions_3

Click on the ‘Transform Data’ option to modify the junk dimension further in the Power Query Editor and ensure that the dimension is ready.

Implementing_Junk_Dimensions_7

Rename the junk dimension as ‘Junk’ in the Properties section of the Query Settings.

Implementing_Junk_Dimensions_8

Rename the columns to convenient, user-friendly names to appear in the data model.

Implementing_Junk_Dimensions_9

The junk dimension is ready now. Click on the ‘Close & Apply’ option in the top left-hand corner of the window to load the dimension into the data model.

Implementing_Junk_Dimensions_10

The fact table will be created in Microsoft Power BI with the same approach. Below is the script that will be used to create the fact table. As with the junk dimension, a CASE statement is used to create a mapping column called ‘JunkID’ to map the rows in the fact table to the junk dimension table.

SELECT
TransactionID,
TransactionDate,
CASE
WHEN MakeFlag='1' AND FinishedGoodsFlag='1' THEN 1
WHEN MakeFlag='0' AND FinishedGoodsFlag='0' THEN 2
WHEN MakeFlag='1' AND FinishedGoodsFlag='0' THEN 3
WHEN MakeFlag='0' AND FinishedGoodsFlag='1' THEN 4
END AS JunkID,
Quantity
FROM
Production.vProductionTransaction

Implementing_Junk_Dimensions_11

The remaining steps are the same as for the junk dimension, so they will not be repeated for the fact table.

Demonstrated below is the result after the fact table ‘Production’ and the dimensions ‘Junk’ and ‘Calendar’ are created. The fact table ‘Production’ maps appropriately to the junk dimension ‘Junk’ through a one-to-many relationship.

Implementing_Junk_Dimensions_12

In the meantime, if there are any data modeling issues related to creating the fact table and the calendar dimension, please refer to the stepwise data modeling implementation in my earlier blog post ‘Converting Flat Files to Microsoft Power BI Tabular Data Models’.

Below is the link:

https://thimantha.wordpress.com/2019/09/03/converting-flat-files-to-microsoft-power-bi-tabular-data-models/

Conclusion

It is important to understand the concept behind junk dimensions so that they can be implemented when required. The database management system used as the data source for this implementation is Microsoft SQL Server; other database management systems such as Oracle and MySQL can be used as data sources in the same way. The second and final part of this blog post series will cover implementing a junk dimension in a Power BI tabular data model with a flat file.

Converting Flat Files to Microsoft Power BI Tabular Data Models

Introduction

A segment of Power BI users consider flat files as data sources when analyzing and visualizing data in Microsoft Power BI. The majority of such users do not convert these flat files into data models; in fact, most of them do not realise how useful a data model is. This blog post will focus on converting flat files into tabular data models in Microsoft Power BI.

What is a Tabular Data Model?

Tabular data models are developed within a fast in-memory engine called VertiPaq, based on the concepts of tables and relationships. VertiPaq uses state-of-the-art compression algorithms, and as a result tabular data models are fast in performance.

What is the Importance of a Data Model?

Data models are specifically optimized for data analysis and analytics, so there is less room for performance issues when a data model is used. Without a data model, users can query transactional (relational) databases or tables directly for data analysis and analytics, but performance becomes an issue as the amount of data in the sources increases.

What is a Star Schema?

The star schema is the most commonly used data model schema. It consists of one or more fact tables surrounded by dimension tables, forming the shape of a star. Star schemas are not fully normalized; instead, they are optimized for processing large datasets.

Below is a simple example of a Star schema implemented for a Sales module data model:

star-schema
Source: https://techdifferences.com/difference-between-star-and-snowflake-schema.html

There are four dimension tables in the data model, which are Time, Location, Item and Branch, along with the fact table Sales. However, this blog post will only cover the dimensions Geography (Location) and Calendar (Time) with the fact table Sales for demonstration purposes.

Importing the Flat File

The first step of converting a flat file into a tabular data model is to import the flat file into Microsoft Power BI. This imported flat file will then be developed into a data model.

Click on the ‘Get Data’ option in the ‘Home’ tab and select ‘Text/CSV’ to import a flat file into Microsoft Power BI.

convert_flat_files_to_data_model_1

Browse to the file path of the CSV file so that Power BI can build a connection with the file, and click on ‘Open’ to proceed further.

convert_flat_files_to_data_model_2

A preview of the flat file is shown after Microsoft Power BI builds a connection with the CSV file. At the bottom of the window there are three options: Load, Transform Data and Cancel. The Load option loads the data directly into Power BI for reporting purposes without any cleansing or transformations. The Transform Data option allows data cleansing and transformations before loading the data for reporting purposes. The Cancel option cancels the process of importing the flat file. At this stage the ‘Transform Data’ option will be used, since the data should be cleansed and transformed, which results in the flat file being converted into a tabular data model.

convert_flat_files_to_data_model_3

When the Transform Data option is selected, the user will be directed to the ‘Edit Queries’ window. This is where the data model will be developed.

convert_flat_files_to_data_model_4

What is a Dimension Table?

A dimension table represents an aspect by which the fact table can be described, and consists of the textual attributes describing the facts. The primary key of the dimension table appears as a foreign key in the fact table, building a one-to-many relationship between the two tables.

Developing Dimension Tables

Geography and Calendar are two of the most commonly used dimensions in almost any data model. For demonstration purposes these two dimensions will be developed.

  • Geography Dimension

Right click on the CSV file name in the Queries section and click on the duplicate option to create a duplicate version of the dataset for the Geography dimension.

convert_flat_files_to_data_model_5

Rename the dataset as Geography in the Properties section of the Query Settings.

convert_flat_files_to_data_model_7

The columns related to the Geography dimension are Country, City, State, Post Code and Region; the rest of the columns are unrelated.

convert_flat_files_to_data_model_8

Select the related columns and right click to select the Remove Other Columns option.

convert_flat_files_to_data_model_9

When the Remove other Columns option is clicked, the columns unrelated to the Geography dimension will be removed.

convert_flat_files_to_data_model_10

Select all the columns and right click to select the Remove Duplicates option.

convert_flat_files_to_data_model_11

Drag and arrange the columns into a readable order, specifically their hierarchical order.

convert_flat_files_to_data_model_13

The Geography dimension is now ready to be deployed into the data model. The same procedure applies when developing most other dimensions; only the Calendar dimension is developed differently.

  • Calendar Dimension

The Calendar dimension is one of the most common dimensions used in any data model. Indeed, any data model where time intelligence is implemented contains a Calendar dimension.

Calendar dimensions are usually implemented as DAX tables. To accomplish this, exit the Edit Queries mode by following the process below.

Click on the ‘Close & Apply’ option in the top left-hand corner of the window. This will load the data model into the standard Power BI views, which are the report, data and model views.

convert_flat_files_to_data_model_14

Select the New Table option in the Modeling tab to create a new DAX table.

convert_flat_files_to_data_model_15

A formula bar will appear showing the default table name, ‘Table’. The default column name, ‘Column’, is shown in the data view as well.

convert_flat_files_to_data_model_16

At this stage there are two DAX functions that can be used: the CALENDAR function and the CALENDARAUTO function.

CALENDAR Function

The CALENDAR function generates dates in the table based on a specified date range.

Below is the DAX implementation when the CALENDAR function is used:

Calendar = CALENDAR(DATE(2000,01,01), DATE(2020,12,31))

According to this, dates from 1st of January 2000 to 31st of December 2020 will be maintained in the table.

CALENDARAUTO Function

The CALENDARAUTO function maintains dates automatically, with consideration of the dates present in the data model. The fiscal year end month acts as an input parameter.

Below is the DAX implementation when the CALENDARAUTO function is used:

Calendar = CALENDARAUTO(4)

According to this, the date range is automatically set with consideration to the dates in the data model. The fiscal year end month is assumed to be April.

The CALENDARAUTO function will be used in this data model, since it avoids storing unwanted dates in the dimension.

convert_flat_files_to_data_model_17

The Calendar dimension is ready to be deployed into the data model now.

What is a Fact Table?

The fact table is the main table in the data model and consists of numerical values, which are measurements and metrics. Usually fact tables store only numerical data and the primary keys of dimension tables as foreign keys.

Developing Fact Tables

Rename the fact table into a user-friendly name in the Properties section of Query Settings.

convert_flat_files_to_data_model_18

Select the unwanted columns and right-click to remove them from the fact table. The unwanted columns in the table are Row ID and Order ID; these two columns act as unique keys, which slow down the performance of the data model. Country, Region, State and City are also unneeded, since these columns already exist in the Geography dimension. Postal Code is needed to map the Geography dimension to the fact table.

convert_flat_files_to_data_model_19

Click on the ‘Close & Apply’ option in the top left-hand corner of the window. The user will be directed into the standard Power BI views.

convert_flat_files_to_data_model_14

Refer to the Sales fact table. Select the Order Date and Postal Code columns and right-click to hide them in the report view, since these columns are only required for mapping to the Calendar and Geography dimensions.

convert_flat_files_to_data_model_20

Relationships between Facts and Dimensions

Go to the model view in Power BI. At this stage the two dimension tables and the fact table that were developed are visible.

convert_flat_files_to_data_model_21

Map the Order Date column in the Sales fact table to the Date column in the Calendar dimension table, and the Postal Code column in the Sales fact table to the Postal Code column in the Geography dimension table. This can be done by dragging the mappable column in the fact table onto the corresponding column in the dimension table; the relationship between the tables is created automatically.

convert_flat_files_to_data_model_22

Developing Calculated Columns/Measures

  • Calculated Columns

Calculated columns hold row-wise calculations. Since they are stored in memory, they consume space; in other words, the use of calculated columns can affect data model performance.

Currently the ‘Sales’ column contains the sales amount in USD. A separate column is required to show the sales amount in GBP.

Refer to the Sales fact table and select the new column option in the modeling tab to create a new calculated column.

convert_flat_files_to_data_model_23

A new column named ‘Column’ will appear in the table and the formula bar on top.

convert_flat_files_to_data_model_24

The calculation below converts Sales from USD to GBP, using the exchange rate at the time of writing. The column is renamed ‘SalesGBP’.

SalesGBP = Sales[Sales] * 0.83

When the calculation is entered in place of the default column named ‘Column’ in the formula bar, the ‘SalesGBP’ column will appear after clicking the tick on the left-hand side.

convert_flat_files_to_data_model_25

  • Calculated Measures

Calculated measures hold aggregated calculations. They are evaluated at query time rather than stored in memory, so they do not influence the size of the data model.

A measure will be created for the column ‘Profit’ to aggregate the profit figure for analytical purposes.

Refer to the Sales fact table and select the new measure option in the modeling tab to create a new calculated measure.

convert_flat_files_to_data_model_26

A new measure named ‘Measure’ will appear in the table and the formula bar on top.

convert_flat_files_to_data_model_27

The function below sums all the profits into one figure. The measure is renamed ‘Profit Amount’.

Profit Amount = SUM(Sales[Profit])

When the function is entered in place of the default measure named ‘Measure’ in the formula bar, the ‘Profit Amount’ measure will appear in the fields section on the right-hand side after clicking the tick on the left-hand side.

convert_flat_files_to_data_model_28

Conclusion

Data modeling is an important perspective that should be covered when working with Microsoft Power BI. This blog post was mainly written for Power BI freshers to follow when doing data modeling while fulfilling reporting requirements. I would say that any Power BI report/dashboard is incomplete without a proper data model.

YouTube Video Publishing Analysis with Microsoft Power BI

youtube_video_publishing_analysis_4
Introduction

Millions of videos are published to YouTube everyday by YouTubers. Some of these videos turn out to be successful with views and likes while some are failures. This analysis provides the ability for YouTubers to decide when should their videos be published.

Data Source

Below is the link of the data source used:

https://www.kaggle.com/datasnaek/youtube-new

Data Model

youtube_video_publishing_analysis

The fact table of the data model is Statistics, while the dimension tables are Calendar, Time, Category, Country and Video. There is much more potential left to uncover in this tabular data model; in fact, I will be working on more analyses with this data model in the future.

Measures

DAX measures were created in the data model to cater to various requirements. The measures created so far are Videos Total, Video Views, Video Likes and Video Dislikes; additional measures will be useful for analysis in the future.

Visualisations

youtube_video_publishing_analysis_4

Line charts were used to visualise the published video count, video views, video likes and video dislikes against the published time. YouTubers can also filter the report by a targeted country and the relevant video category, and compare the analyses in the line charts while filtering to decide when their videos should be published.

Custom Tooltips

Custom Tooltips were included in every line chart in order to make the report look much more prominent and persuasive.

Conclusion

With the above analysis, YouTubers can decide when their videos should be published with consideration to the targeted countries US and Canada. Furthermore, there is much more potential in the data model that was developed, and more analysis will be carried out with it.

Improvements Required for Shape Maps in Microsoft Power BI

Introduction

Shape maps are a type of map visualisation in Microsoft Power BI that has been in preview mode for a while now. I personally believe that shape maps can be extremely useful if certain improvements are addressed. This blog post will take you through the improvements that should be addressed to bring shape maps out of preview mode.

What is a Shape Map?

Before going any further, it is necessary to have a clear understanding of shape maps. A shape map is a map visualisation that is useful for comparing regions in a particular area against a metric/measure.

Shown below is a basic demonstration of a shape map:

improvements_shape_maps_1

Here, death rates in the United States are visualised by state. The states in red have high death rates, the states in green have low death rates, and the states in yellow have average death rates.

Why Shape Maps?

Shape maps can be considered a useful map visualisation thanks to the ability to use any type of map by uploading a TopoJSON file. Two cases in which shape maps can be used are passenger seating arrangements in an aircraft and floor plans in an office environment, and many similar cases can be approached this way. The other map visualisations do not provide this functionality.

What are the Improvements Required for Shape Maps?

Report Efficiency

Shape maps severely affect the efficiency of Power BI reports: the entire report slows down drastically when a shape map is placed. There is no future for shape maps if this issue is not addressed soon, as no Power BI user would prefer interacting with slow reports. Under such circumstances, if map visualisations are required, users will turn to the other types of map visualisations available.

Map Restrictions

Currently, shape maps are available for a restricted number of countries/regions, such as Australia, Austria, Brazil, Canada, France, Germany, Ireland, Italy, Mexico, the Netherlands, the UK and the USA. This scope should be expanded for shape maps to be usable in a wider variety of regions and situations. Shape maps do provide the option of uploading maps as TopoJSON files for unavailable countries/regions, as well as for custom maps beyond countries/regions, but at times a lot of work is required beyond Power BI to generate or find these TopoJSON files. Life would be much easier if the number of countries/regions available in shape maps were expanded further.

Formatting Features

The formatting features available in shape maps are currently quite poor compared to other map visualisations such as filled maps. Two main formatting options that should be available are selecting map styles and labeling regions. Shape maps should provide the formatting options available in other map visualisations in order to reach the same standard.

Conclusion

There is a possibility for shape maps to become one of the best Power BI map visualisations available if the above improvements are addressed. On the other hand, it is quite disappointing to still see shape maps in preview mode while Microsoft Power BI improves frequently as a BI tool.

Machine Learning for Student Scores

Introduction

Previously, I took you through the process of implementing a Simple Linear Regression model, which was my first Machine Learning pet project. Recently, I implemented a Simple Linear Regression model for a student scores dataset that I found online as well. In this blog post I will take you through that process, which is my second Machine Learning pet project.

About the Dataset

There are only two features in the dataset: Study Hours and Student Scores. The Student Scores are directly dependent on the number of Study Hours; in other words, the more time students spend studying, the better their scores will be.

Shown below is a scatter plot that clearly demonstrates this dependency:

student_score_7

The dependency of the two features can also be clearly identified when the correlation coefficient is calculated. If the correlation between the two features is greater than 0, there is a positive relationship between Study Hours and Student Scores; if it is less than 0, there is a negative relationship.

student_score_6
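
As a rough sketch of how this check might look in Python, assuming the data frame is named df and the columns are named Hours and Scores (hypothetical names; the actual notebook may differ):

import pandas as pd

# Load the student scores dataset (file name is an assumption)
df = pd.read_csv('student_scores.csv')

# Pearson correlation coefficient between study hours and scores;
# a value well above 0 (close to +1) indicates a strong positive relationship
print(df['Hours'].corr(df['Scores']))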

Based on the identified behaviour in the data, the student scores (y) will be predicted from the study hours (x). The predicted student scores will be plotted with a trendline on the scatter plot to confirm that the prediction is accurate enough.

Initial Steps

Necessary Python packages were imported:

student_score_1

The student scores dataset was imported into a data frame:

student_score_2

Inspect the dataset to understand the features with describe():

student_score_4

Inspect the data types of features with dtypes:

student_score_5

Declare Study Hours as the variable x and Student Scores as the variable y:

student_score_8

Splitting the Dataset for Training and Testing

The dataset needs to be split in two, into training and testing datasets. The training dataset is used to train the Machine Learning model, while the testing dataset is used to evaluate the results after training. Here, 75% of the dataset was used for training and 25% for testing.

student_score_13

Reshaping X

The x variables for both training and testing are reshaped into a single-column, two-dimensional array representing one independent variable, which is the shape the regression model expects, rather than a flat one-dimensional array. This is because the algorithm used is Simple Linear Regression, with a single independent variable.

student_score_14

Declaring and Fitting the Linear Regression Model

The Linear Regression model is declared in the variable ‘LM’, and the independent variable x_train together with the dependent variable y_train is fitted to it. This is the stage where the Machine Learning model is trained based on the Linear Regression algorithm.

student_score_9

Predicting the Student Scores (y)

With the trained Machine Learning model, the Student Scores are predicted against the independent variable x_test from the testing dataset.

student_score_10

Plotting the Prediction with the Test Observations

The prediction is plotted with a trend line on the scatter plot together with the test observations to confirm that the prediction is accurate enough. As shown below, the prediction is quite accurate.

student_score_11

Assuring the Accuracy of the Model

With the metrics below, the Machine Learning model achieves a score of 94%. For a regression model this is best read as an R² value: the model explains roughly 94% of the variance in the student scores, rather than there being a 94% probability of correct predictions.

student_score_12
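
For readers who want to reproduce the flow without the screenshots, below is a minimal end-to-end sketch with scikit-learn. The file and column names (student_scores.csv, Hours, Scores) are assumptions, and the variable names mirror the steps described above:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_csv('student_scores.csv')  # hypothetical file name

# Independent variable (study hours) and dependent variable (scores)
x = df['Hours']
y = df['Scores']

# 75% of the rows for training and 25% for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.75, random_state=0)

# Reshape x into a two-dimensional array with a single column,
# since the model expects a feature matrix even for one variable
x_train = x_train.values.reshape(-1, 1)
x_test = x_test.values.reshape(-1, 1)

# Declare and fit the Simple Linear Regression model
LM = LinearRegression()
LM.fit(x_train, y_train)

# Predict scores for the unseen test data and evaluate with R²
prediction = LM.predict(x_test)
print(r2_score(y_test, prediction))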

Conclusion

Demonstrated above is the second Machine Learning model I developed. I’m looking forward to exploring Multivariate Linear Regression models in the future, where predictions are based on multiple independent variables rather than a single independent variable.

Machine Learning for Employee Salaries

Introduction

Simple Linear Regression is one of the most common algorithms used by beginners getting started with Machine Learning, because it is the simplest algorithm for any individual to understand. Specifically, Simple Linear Regression predicts the dependent variable y based on a single independent variable x. In this blog post I will take you through the process of implementing a Simple Linear Regression model with Python for an employee salaries dataset, which was my first Machine Learning pet project.

About the Dataset

The dataset is a basic one for a beginner to start with. There are only two features: Employee Experience and Employee Salary. The Employee Salary is directly dependent on the work experience; in other words, the more experience an employee has, the higher his or her salary will be.

Shown below is a scatter plot that clearly demonstrates this dependency:

employee_salary

The dependency of the two features can also be clearly identified when the correlation coefficient is calculated. If the correlation between the two features is greater than 0, there is a positive relationship between Experience and Salary; if it is less than 0, there is a negative relationship.

employee_salary_2

Based on the identified behaviour in the data, the employee salaries (y) will be predicted from the employee experience (x). The predicted salaries will be plotted with a trendline on the scatter plot to confirm that the prediction is accurate enough.

Initial Steps

Necessary Python packages were imported:

employee_salary_3

The employee dataset was imported into a data frame:

employee_salary_4

Inspect the dataset to understand the features with describe():

employee_salary_5

Inspect the data types of features with dtypes:

employee_salary_6

Declare Experience as the variable x and Salary as the variable y:

employee_salary_7

Splitting the Dataset for Training and Testing

The dataset needs to be split in two, into training and testing datasets. The training dataset is used to train the Machine Learning model, while the testing dataset is used to evaluate the results after training. Here, 75% of the dataset was used for training and 25% for testing.

employee_salary_8

Reshaping X

The x variables for both training and testing are reshaped into a single-column, two-dimensional array representing one independent variable, which is the shape the regression model expects, rather than a flat one-dimensional array. This is because the algorithm used is Simple Linear Regression, with a single independent variable.

employee_salary_9

Declaring and Fitting the Linear Regression Model

The Linear Regression model is declared in the variable ‘lm’, and the independent variable x_train together with the dependent variable y_train is fitted to it. This is the stage where the Machine Learning model is trained based on the Linear Regression algorithm.

employee_salary_10

Predicting the Employee Salary (y)

With the trained Machine Learning model, the Salary is predicted against the independent variable x_test from the testing dataset and assigned to the variable ‘prediction’.

employee_salary_11

Plotting the Prediction with the Test Observations

The prediction is plotted with a trend line on the scatter plot together with the test observations to confirm that the prediction is accurate enough. As shown below, the prediction is quite accurate.

employee_salary_12
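
A minimal sketch of how this plot might be produced with matplotlib, assuming the x_test, y_test and prediction variables from the earlier steps:

import matplotlib.pyplot as plt

# Test observations as points, predicted salaries as the trend line;
# the predictions are collinear, so the line is straight regardless of point order
plt.scatter(x_test, y_test, color='blue', label='Actual salary')
plt.plot(x_test, prediction, color='red', label='Predicted trend line')
plt.xlabel('Experience (years)')
plt.ylabel('Salary')
plt.legend()
plt.show()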

Conclusion

Demonstrated above is the first Machine Learning model I developed. I’m still covering the basics of Machine Learning with Python, and there is a lot more for me to learn in order to build much better Machine Learning models.

Getting Started with Machine Learning

Machine Learning is where you use a programming language to train a model, based on an algorithm, that can then be used to perform a set of actions. It plays a crucial role in Artificial Intelligence. I was really impressed when I started to understand the power within Machine Learning and wanted to get started with it immediately. This blog post will take you through my experiences when getting started with Machine Learning.

When researching Machine Learning, I realised that the programming languages most commonly used for it are Python and R. For some reason I decided to start with R, and followed several courses online to familiarise myself with it. As I was learning R, I had an extreme intention to develop a Machine Learning model as soon as possible, so I tried fast-tracking the R courses I followed to reach the stage where I would be developing a model. This got me into a lot of struggles with R, since I wasn’t thorough with the basics, and I had to revert and start everything from the beginning.

With thorough focus, I reached the stage where I could carry out data analysis with R successfully in R Studio. However, I realised that this wasn’t much of an achievement on its own, since the same could be done with Microsoft Power BI. So I kept working hard towards the next stage, which was to develop a Machine Learning model. Once again there were several struggles in getting things right, and I understood this was because there is less content online for Machine Learning with R compared to Python.

After handling these struggles for a while, I took a little break to think it over, and decided to start Python from scratch with Jupyter Notebook, since there is much more online content available for Python. I have only followed one proper edX course so far, and it’s working just fine for me.

On the other hand, statistics is currently on my radar too. An appropriate knowledge of statistics is essential to tap into the actual power of Machine Learning. I only have a very basic knowledge of statistics right now, and improving it is important for proceeding further with Machine Learning.

I’m still in my early days with Machine Learning, and there is a lot more for me to learn ahead. I will be working on Machine Learning pet projects to challenge myself further and posting them right here on my blog. The message I want to pass on from this blog post is that if you are also interested in getting started with Machine Learning, do not decide which programming language to use without trying them out. Try both Python and R to figure out which suits you best.