Datatype Conversion in Power Query Affects Data Modeling in Power BI
In my consulting experience working with customers using Power BI, many of the challenges that Power BI developers face come down to inattention to data types. Here are some common challenges that are the direct or indirect result of inappropriate data types and data type conversion:
- Getting incorrect results while all calculations in your data model are correct.
- Poorly performing data model.
- Bloated model size.
- Difficulties in configuring user-defined aggregations (agg awareness).
- Difficulties in setting up incremental data refresh.
- Getting blank visuals after the first data refresh in Power BI service.
In this blog post, I explain the common pitfalls to help you prevent future challenges that can be time-consuming to identify and fix.
Background
Before we dive into the topic of this blog post, I would like to start with a bit of background. We all know that Power BI is not only a reporting tool. It is indeed a data platform supporting various aspects of business intelligence, data engineering, and data science. There are two languages we must learn to be able to work with Power BI: Power Query (M) and DAX. The purpose of the two languages is quite different. We use Power Query for data transformation and data preparation, while DAX is used for data analysis in the Tabular data model. Here is the point: the two languages in Power BI have different data types.
The most common Power BI development scenarios start with connecting to the data source(s). Power BI supports hundreds of data sources. Most data source connections happen in Power Query (the data preparation layer in a Power BI solution) unless we connect live to a semantic layer such as an SSAS instance or a Power BI dataset. Many supported data sources have their own data types, and some don't. For instance, SQL Server has its own data types, but CSV doesn't. When the data source has data types, the mashup engine tries to map each source data type to the closest data type available in Power Query. Even when the source system has data types, those types might not be compatible with Power Query data types. For data sources that do not support data types, the mashup engine tries to detect the data types based on the sample data loaded into the data preview pane in the Power Query Editor window. But there is no guarantee that the detected data types are correct, so it is best practice to validate the detected data types anyway.
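As a quick illustration, here is a minimal sketch of explicitly setting column types after reading a typeless source such as a CSV file. The file path and column names are hypothetical:
let
    // Csv.Document returns every column as text because CSV carries no type information
    Source = Csv.Document(File.Contents("C:\Data\Sales.csv"), [Delimiter = ",", Encoding = 65001]),
    PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Set the types explicitly instead of trusting the automatically detected ones
    ChangedTypes = Table.TransformColumnTypes(
        PromotedHeaders,
        {{"OrderDate", type date}, {"Quantity", Int64.Type}, {"UnitPrice", Currency.Type}}
    )
in
    ChangedTypes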
Power BI uses the Tabular model data types when it loads the data into the data model. The data types in the data model may or may not be compatible with the data types defined in Power Query. For instance, Power Query has a Binary data type, but the Tabular model does not.
The following table shows Power Query's data types, their representations in the Power Query Editor's UI, the data types they map to in the data model (DAX), and the internal data types in the xVelocity (Tabular model) engine:

As the above table shows, in Power Query's UI, Whole Number, Decimal, Fixed Decimal, and Percentage are all of type number in the Power Query engine. The type names in the Power BI UI also differ from their equivalents in the xVelocity engine. Let us dig deeper.
Data Types in Power Query
As mentioned earlier, in Power Query we have only one numeric data type: number. In the Power Query Editor's UI, however, the Data Type drop-down button on the Transform tab shows four numeric data types, as the following image shows:

In the Power Query formula language, we specify a numeric data type as type number or Number.Type. Let us look at an example to see what this means.
The following expression creates a table with different values:
#table({"Value"}
, {
{100}
, {65565}
, {-100000}
, {-999.9999}
, {0.001}
, {10000000.0000001}
, {999999999999999999.999999999999999999}
, {#datetimezone(2023,1,1,11,45,54,+12,0)}
, {#datetime(2023,1,1,11,45,54)}
, {#date(2023,1,1)}
, {#time(11,45,54)}
, {true}
, {#duration(11,45,54,22)}
, {"This is a text"}
})
The results are shown in the following image:

Now we add a new column that shows the data type of the values. To do so, we use the Value.Type([Value]) function, which returns the type of each value in the Value column. A rough sketch of this step is shown below; the results appear in the image that follows it:
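A minimal sketch of that step, reusing a shortened version of the #table expression above (the step names are just illustrative):
let
    // A shortened version of the #table expression from earlier, for brevity
    Source = #table({"Value"}, {{100}, {0.001}, {#date(2023, 1, 1)}, {"This is a text"}}),
    // Value.Type returns each value's type; "type type" declares the new column's own type
    AddedValueType = Table.AddColumn(Source, "Value Type", each Value.Type([Value]), type type)
in
    AddedValueType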

To see the actual type, we have to click on each cell (not the values) of the Value Type column, as shown in the following image:

With this method, we have to click each cell to see the data type of each value, which is not ideal. But there is currently no function available in Power Query to convert a type value to text. So, to show each type as text in a table, we use a simple trick. There is a function in Power Query that returns a table's metadata: Table.Schema(table as table). The function results in a table revealing useful information about the table passed to it, including column Name, TypeName, Kind, and so on. We want to show this information for the values in the Value Type column. So we only need to turn each value into a table using the Table.FromValue(value as any) function and then get the values of the Kind column from the output of the Table.Schema() function.
To do so, we add a new column that gets the textual values from the Kind column. We name the new column Datatypes. The following expression caters to that:
Table.Schema(
    Table.FromValue([Value])
)[Kind]{0}
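Wrapped in Table.AddColumn and continuing from the AddedValueType step in the earlier sketch, the custom column could look like this:
AddedDatatypes = Table.AddColumn(
    AddedValueType,
    "Datatypes",
    // Table.FromValue turns the single value into a table; Table.Schema exposes its Kind as text
    each Table.Schema(Table.FromValue([Value]))[Kind]{0},
    type text
)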
The following image shows the results:

As the results show, all numeric values are of type number, and the way they are represented in the Power Query Editor's UI does not affect how the Power Query engine treats those types. The data type representations in the Power Query UI are somewhat aligned with the type facets in Power Query. A facet is used to add details to a type kind. For instance, we can apply a facet to the text type if we want a text type that does not accept null. We can define a value's type using facets with the Facet.Type syntax, such as Int64.Type for a 64-bit integer number or Percentage.Type to show a number as a percentage. However, to define a value's type kind itself, we use the type typename syntax, such as type number for a number or type text for a text. The following table shows the Power Query types and the syntax used to define them:

Unfortunately, the Power Query Language Specification documentation does not include facets, and there are not many online resources or books that I can reference here other than Ben Gribaudo's blog, which explains facets thoroughly and which I strongly recommend reading.
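As a quick illustration, here is a minimal sketch of setting column types with facets versus the plain type kind; the table and column names are made up:
let
    Source = #table(
        {"Amount", "Quantity", "Price", "Rate"},
        {{125.5, 10, 9.99, 0.15}, {99.9, 20, 4.25, 0.2}}
    ),
    // "type number" sets the plain number type kind; the other three columns use facets of number
    ChangedTypes = Table.TransformColumnTypes(
        Source,
        {
            {"Amount", type number},
            {"Quantity", Int64.Type},
            {"Price", Currency.Type},
            {"Rate", Percentage.Type}
        }
    )
in
    ChangedTypes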
While the Power Query engine treats the values based on their types, not their facets, using facets is still recommended because they affect the data when it is loaded into the data model. This raises a question: what happens after we load the data into the data model? That brings us to the next section of this blog post.
Data types in the Power BI data model
Power BI uses the xVelocity in-memory data processing engine to process the data. The xVelocity engine uses columnstore indexing technology that compresses the data based on the cardinality of the column, which brings us to a critical point: although the Power Query engine treats all the numeric values as type number, they get compressed differently depending on their column cardinality after the values are loaded into the Power BI model. Therefore, setting the correct type facet for each column is important.
Numeric values are among the most common data types used in Power BI. Here is another example showing the differences between the four number facets. Run the following expression in a new blank query in the Power Query Editor:
// Decimal Numbers with 6 Decimal Digits
let
    // Generate roughly 10 million values between 0 and 10 in steps of 0.000001
    Source = List.Generate(()=> 0.000001, each _ < 10, each _ + 0.000001),
    ConvertedToTable = Table.FromList(Source, Splitter.SplitByNothing(), {"Source"})
in
    ConvertedToTable
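The remaining steps duplicate the Source column and change the type facets, producing the four columns described below. A sketch of those steps (the step names are illustrative) would sit just before the in line above, with the query then returning ChangedTypes:
    DuplicatedDecimal = Table.DuplicateColumn(ConvertedToTable, "Source", "Decimal"),
    DuplicatedFixedDecimal = Table.DuplicateColumn(DuplicatedDecimal, "Source", "Fixed Decimal"),
    DuplicatedPercentage = Table.DuplicateColumn(DuplicatedFixedDecimal, "Source", "Percentage"),
    // Apply a different number facet to each duplicated column; the Source column stays as type any
    ChangedTypes = Table.TransformColumnTypes(
        DuplicatedPercentage,
        {{"Decimal", type number}, {"Fixed Decimal", Currency.Type}, {"Percentage", Percentage.Type}}
    )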
The above expressions create 10 million rows of decimal values between 0 and 10. The resulting table has four columns containing the same data with different facets. The first column, Source, contains the values of type any, which translates to type text. The remaining three columns are duplicated from the Source column with different type facets, as follows:
- Decimal
- Fixed decimal
- Percentage
The following screenshot shows the resulting sample data of our expression in the Power Query Editor:

Now click Close & Apply from the Home tab of the Power Query Editor to import the data into the data model. At this point, we need to use a third-party community tool, DAX Studio, which can be downloaded from here.
After downloading and installing it, DAX Studio registers itself as an External Tool in Power BI Desktop, as the following image shows:

Click DAX Studio on the External Tools tab, which automatically connects it to the current Power BI Desktop model, and follow these steps:
- Click the Advanced tab
- Click the View Metrics button
- Click Columns from the VertiPaq Analyzer section
- Look at the Cardinality, Col Size, and % Table columns
The following image shows the preceding steps:

The results show that the Decimal and Percentage columns consume the most significant part of the table's size. Their cardinality is also much higher than that of the Fixed Decimal column. So it is now more obvious that using the Fixed Decimal data type (facet) for numeric values can help with data compression, reducing the data model size and improving performance. Therefore, it is wise to always use Fixed Decimal for decimal values. As Fixed Decimal values translate to the Currency data type in DAX, we must change the columns' format if Currency is unsuitable. As the name suggests, Fixed Decimal has a fixed four decimal places. Therefore, if the original value has more decimal digits, the digits after the fourth decimal place are truncated after conversion to Fixed Decimal.
That is why the Cardinality column in the VertiPaq Analyzer in DAX Studio shows a much lower cardinality for the Fixed Decimal column (the column values only keep up to four decimal places, not more).
Download the sample file from here.
So, the message here is to always use the data type that makes sense to the business and is efficient in the data model. Using the VertiPaq Analyzer in DAX Studio is a good way to understand various aspects of the data model, including the column data types. As a data modeler, it is essential to understand how Power Query types and facets translate to DAX data types. As we saw in this blog post, data type conversion can affect the data model's compression rate and performance.
