Methods To Refresh a Semantic Model
DataHub Consulting, Experts in Analytics, Business Intelligence, and Compliance
Whilst supporting a developer studying for the Microsoft Fabric Analytics Engineer Associate exam (DP-600), they asked a question about refreshing semantic models. We started to list the methods and talk about the pros and cons, which prompted this article. There are more ways to refresh a semantic model than many people realise, and this is one reason I personally prefer Power BI over Tableau or Qlik. Each method has its own benefits, but when you get into bigger models some of these options really come into their own. In this article we will look at each one in turn. Some are in public preview at the time of writing, but we will still discuss them. Please be aware that some organizations have a policy that public preview features are not used on production systems, so if you plan to use a method in preview, check with the organization that this is acceptable. The reason is that public preview is a form of beta, and further changes can be implemented before general availability.
All Power BI reports and dashboards have an associated semantic model. A model can be the data source for one or many Power BI reports. So, to refresh the data you initiate a refresh on the semantic model, not on the report itself. We are looking at the ways to refresh the semantic model within the service; this article does not cover refreshing a local model in Power BI Desktop.
This article looks at the various options to refresh a semantic model. Please be aware that some of these options require a Premium (P SKU) subscription, a Premium Per User (PPU) licence, or a Fabric subscription (F SKU). Also note that the Premium capacity (P SKU) will be retired soon; we discuss this in more detail further down in the article.
Let’s start to look at the options.
This is the most basic way to refresh a semantic model. In the service, navigate to the workspace where the semantic model is published.
When you hover over the semantic model three icons will appear. There is a button with a circular arrow icon. This is the manual refresh button.
Please be aware that anyone with access to the workspace and build permissions can click this button. I’ve seen customers that allowed users access to the workspace, and a user clicked the refresh button. One particular customer was using a Premium capacity and had a big model that only refreshed a single partition on a daily basis. The partitions were based on calendar quarters, and they only refreshed the most recent quarter. In this instance the user pressed the manual refresh button, which initiated a full refresh of all the partitions.
This is one of the reasons I would recommend keeping users away from the workspace where possible and instead giving them access to reports through an app.
This is the most commonly used refresh method. With some people this is the only method they know and use. So, let’s have a look at scheduling the refresh of a semantic model.
There are multiple ways to get to the screen to configure a scheduled refresh.
Both will get you to the same place; they are just different ways to do it.
Above is a typical example of a refresh schedule running daily at 5:30 am (UTC).
Clicking “Add another time” allows another refresh time to be included in the schedule. There is also the ability to add people who should be notified if the refresh fails.
Finally, there is an Apply button at the bottom. Some people set up or change a schedule and forget to scroll down and click Apply; without this, the changes will not take effect.
Incremental refresh is where the semantic model only refreshes the data relating to specific dates. With incremental refresh the data within the semantic model is automatically partitioned by date, and the refresh only affects the current partition. Partitions with historic dates remain in place and are not involved in the refresh.
Incremental refresh relies on the source data being in a single source and requires a data source capable of query folding. Typically this means relational databases such as SQL Server, Oracle, PostgreSQL, MySQL, and Teradata. Because of this requirement, incremental refresh won’t work if the source is Excel, CSV, a SharePoint list, etc.
Incremental refresh requires two parameters, “RangeStart” and “RangeEnd”. These parameter names are reserved and can’t be used for anything other than incremental refresh.
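To illustrate the mechanics, here is a minimal Python sketch of the half-open window filter that incremental refresh applies per partition (RangeStart inclusive, RangeEnd exclusive). In a real model the filter is written in Power Query M and folded back to the source; the function and the sample orders below are invented purely for illustration.

```python
from datetime import datetime

# Hypothetical illustration of the per-partition filter:
# keep only rows where RangeStart <= date column < RangeEnd.
def partition_filter(rows, range_start, range_end, date_key="OrderDate"):
    return [r for r in rows if range_start <= r[date_key] < range_end]

orders = [
    {"OrderID": 1, "OrderDate": datetime(2024, 2, 10)},
    {"OrderID": 2, "OrderDate": datetime(2024, 5, 3)},
    {"OrderID": 3, "OrderDate": datetime(2024, 8, 21)},
]

# Refresh only the Q3 2024 partition: RangeStart = 1 Jul, RangeEnd = 1 Oct.
q3 = partition_filter(orders, datetime(2024, 7, 1), datetime(2024, 10, 1))
print([r["OrderID"] for r in q3])  # only order 3 falls inside the window
```

The half-open window matters: because RangeEnd is exclusive, adjacent partitions never overlap and no row is loaded twice.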
Microsoft Learn: Incremental refresh for semantic models in Power BI – Power BI | Microsoft Learn
With semantic link there are options to refresh the whole model, a table, or a partition. Semantic link may be new to many people, so let’s look at the basics. Semantic link is a method of using a Fabric notebook to create a Python script. It has many uses, but the one we are looking at in this article allows us to refresh the semantic model. It uses the “SemPy” Python library for the necessary functions.
As a prerequisite, the semantic model needs to be in a workspace backed by an F capacity, P capacity, or PPU licence. The F capacity is a Microsoft Fabric capacity, the P capacity is the older Power BI Premium capacity, and PPU is an individual licence called Premium Per User.
Note: These licences and subscriptions are valid at the time this article was published. But there are plans to retire the Premium (P SKU) soon. The information from Microsoft is: New customers will not be able to purchase Power BI Premium per capacity after July 1, 2024. Existing customers without an Enterprise Agreement (EA) will be able to renew their Power BI Premium capacity subscriptions until January 1, 2025.
For Spark 3.4 or higher, semantic link is already installed. If you are running Spark 3.3, install semantic link with the following command:
%pip install -U semantic-link
How do we refresh a Semantic Model?
I’ll be using Python code for this example. Python is easy to follow, and if you are going to be working with Fabric then it is worth getting familiar with it.
Step 1 – Identify the dataset and workspace that you want to refresh
import sempy.fabric as fabric
dataset = "YourDatasetName"
workspace = "YourWorkspaceName"
Step 2 – Specify the objects that you want to refresh.
When I talk of objects, I mean tables, partitions, etc.
objects_to_refresh = [
    {"table": "Customers", "partition": "Customers-ROW"},
    {"table": "Order_Details"}
]
Step 3 – Execute the refresh
You will notice that it uses the fabric.refresh_dataset function.
fabric.refresh_dataset(workspace=workspace, dataset=dataset, objects=objects_to_refresh)
Step 4 – List the refreshed objects
Confirm that the refresh was successful using the function: fabric.list_refresh_requests
refresh_requests = fabric.list_refresh_requests(dataset=dataset, workspace=workspace)
print(refresh_requests)
Step 5 – Validate the refreshed partitions
Validating the refreshed partitions is optional, but can be done as follows:
import pandas as pd
import json
def get_partition_refreshes(dataset, workspace):
    tmsl_data = json.loads(fabric.get_tmsl(dataset=dataset, workspace=workspace))
    df = pd.json_normalize(tmsl_data, record_path=['model', 'tables', 'partitions'],
                           meta=[['model', 'name'], ['model', 'tables', 'name']],
                           errors='ignore', record_prefix='partition_')
    df = df.rename(columns={'model.tables.name': 'table_name'})
    return df[['table_name', 'partition_name', 'partition_refreshedTime']]
df = get_partition_refreshes(dataset=dataset, workspace=workspace)
print(df)
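If you want to see what this normalization step does without connecting to a workspace, here is a self-contained sketch that runs the same pd.json_normalize logic against a small hard-coded sample shaped like the output of fabric.get_tmsl. The model, table, and partition names here are invented for illustration only.

```python
import json
import pandas as pd

# Hypothetical TMSL metadata, shaped like fabric.get_tmsl output.
# All names and timestamps below are invented.
sample_tmsl = json.dumps({
    "model": {
        "name": "SalesModel",
        "tables": [
            {
                "name": "FactSales",
                "partitions": [
                    {"name": "FactSales-2024Q1", "refreshedTime": "2024-04-01T05:30:00Z"},
                    {"name": "FactSales-2024Q2", "refreshedTime": "2024-07-01T05:30:00Z"},
                ],
            }
        ],
    }
})

# Flatten the nested partitions, carrying the table name along as metadata.
tmsl_data = json.loads(sample_tmsl)
demo_df = pd.json_normalize(tmsl_data, record_path=['model', 'tables', 'partitions'],
                            meta=[['model', 'name'], ['model', 'tables', 'name']],
                            errors='ignore', record_prefix='partition_')
demo_df = demo_df.rename(columns={'model.tables.name': 'table_name'})
summary = demo_df[['table_name', 'partition_name', 'partition_refreshedTime']]
print(summary)
```

Each row of the result is one partition with its last refreshed time, which is exactly what the validation step inspects.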
I’ve had people ask me why you would need the validation in step 5. This is purely optional, but the benefits are:
Ensure Data Accuracy
By checking the refresh status of each partition, you can confirm that the data has been updated correctly. This helps in maintaining the integrity and accuracy of your data model.
Identify Issues Early
If any partitions fail to refresh, you can identify and address these issues promptly. This proactive approach helps in minimizing downtime and ensuring that your reports and dashboards are always up-to-date.
Performance Monitoring
Monitoring the refresh times of partitions can provide insights into the performance of your data model. If certain partitions are taking longer to refresh, you might need to optimize your data model or investigate potential bottlenecks.
Compliance and Auditing
Keeping a record of refresh times and statuses can be important for compliance and auditing purposes. It provides a clear trail of when and how data was updated, which can be useful for regulatory requirements or internal audits.
Incremental Refresh Management
For models using incremental refresh, verifying partition refreshes ensures that only the necessary data is being updated. This can significantly improve the efficiency and speed of the refresh process.
I may write a dedicated article on Semantic Link that dives deeper into the topic and looks at more than refreshing a semantic model.
References:
Microsoft Learn: What is semantic link? – Microsoft Fabric | Microsoft Learn
Tabular Model Scripting Language (TMSL), as the name suggests, is a scripting method that uses JSON. Using this method you connect to the semantic model via the XMLA endpoint. There are many tasks you can carry out using TMSL, and one of these is refreshing a model. It has been around for some years, and on previous projects I’ve used it where there were bigger enterprise models in a Premium capacity (approx. 25-30 GB) and the model was partitioned on a transaction date. On a daily basis we only wanted to refresh the current partition, which covered the transaction dates for the last quarter.
Semantic models are tabular models, and you can connect to any tabular model with applications like Visual Studio or SQL Server Management Studio (SSMS). I’ve personally used SSMS for TMSL. When using SSMS to connect to a tabular model, choose Analysis Services as the “server type”.
To connect to Analysis Services with SSMS, sign in to your Fabric tenant and go to the workspace. At the top there is a workspace settings option. In the settings there is a tab called “Licence info”, and on this tab there is a connection link with the endpoint details.
Paste the endpoint into the SSMS server name field. For authentication, choose Microsoft Entra MFA and enter your user name. This will authenticate and connect you to your tabular models.
With TMSL you can do a full refresh or refresh a particular table or partition, which allows a lot of flexibility with bigger models. The Microsoft documentation linked below shows the refresh types with descriptions so that you can set the correct type. In the first example below I’ve used the full refresh type; in the other examples I’ve used the automatic type.
Sample code to refresh a model:
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "AdventureWorksTabular1200"
      }
    ]
  }
}
Sample code to refresh one table called Date:
{
  "refresh": {
    "type": "automatic",
    "objects": [
      {
        "database": "AdventureWorksTabular1200",
        "table": "Date"
      }
    ]
  }
}
Sample code to refresh one partition within the table FactSales:
{
  "refresh": {
    "type": "automatic",
    "objects": [
      {
        "database": "AdventureWorksTabular1200",
        "table": "FactSales",
        "partition": "FactSales"
      },
      {
        "database": "AdventureWorksTabular1200",
        "table": "FactSales",
        "partition": "FactSales - 2011"
      }
    ]
  }
}
TMSL Refresh Types
The code snippets above use a “type” in the JSON code. Use the table below to understand the different refresh types.
Refresh Type | Applies To | Description |
---|---|---|
full | Database, Table, Partition | For all partitions in the specified partition, table, or database, refresh data and recalculate all dependents. For a calculation partition, recalculate the partition and all its dependents. |
clearvalues | Database, Table, Partition | Clear values in this object and all its dependents. |
calculate | Database, Table, Partition | Recalculate this object and all its dependents, but only if needed. This value does not force recalculation, except for volatile formulas. |
dataOnly | Database, Table, Partition | Refresh data in this object and clear all dependents. |
automatic | Database, Table, Partition | If the object needs to be refreshed and recalculated, refresh and recalculate the object and all its dependents. Applies if the partition is in a state other than Ready. |
add | Partition | Append data to this partition and recalculate all dependents. This command is valid only for regular partitions and not for calculation partitions. |
defragment | Database, Table | Defragment the data in the specified table. As data is added to or removed from a table, the dictionaries of each column can become polluted with values that no longer exist in the actual column values. The defragment option will clean up the values in the dictionaries that are no longer used. |
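Because a TMSL refresh command is just JSON, it is straightforward to generate one programmatically. The Python sketch below builds a command targeting only the current quarter’s partition, echoing the earlier scenario of refreshing just the latest quarter each day. The quarter-based partition naming scheme and the fixed date are assumptions for illustration.

```python
import json
from datetime import date

def build_partition_refresh(database, table, partitions, refresh_type="automatic"):
    """Build a TMSL refresh command for specific partitions of one table."""
    return {
        "refresh": {
            "type": refresh_type,
            "objects": [
                {"database": database, "table": table, "partition": p}
                for p in partitions
            ],
        }
    }

# Hypothetical naming scheme: one partition per calendar quarter,
# e.g. "FactSales - 2024Q4". A fixed date keeps the example repeatable.
today = date(2024, 11, 15)
quarter = (today.month - 1) // 3 + 1
current_partition = f"FactSales - {today.year}Q{quarter}"

command = build_partition_refresh("AdventureWorksTabular1200", "FactSales",
                                  [current_partition])
print(json.dumps(command, indent=2))
```

The resulting JSON has the same shape as the partition example above and could be pasted into an XMLA query window in SSMS, or scheduled from an automation tool, so only the latest quarter is ever refreshed.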
References:
TMSL: Tabular Model Scripting Language (TMSL) Reference | Microsoft Learn
Refresh: Refresh command (TMSL) | Microsoft Learn
Another way to refresh a semantic model using code is with the Tabular Object Model (TOM). Refreshing a semantic model using TOM involves using the TOM API to connect to an Analysis Services server, access the model (or database), and trigger a refresh of the necessary objects (tables, partitions, or the entire model). This is commonly done in a .NET environment, such as a C# application. You would need Visual Studio or Visual Studio Code connected to your tabular model to execute the C# code.
Prerequisites
Steps for Refreshing a Semantic Model
Type of Refresh
Example C# Code
using System;
using Microsoft.AnalysisServices.Tabular;

namespace TabularModelRefresh
{
    class Program
    {
        static void Main(string[] args)
        {
            // Define the server connection (Azure Analysis Services or SSAS server)
            // Azure AS or SSAS connection string
            string connectionString = "asazure://<server-region>.asazure.windows.net/<your-server-name>";
            string databaseName = "SalesModel"; // Name of the model you want to refresh

            // Connect to the server
            Server server = new Server();
            server.Connect(connectionString);

            // Access the tabular database (semantic model).
            // FindByName returns null when the database does not exist,
            // so the null check below is meaningful.
            Database database = server.Databases.FindByName(databaseName);
            if (database == null)
            {
                Console.WriteLine($"Database '{databaseName}' not found.");
                return;
            }

            // Trigger a full refresh of the entire model
            Console.WriteLine("Refreshing the model...");
            foreach (Table table in database.Model.Tables)
            {
                // Request a full refresh for each table
                table.RequestRefresh(RefreshType.Full);
            }

            // Commit the changes and save
            database.Model.SaveChanges();
            Console.WriteLine("Model refresh initiated.");

            // Optionally, process the model to complete the refresh
            Console.WriteLine("Processing the model...");
            database.Model.RequestRefresh(RefreshType.Full);
            database.Model.SaveChanges();
            Console.WriteLine("Model processing complete.");

            // Disconnect from the server
            server.Disconnect();
        }
    }
}
For those that don’t have any experience of C#, here’s an explanation of the code.
Server connection
You establish a connection to the Analysis Services server using the Server object.
For Azure Analysis Services, you’ll use the asazure:// format for the connection string.
Accessing the database (model)
The tabular model (semantic model) is retrieved by name (databaseName) from the server’s Databases collection.
Refreshing the model
For each table in the model, the table.RequestRefresh(RefreshType.Full) method is called to trigger a full refresh of the data for that table. This process marks the tables for refresh and updates the metadata, but it doesn’t immediately process the tables.
Processing the model
The database.Model.RequestRefresh(RefreshType.Full) method is used to process the model, ensuring all data is recalculated and up to date.
database.Model.SaveChanges() commits the changes back to the server.
Disconnect
After the operations are complete, the server connection is closed using server.Disconnect().
The Tabular Object Model (TOM) provides fine-grained control over tabular models in Analysis Services, allowing you to programmatically trigger refreshes and perform various model management tasks. This approach is especially useful when you need to embed model management into custom applications or automate the process in complex environments.
References:
Microsoft Learn: Tabular Object Model (TOM) | Microsoft Learn
Within Microsoft Fabric, open a Data Factory pipeline. Inside Data Factory there are various activities, one of which is the Semantic model refresh activity. At the time of writing, this activity is in public preview.
One of the benefits of using Data Factory is that, as with any activity, the pipeline allows you to manage error handling. Let’s say the semantic model refresh fails: you could, for example, log the information to a log file or send an alert via the Office 365 Outlook activity. Again, the Office 365 Outlook activity is in preview at the time of writing. Any preview activity may change in functionality when it reaches general availability (GA); if this happens, I will update the article accordingly.
These are the options that you can use.
Option | Description |
---|---|
On Success | If the semantic model refresh completes successfully, determines what happens next. |
On Fail | If the semantic model refresh fails to complete, actions can be taken. |
On Completion | Carries out actions regardless of success or failure. |
On Skip | Allows an activity to be skipped, whether conditionally, via dependency conditions, or manually. |
In DirectQuery mode data is queried at the source and not in the VertiPaq engine. There are benefits and also limitations to using DirectQuery. Firstly, not all sources support DirectQuery. There are also performance limitations: as the data is not queried in the VertiPaq engine and every query needs to go back to the source, there is some latency. If slightly slower query times don’t worry you, then this could be an option. As data is not stored in Power BI, the 10 GB limit of a Pro licence does not apply; the data remains at the source, meaning larger data volumes are available.
Regarding refreshing the semantic model: as the data is queried at source, any new or updated data is reflected in the reports immediately. So if you have data within the business that changes frequently, or you want near real-time reporting, then DirectQuery is an option.
In 2018 we had a customer that worked in project management. We built a data warehouse that incrementally updated every 15 minutes. The reports were set up using DirectQuery, as the customer wanted reports to reflect the latest updates to project information. The reports took approx. 3-4 seconds to render, which they were happy with given they had the latest information. In this example DirectQuery was perfect, but there are instances where other modes would be better, especially with the introduction of Direct Lake.
With semantic models in Microsoft Fabric you have the ability to utilize tables straight from OneLake. This mode brings similar performance benefits to Import, without the latency, so near real-time changes can be reported on, similar to DirectQuery. For this reason the developer does not have to worry about schedules and when the data will be refreshed.
Direct Lake is a storage mode option for tables in a Power BI semantic model that’s stored in a Microsoft Fabric workspace. It’s optimized for large volumes of data that can be quickly loaded into memory from delta tables, which store the data in Parquet files.
Taken from the Microsoft documentation: “A refresh for a Direct Lake semantic model is conceptually different to a refresh operation for an Import semantic model. For a Direct Lake semantic model, a refresh involves a framing operation, which can take a few seconds to complete. It’s a low-cost operation where the semantic model analyzes the metadata of the latest version of the Delta tables and is updated to reference the latest files in OneLake.”
A Direct Lake semantic model is similar to Import mode in that model data is loaded into memory by the VertiPaq engine for fast query performance. Please be aware that DirectQuery fallback may occur, and if this happens you will not get the performance of the VertiPaq engine.
For more information on fallback here’s the Microsoft learn document: Direct Lake overview – Microsoft Fabric | Microsoft Learn
There are many ways to refresh a Power BI semantic model, and each option has its own benefits and areas to be cautious about. The refresh strategy needs to be decided on a project-by-project basis to implement the best approach. The most common method of refreshing a semantic model is a scheduled refresh. There are also methods to refresh the model in the ETL pipeline using Data Factory, which requires a Fabric capacity. Then there are coding methods that can refresh a full model, or a single table or partition. These give a lot of flexibility, but the developer will need to use JSON or an object-oriented language such as C#.
Contact Our Team
If you are interested in having a conversation to see how data analytics, Power BI, or Microsoft Fabric could help your business, then Datahub can help. If you are already using one of these technologies and want to become more data focused, get in touch to discuss the next steps.
If you are interested in talking to Datahub about any of these topics, our team of consultants are available to understand any challenges that you may be facing and how one of our solutions could help.
Contact us: Contact us | DataHub Consulting
Datahub Consulting Website: Data Consultancy Services | Datahub Consulting
We do not employ salespeople; our team are all experienced technical specialists that can talk you through any of our services.
Contact us