Automating how to Pause and Resume multiple Microsoft Fabric Capacities: An Enterprise-Level Solution

Data Insight Nest

Feb 13 · 10 min read

Before diving into the core of this blog, I want to give credit and a shout-out to Soheil Bakhshi, who inspired this post. He has done an amazing series on YouTube showing how to automate the pausing and resuming of Microsoft Fabric Capacities with Logic Apps, and I highly recommend giving him a follow on both LinkedIn and YouTube! Soheil covers the how, the why, and most importantly the benefits of implementing an automated solution like this, so I won’t go into the details of deploying the Logic Apps or explain these topics in depth (check out Soheil's content for that).

In essence, Microsoft Fabric follows a consumption-based pricing model in which you're charged for a capacity for as long as it is running, whether it’s in use or not. By automating the pause and resume functionality, you optimise costs, reduce manual effort, and gain the flexibility to schedule capacity availability around business hours or demand.

In this blog, I will expand on Soheil’s solution by covering what to do when you have multiple capacities spanning potentially multiple subscriptions in your organisation.

Image illustrating cost savings after using automation

Prerequisites

At least one active Microsoft Fabric Capacity
An Azure subscription with permission to create and manage Logic Apps
Logic Apps with a system-assigned managed identity enabled and granted the Fabric Administrator permission

The Scenario: Managing Multiple Microsoft Fabric Capacities

So, let’s get into it. The scenario we’re tackling involves managing two or more capacities. For the sake of this example, I’ll use the following capacity requirements:

Production capacities need to be running/online every day between 8 AM and 8 PM.
Non-Production capacities need to be running/online every weekday (Monday - Friday) between 8 AM and 5 PM.

To achieve this requirement I will use two Logic Apps:

A Master/Orchestrator Logic App - Responsible for scheduling and orchestrating the ‘Worker’ Logic App
A Worker Logic App - Responsible for performing the pause/resume operations on the Microsoft Fabric Capacity.

As you can imagine, the more ‘complex’ Logic App will be the Master Logic App, so let’s start with the Worker Logic App first.

Step 1: Create the 'Worker' Logic App


The Worker Logic App will be triggered by the Master Logic App via a POST request. Depending on the parameters passed in the body of the request, it will either suspend/pause or resume/start a given capacity.

Here’s how to set it up:

Trigger: "When an HTTP request is received"

Method: POST
Request Body Schema (see code snippet below):

{
    "type": "object",
    "properties": {
        "action": {
            "type": "string"
        },
        "capacityId": {
            "type": "string"
        },
        "apiVersion": {
            "type": "string"
        }
    }
}

These parameters enable dynamic changes, allowing you to:

Choose an action (Suspend/Resume).
Specify which capacity to target (capacityId).
Define the API version (apiVersion).

The API version is a parameter because it may need to change depending on the action you perform, or in the unlikely event that a version is deprecated.
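To make the shape of the request concrete, here is a minimal Python sketch of a body that matches the schema above. The function name, the lower-casing of the action, and the check that capacityId looks like a full Azure resource ID are my own assumptions (the last one inferred from how the ID is concatenated straight onto the management URL); the Logic App itself does none of this validation unless you add it.

```python
ALLOWED_ACTIONS = {"suspend", "resume"}  # the two operations used in this post


def build_worker_payload(action: str, capacity_id: str, api_version: str) -> dict:
    """Return a request body matching the Worker trigger schema (hypothetical helper)."""
    normalised = action.strip().lower()  # assumption: accept "Suspend" or "suspend"
    if normalised not in ALLOWED_ACTIONS:
        raise ValueError(f"Unsupported action: {action!r}")
    # Assumption: capacityId is the full resource ID, starting with /subscriptions/
    if not capacity_id.startswith("/subscriptions/"):
        raise ValueError("capacityId should be a full Azure resource ID")
    return {"action": normalised, "capacityId": capacity_id, "apiVersion": api_version}


if __name__ == "__main__":
    example_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/rg-demo/providers/Microsoft.Fabric/capacities/fabricdemo"
    )
    print(build_worker_payload("Suspend", example_id, "2023-11-01"))
```

Rejecting unknown actions up front is a design choice: it stops a typo in the Master Logic App from being sent on to the management API as an arbitrary path segment.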

HTTP Action: Call the relevant Microsoft Fabric API

The action will use the dynamic parameters to construct the Microsoft Fabric API request:

concat('https://management.azure.com', triggerBody()?['capacityId'], '/', triggerBody()?['action'], '?api-version=', triggerBody()?['apiVersion'])

I have included a link to the documentation for the available Microsoft Fabric APIs.
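For readers who want to see what that expression evaluates to, here is the same concatenation as a small Python sketch. The function name and the example resource ID (subscription, resource group and capacity name) are placeholders I made up for illustration.

```python
def build_capacity_action_url(capacity_id: str, action: str, api_version: str) -> str:
    """Mirror the concat() expression used in the Worker's HTTP action."""
    return f"https://management.azure.com{capacity_id}/{action}?api-version={api_version}"


if __name__ == "__main__":
    # Placeholder resource ID, not a real capacity
    example_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/rg-demo/providers/Microsoft.Fabric/capacities/fabricdemo"
    )
    print(build_capacity_action_url(example_id, "suspend", "2023-11-01"))
```

Note that the capacity ID has to begin with a forward slash, because nothing is inserted between the host name and the ID.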

'Worker' Logic App Code View


Screenshot: what the Worker Logic App should look like in the designer

Step 2: Create the 'Master' Logic App


The Master Logic App will hold the majority of the logic to fulfill the capacity scheduling requirements. Here's a breakdown of the desired flow:

Defining the desired schedule - We want to minimise the number of times the Logic App needs to run per day. In our case, it will run three times per day: 8 AM, 5 PM, and 8 PM.
Get the list of capacities in each subscription.
At 8 AM on a Weekday - Start all capacities that are paused.
At 8 AM on a Weekend - Start all capacities that contain ‘prod’ in the name.
At 5 PM on a Weekday - Stop all non-production capacities that are running.
At 8 PM - Stop all running capacities.

'Master' Logic App Code View


At first, this might look overwhelming and intimidating. Let’s break it down:

1. Recurrence Trigger

The Logic App runs on a Recurrence trigger configured for 8:00 AM, 5:00 PM, and 8:00 PM New Zealand Standard Time (NZST), so the pausing and resuming actions happen automatically at these key points in the day.
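If it helps to reason about when the next run will fire, the schedule can be sketched in a few lines of Python. This is only an illustration of the three daily run times, not how the Recurrence trigger is implemented; it assumes a naive datetime that is already expressed in the schedule's time zone, and it ignores daylight saving transitions.

```python
from datetime import datetime, time, timedelta

RUN_TIMES = (time(8, 0), time(17, 0), time(20, 0))  # 8 AM, 5 PM, 8 PM


def next_run(local_now: datetime) -> datetime:
    """Return the first scheduled run strictly after local_now (naive, local time)."""
    for run_time in RUN_TIMES:
        candidate = datetime.combine(local_now.date(), run_time)
        if candidate > local_now:
            return candidate
    # All of today's runs have passed, so roll over to 8 AM tomorrow.
    return datetime.combine(local_now.date() + timedelta(days=1), RUN_TIMES[0])


if __name__ == "__main__":
    print(next_run(datetime(2025, 2, 13, 9, 30)))
```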

2. Initialising Variables

Before we can begin the main actions, we initialise several variables needed for the Logic App's operations. These include:

Subscription ID List: A list of subscriptions associated with your Microsoft Fabric resources.
API Version: A variable to store the API version (2023-11-01) used for making requests to the Microsoft Fabric API.
Day of the Week: This variable helps determine if the day is a weekend or a weekday, so that the Logic App applies the weekday schedule or the weekend schedule as appropriate.
Get Time of Day: The key feature of this Logic App is its ability to manage capacities based on the time of day. Using the Get Time of Day action, we check the current time and decide whether to pause or resume capacities.
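One detail worth being careful with is how the day of the week is numbered. The Logic Apps dayOfWeek() function counts from Sunday as 0, whereas many languages count from Monday. I am assuming here that the Day of the Week variable is populated with dayOfWeek(); the Python sketch below (function names are mine) shows the conversion and the resulting weekend check.

```python
from datetime import datetime


def logic_apps_day_of_week(moment: datetime) -> int:
    """Day index using the Sunday=0 ... Saturday=6 convention."""
    # Python's weekday() is Monday=0 ... Sunday=6, so shift by one and wrap.
    return (moment.weekday() + 1) % 7


def is_weekend(moment: datetime) -> bool:
    """True on Saturday (6) and Sunday (0) under the Sunday=0 convention."""
    return logic_apps_day_of_week(moment) in (0, 6)


if __name__ == "__main__":
    thursday = datetime(2025, 2, 13, 8, 0)
    print(logic_apps_day_of_week(thursday), is_weekend(thursday), thursday.hour)
```

The time of day is simply the hour of the same timestamp, provided it has first been converted to the time zone the schedule is defined in.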

3. Retrieving the Capacity List

Before we can act on anything, we first need to retrieve a list of all available capacities across the different subscriptions. To do this, we loop through each item in the Subscription ID List variable we initialised above, and use each Subscription ID in a GET request to the Microsoft Fabric API to fetch the list of capacities in that subscription.

API Request: A GET request is made to the Microsoft Fabric API to pull all capacity information. The response contains a list of all capacities in the specified subscription.
Capacity Data: The data retrieved includes key information such as the capacity ID and name (which tell us whether it is production or non-production), the state (active or paused), and other relevant attributes.
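The loop over subscriptions can be sketched in Python as follows. The list URL follows the Azure Resource Manager pattern for Microsoft.Fabric capacities, and the id, name and properties.state fields are what I would expect in each item of the response's value array. The fetch argument is a hypothetical stand-in for the authenticated HTTP call, so the sketch makes no network requests, and it flattens everything into one list, whereas the Logic App works through one subscription at a time.

```python
from typing import Callable, Dict, List

API_VERSION = "2023-11-01"


def list_url(subscription_id: str, api_version: str = API_VERSION) -> str:
    """Build the 'list capacities in a subscription' URL."""
    return (
        f"https://management.azure.com/subscriptions/{subscription_id}"
        f"/providers/Microsoft.Fabric/capacities?api-version={api_version}"
    )


def collect_capacities(
    subscription_ids: List[str], fetch: Callable[[str], Dict]
) -> List[Dict]:
    """Call fetch once per subscription and keep only the fields used later."""
    capacities = []
    for subscription_id in subscription_ids:
        response = fetch(list_url(subscription_id))
        for item in response.get("value", []):
            capacities.append(
                {
                    "id": item["id"],
                    "name": item["name"],
                    "state": item.get("properties", {}).get("state"),
                }
            )
    return capacities
```

Passing the HTTP call in as a function keeps the sketch easy to try out with canned responses before pointing it at a real subscription.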

4. Filtering Capacities

Now that we have the list of capacities for the subscription, we use the Filter Array action to separate production (PROD) and non-production capacities. Capacities are filtered based on whether their ID contains the string "prod" (for production environments) or not (for non-production environments). This results in three distinct lists of capacities:

List of Production Capacities
List of Non-Production Capacities
List of All Capacities

Having these three lists allows us to pause/resume each list depending on the time of day and day of the week.
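The three-way split can be sketched in Python like this. The dictionary keys and the function name are my own, and lower-casing the ID before matching is an assumption I have added; a plain contains check is case-sensitive, so adjust this to match your naming convention.

```python
from typing import Dict, List


def split_capacities(capacities: List[Dict]) -> Dict[str, List[Dict]]:
    """Split capacities into production, non-production and all, based on the ID."""
    # Assumption: match 'prod' case-insensitively anywhere in the resource ID
    prod = [c for c in capacities if "prod" in c["id"].lower()]
    non_prod = [c for c in capacities if "prod" not in c["id"].lower()]
    return {"prod": prod, "non_prod": non_prod, "all": list(capacities)}


if __name__ == "__main__":
    sample = [
        {"id": "/subscriptions/s/resourceGroups/rg/providers/Microsoft.Fabric/capacities/fabricprod01"},
        {"id": "/subscriptions/s/resourceGroups/rg/providers/Microsoft.Fabric/capacities/fabricdev01"},
    ]
    lists = split_capacities(sample)
    print(len(lists["prod"]), len(lists["non_prod"]), len(lists["all"]))
```

A substring match is blunt: a capacity called "nonprod01", or one that sits in a resource group with "prod" in its name, would also land in the production list, so it is worth checking your names against the filter before relying on it.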

5 and 6. Pause or Resume Capacities based on Conditional Logic

Now, using the time of day, the day of the week, and the various lists of capacities, we can work through the various conditions to pause/resume the correct capacities:

8:00 AM (Weekdays - Mon-Fri): At the start of the workday, all environments are resumed if they are currently paused, ensuring that both production and non-production environments are available for the day’s work.
8:00 AM (Weekends - Sat-Sun): At the start of the weekend, only Production (PROD) environments are resumed if they are currently paused, as these are critical and must remain available during the weekend.
5:00 PM (Weekdays - Mon-Fri): The app checks if non-production environments are still active and pauses them. This helps to conserve resources after working hours.
8:00 PM (Daily): The app checks again to see if any active production environments are still running. If they are, the app pauses them to optimise resource usage during non-working hours.

The last action in each of these branches is to call the Worker Logic App to perform the relevant action.
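The conditions above boil down to a small decision table, sketched here in Python. The function names, the "prod"/"non_prod"/"all" keys and the shape of the capacity dictionaries are hypothetical; the "Active" and "Paused" strings are the state values I would expect the capacity API to return, so check them against your own responses.

```python
from typing import Dict, List, Optional, Tuple

# (action, which list to target, state a capacity must currently be in)
Plan = Tuple[str, str, str]


def plan_run(is_weekend: bool, hour: int) -> Optional[Plan]:
    """Map a scheduled run to the action it should take, or None for no action."""
    if hour == 8:
        # Weekdays resume everything; weekends resume production only.
        return ("resume", "prod" if is_weekend else "all", "Paused")
    if hour == 17 and not is_weekend:
        return ("suspend", "non_prod", "Active")
    if hour == 20:
        return ("suspend", "all", "Active")
    return None


def worker_payloads(
    plan: Optional[Plan], lists: Dict[str, List[dict]], api_version: str = "2023-11-01"
) -> List[dict]:
    """Build one Worker request body per capacity that actually needs changing."""
    if plan is None:
        return []
    action, target, required_state = plan
    return [
        {"action": action, "capacityId": c["id"], "apiVersion": api_version}
        for c in lists[target]
        if c["state"] == required_state
    ]
```

Checking the current state before building a request means a capacity that is already paused (or already running) is simply skipped, rather than being sent a request it does not need.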

Additional Considerations

Every organisation has unique requirements for scheduling capacity pauses and resumes. This solution provides flexibility to tailor the automation to your needs. Some other considerations that might be worthwhile exploring and implementing are:

User-defined schedules: Store custom schedules in a Logic Apps parameter file, Azure Table Storage, or a SharePoint list to allow easy modification without altering the Logic App itself.
Adaptive scheduling: Use Azure Functions to dynamically adjust pause/resume times based on real-time usage patterns.
Tagging: Capacity names are not always a reliable way to tell PROD from Non-PROD. Introduce a tagging system in Azure, where tags (e.g., always-on=true) dictate whether a capacity should be excluded from pausing. A step further might be to use a configuration file (e.g., JSON in Azure Storage) that allows the organisation to define exceptions on a per-capacity basis.
Developing a Teams bot that allows users to query or adjust capacity statuses via simple chat commands.
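As an illustration of the tagging idea, a tag-based exclusion could look something like the Python sketch below. It assumes each capacity dictionary carries the resource's tags as a dictionary of strings; the always-on tag name comes from the example above, and the function name is hypothetical.

```python
from typing import Dict, List


def pausable(capacities: List[Dict]) -> List[Dict]:
    """Drop capacities whose tags mark them as always-on."""
    result = []
    for capacity in capacities:
        tags = capacity.get("tags") or {}  # tolerate missing or null tags
        always_on = str(tags.get("always-on", "")).lower() == "true"
        if not always_on:
            result.append(capacity)
    return result


if __name__ == "__main__":
    sample = [
        {"name": "fabricprod01", "tags": {"always-on": "true"}},
        {"name": "fabricdev01", "tags": {}},
    ]
    print([c["name"] for c in pausable(sample)])
```

Applying a filter like this just before the pause step would leave the rest of the schedule untouched while exempting the capacities that must stay online.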

Summary

And there you have it - a solution that enables your organisation to efficiently manage multiple Microsoft Fabric capacities. In essence adopting a solution like this will:

Provide cost savings by ensuring capacities run only when needed.
Reduce operational overhead with automated scheduling.
Enable dynamic scheduling for flexibility.

I’d love to hear your thoughts - how have you tackled similar requirements? Have you implemented a similar solution, and what kind of cost savings have you seen?