A while ago, we wrote a blog post about SQL Server in Azure, IO throughput, and issues with Availability Groups, titled Availability groups in Azure – when always on is sometimes off. In that article, Kevin writes, near the end:
…reducing the IO requirements of the platform can help. There are options built into the platform that can assist (appropriate disk caching options are one such solution), but as with any tuning work this won’t be a quick operation. For a SQL Server workload there is an additional option – you can use Resource Governor and MAX_IOPS_PER_VOLUME to directly control your IO and prevent this.
Since then, we have had a case where we have had to implement this for a system maintenance workload, so here’s a quick overview of what we did. This post is also a worked introduction to SQL Server Resource Governor.
The Client’s Problem
The Client has a 24×7 environment running in Azure. Like so many companies, they would like to run various database maintenance tasks (re-indexing, database consistency checks) during the quieter hours – however, their quiet hours are not quiet enough to run these IO & CPU intensive processes without affecting performance for their customers.
Our aim, therefore, was to reduce the impact of those processes so that maintenance would still occur, but without adversely affect the end customer experience.
Enter SQL Server Resource Governor.
Resource Governor – The Basics
Resource Governor is, in effect, a gatekeeping resource-allocating process. It is triggered when a process logs into a server, and a classifier function looks at certain attributes of the incoming process (such as who’s running it, from where, and what application), and uses that information to grant it access to a certain fraction of SQL Server’s resources (such as CPU, memory, IO). Anything that’s not matched by that classifier goes to the “default” pool – which has unfettered access.
Because SQL Server typically runs a variety of workloads (small transactions, big reports, etc), we can configure multiple workload groups to access pre-defined limited resource pools. Be aware that multiple workload groups can be configured to use the same resource pool.
In short, an application connection is assigned by the classifier function to a workload group that allows access to a pool of resources.
And Here’s How We Did It
First, create a resource pool
We need to restrict the pool to 20% CPU, 20% memory, and 100 IOPs:
Note that we are not defining the max_cpu_percent, as that only applies when the system is busy. We are instead imposing a hard cap on CPU resources used by this pool.
Reminder: you can have more than one resource pool.
Create a work group to use that pool
Note that we are further requesting the amount of memory that can be used by each request in this pool, in case we are running on a large instance and multiple maintenance tasks simultaneously.
Reminder: you can have more than one workload group; however, each workload group can be assigned to only one resource pool. Multiple workload groups can be assigned to the same pool.
Create a classifier function
This is the tricky part. As the thing we’re aiming to restrict is a SQL Agent scheduled task, the application name exposed to SQL Server isn’t a straightforward name but something like SQLAgent – TSQL JobStep (Job 0x….. : Step z).
This means we have two options:
- Calculating it (or copying it from sys.dm_exec_sessions) ahead of time, and hoping it doesn’t change ID, or
- Calculating the appropriate application name each time
The second of those is not an easy option. To calculate the application name from msdb’s sysjobs & sysjobsteps tables requires a conversion from the job_id field (a uniqueidentifier) to the binary representation as presented in the application name. That’s the easy part. There’s a more serious problem in that the classifier function must be schema bound, and that means it can only reference other objects by two-part name (schema_name.object_name); and we can’t get round that by using synonyms.
So here’s the first version, using an ID captured from sys.dm_exec_sessions (or, more likely, from the output from sp_whoisactive).
Reminder: you can have only one classifier function.
Configure Resource Governor to use the function
Finally (for the configuration process) we need to attach the classifier function we have just created to the Resource Governor, and then run the reconfiguration command to get SQL Server to recognise the changes.
Finally, check your work
Given that Resource Governor pools are only allocated when a client application connects to the SQL Server, these changes will only be applied to connections made after running the reconfigure statement above. To see which sessions are using user-defined resource pools, use the following query:
Note that groups 1 & 2 are reserved for the system-defined resource pools “internal” and “default”.
Present Day
The client is now happy that these essential database maintenance activities are being performed without impacting on the customers.
But I need more flexibility
As the reminders above indicate, you can have multiple resource pools and multiple workload groups; but you can only have one classifier function.
There’s a more complex example in Microsoft’s Tutorial on Resource governor configuration examples and best practices. (That link goes directly to the section “Use multiple resource pools and workload groups”.)
Exercises for the reader
How would you rewrite the classifier to cope with potential changes of job ID? For example, if this configuration is rolled out across many servers, the JobID and thus the app_name won’t be the same across the whole estate.
How would you rewrite the classifier to cope with multiple jobs? We have some clients that use Ola Hallengren’s “DatabasesInParallel” parameter for index and integrity check work.