THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Merrill Aldrich

Auto-tuning memory configuration on a cluster

I was inspired by Aaron's post on a similar topic a while back to dig deeper into how I could automagically adjust the memory allocation for multiple instances of SQL Server in a cluster-failover scenario. So, suppose you have:

  1. A two-node cluster
  2. Four SQL Server instances, two intended to run on each physical server (call it "deluxe active-active")
  3. An environment where it's feasible to run all four instances on one node, in case of an emergency or in a maintenance window
  4. An expectation that some performance degredation is acceptable in the emergency scenario, and you want to take full advantage of your hardware investment when both nodes are up and running
  5. Stuff running that is tolerant of reconfiguration of max memory in SQL Server "on the fly"

A typical deployment would have each SQL Server instance's memory limited to about 1/4 the usable RAM on one node (that is, 1/4 the RAM remaining after the overhead required for OS and processes outside what is covered by SQL Server's max memory configuration) so that in a failover, their memory footprint, collectively, would not overwhelm one physical server. This is certainly safe, but you also have to buy a whole lot of extra memory that will sit idle practically all the time.

Instead, I wondered if I could set each instance to use half the usable RAM on each physical host, but in the event of a failover, automatically adjust the max memory setting on each instance to an approriate value for them to share a node and still be "happy." This would be a bit like a new person bellying up to a crowded bar: some of the other customers have to "move over" - have their RAM allocation reduced - to make space.

In pseudo-code, I want to:

  • Detect that a failover has occurred
  • Check the max memory setting on each instance on the cluster against a reasonable value for where they are running
  • If needed, change the max memory setting on each instance, to prevent the total memory allocated from exceeding the usable memory on each host

I'd been mulling this over during the past few months as we design and test a SQL Server consolidation / data center move at work. I finally settled on a design like this:

  • Create a PowerShell script that can check and set the max memory value for all instances on a cluster
  • Create a SQL Server Agent job to call the PowerShell script on Agent startup, running under a proxy account that has access to the other instances

Disclaimer I don't know that this is a great idea, I only have it in test at this point. Your mileage may vary. Never download and use scripts from the internet. If you experience dizziness, nausea or you turn purple or your hair falls out while using this script, see your doctor right away. Some people taking this script have reported dry mouth, memory loss or profuse bleeding from the nose. All bananas shall be placed on top of refrigerators for ripening.

So, the PowerShell script starts with a function to go get the memory configuration and host name for a running SQL Instance:

function Get-SQLInstanceConfig( [string]$SQLInstance, [ref]$hostName, [ref]$maxServerMemory ) {

    # Function to establish a connection to a clustered SQL Server instance, 
    # read max server memory configuration value and current physical host name
    
    $con = New-Object System.Data.SqlClient.SqlConnection( `
        "Data Source=$SQLInstance;Initial Catalog=master;Integrated Security=SSPI;")
    $q = "SELECT ( SELECT serverproperty('ComputerNamePhysicalNetbios') ), " `
        + "value_in_use FROM sys.configurations " `
        + "WHERE Name = 'max server memory (MB)';"
    $cmd = New-Object System.Data.SqlClient.SqlCommand( $q, $con )

    $con.open()
    $reader = $cmd.ExecuteReader()
    if( $reader.read() ) { 
        $hostName.Value = $reader.GetValue(0) 
        $maxServerMemory.Value = $reader.GetValue(1) 
    }
    $reader.close()
    $con.close()
}

Next, we need a function that can change the memory setting for an instance if we find that it's incorrect:

function Set-SQLInstanceMemory( [string]$SQLInstanceName, [int]$maxMemSetting ) {

    # Function to set max server memory on a given SQL instance
    
    write-host "   Reconfiguring" $SQLInstanceName to $maxMemSetting
    
    $con = New-Object System.Data.SqlClient.SqlConnection( `
        "Data Source=$SQLInstanceName;Initial Catalog=master;Integrated Security=SSPI;")
    $q = "EXEC sys.sp_configure N'show advanced options', N'1'; " `
        + "RECONFIGURE WITH OVERRIDE; " `
        + "EXEC sys.sp_configure N'max server memory (MB)', N'" + $maxMemSetting + "'; " `
        + "RECONFIGURE WITH OVERRIDE; " `
        + "EXEC sys.sp_configure N'show advanced options', N'0'; " `
        + "RECONFIGURE WITH OVERRIDE; " 
    $cmd = New-Object System.Data.SqlClient.SqlCommand( $q, $con )

    $con.open()
    $cmd.executeNonQuery()
    $con.close()
}

Having established those two functions, it's then a matter of "walking" all the SQL instances to check their currently running values, then correcting any that are inappropriate for the location of instances on cluster node one or two. I need a couple of containers to store the results and make decisions about what to change:

# List the instances on the cluster:

$SQLInstances = @( "SQL01", "SQL02", "SQL03", "SQL04" )

# Make a hashtable to sort out which SQL instance is running on which host, 
# with how much memory. One table entry for each host, with each entry containing
# an empty hash table we will load with SQL instance configuration info later:

$hostsTable = @{ "NODE1HOST" = @{}; "NODE2HOST" = @{} }

# Allowed memory for SQL Server = total server memory (e.g. 16 GB), less 3 GB to leave for OS 
# and SQL tasks outside the SQL configured boundary:

$aggregateMemory = ( 16 - 3 ) * 1024

Working from those variables, we can then use two loops, one to collect the current state from each instance, and a second to validate/correct the values based on what instance is running on what node:

# Get the configs for each SQL Server instance

foreach( $SQLInstance in $SQLInstances ) {

    $hostName = $null
    $maxServerMemory = $null
    
    Get-SQLInstanceConfig $SQLInstance ([REF]$hostName) ([REF]$maxServerMemory) 

    # If we really can't see one of the instances, it's best to bail at this point,
    # rather than reconfiguring memory settings without complete information 
    
    if( $hostName -eq $null ) { 
        throw ( "Could not connect to one of the SQL instances on the cluster." )
    }	
    
    # Put the current SQLInstance and its max memory value into the right slot 
    # in the hosts table, to classify the instance by host
    
    $hostsTable[$hostName].Add( $SQLInstance, $maxServerMemory )
}

# For each physical cluster node, calculate a reasonable memory limit 
# per SQL instance, then verify or correct the max memory value for  
# each SQL Server instance running on that host

foreach( $hostEntry in $hostsTable.getEnumerator() ) {

    write-host 
    write-host $hostEntry.Name
    write-host "   Num SQL instances on this host:" $hostEntry.Value.count
    
    if( $hostEntry.Value.count -gt 1 ) { 
        [int]$maxMemSetting = $aggregateMemory / $hostEntry.Value.count 
    } else { 
        [int]$maxMemSetting = $aggregateMemory / 2 
    }
    
    write-host "   Max memory allowed per SQL instance:" $maxMemSetting
    write-host 
    
    foreach( $SQLInstanceEntry in $hostEntry.Value.getEnumerator() ) {
    
        write-host "  " $SQLInstanceEntry.Name is set to $SQLInstanceEntry.Value 

        if ( $SQLInstanceEntry.Value -ne $maxMemSetting ) {
            Set-SQLInstanceMemory $SQLInstanceEntry.Name $maxMemSetting
        }
    }
}

In implementing this on my test cluster, I went to each SQL Instance, and added a proxy account and then a SQL agent job to call this PowerShell script. The script had to be copied into an identical location on the physical disks in each cluster node, like the SQL Server binaries, so that it could be run by any SQL Agent, regardless of which node the agent is running on.

I then set the schedule for the jobs to execute once on SQL Agent start up.

Lastly, I reasoned that at least one common scenario could make this process fail (among other things): when a four-instance/two node cluster has a node "croak," the other node will attempt to bring two instances online as quickly as possible. If one failed SQL instance comes up before the other one, and this script runs immediately, the script will fail to connect to the second failed SQL instance, and if allowed to complete would set incorrect values. When the second node came up, then it would call the script again and correct the problem, but you've then reconfigured all the instances twice. It's a race condition, but in order to try to get this right most of the time, I put a "wait" step in each of these agent jobs, so that the agent comes up, the job fires, then there's a delay to give the other instances time to come online before the script runs.

As extra insurance, I also trap for a failed connection in the script and have it terminate. That way, if the agent runs it and one of the instances is still down, it should have no effect. When the down instance comes online, then it'll execute the script and succeed.

So far in testing, this seems to work OK. The main concern I have, with what I know about our workload and applications, isn't that we'll have a problem running on one node in an emergency, it's that there could be some side effect from changing max server memory, on the fly, while the instances are running.

I'd welcome comments or suggestions, if you've tried this or perhaps see a hole in my logic.

Published Friday, January 22, 2010 10:13 AM by merrillaldrich

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Ben Thul said:

SMO is your friend (and mine).  Instead of running data queries to get/set the max server memory, use something like (off the cuff; not tested)

$srv = Microsoft.SqlServer.Management.Smo.Server("your server name here")

$config = $srv.Configuration

$config.MaxServerMemory = 12345

$config.Alter()

January 22, 2010 4:39 PM
 

merrillaldrich said:

Great tip

Playing with this, the syntax seems to be close to that:

[reflection.assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo")

$srv = new-object Microsoft.SQLServer.Management.Smo.Server("SERVER\INSTANCE")

$srv.Configuration.MaxServerMemory.RunValue # gets the current value

$srv.Configuration.MaxServerMemory.ConfigValue = 4096 # or whatever

$srv.Configuration.Alter() # applies the ConfigValue change

January 22, 2010 7:04 PM
 

merrillaldrich said:

What's interesting (but not suprising :-) is that if I profiler-trace this statement:

$srv.Configuration.MaxServerMemory.RunValue

SMO does this to fetch the data:

SELECT

cfg.name AS [Name],

cfg.configuration_id AS [Number],

cfg.minimum AS [Minimum],

cfg.maximum AS [Maximum],

cfg.is_dynamic AS [Dynamic],

cfg.is_advanced AS [Advanced],

cfg.value AS [ConfigValue],

cfg.value_in_use AS [RunValue],

cfg.description AS [Description]

FROM

sys.configurations AS cfg

and if I trace this one:

$srv.Configuration.Alter()

Hey presto!:

EXEC sys.sp_configure N'show advanced options', N'1';

RECONFIGURE WITH OVERRIDE;

EXEC sys.sp_configure N'max server memory (MB)', N'1024';

RECONFIGURE WITH OVERRIDE;

EXEC sys.sp_configure N'show advanced options', N'0';

RECONFIGURE WITH OVERRIDE;

Just proves there's really no voodoo magic.

SMO would help clean up the script, though, removing the need to construct ugly/specific query strings...

January 22, 2010 7:24 PM
 

Ben Thul said:

I never professed magic.  But cleaner code is nice...:)

January 23, 2010 12:17 AM
 

merrillaldrich said:

No doubt!

January 23, 2010 12:32 AM
 

MDeVore said:

Figured out how to do this using sql code. If you want to see how, see here: http://www.sqlservercentral.com/Forums/FindPost768793.aspx.

February 1, 2010 10:55 AM
 

merrillaldrich said:

Nice!

February 1, 2010 2:16 PM
 

Shoeless_Mike said:

I re-invented the wheel here - but I did not find this post until I was well into coding. Here is my version - it is not as elegant as the others but I have a different set of circumstances. I manage 3 cluster installs, one with 2 servers total, one with 4 and one with 6. I had to do some hard-coding to accomodate for the varying numbers of running instances etc. Anyways here it is - all powershell:

[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | Out-Null

$local = hostname

#Set srvcount so we can compare it to $running_srv_cnt, the actual # of running sql instances. Also set $sqlgroup for each env, an array that holds all of the service names for each instance.  

if ($local -like "*dev*")

{$srvcount = 2;$sqlgroup = "MSSQL`$`DEV", "MSSQL`$`DEVDW", "MSSQL`$`SANDBOX", "MSSQL`$`SANDBOXDW"}

elseif ($local -like "*test*")

{$srvcount = 3;$sqlgroup = "MSSQL`$`AIPTESTDW","MSSQL`$`QADW","MSSQL`$`UATDW","MSSQL`$`AIPTEST","MSSQL`$`TESTAPPSQLVMA","MSSQL`$`UAT"}

else {$srvcount = 1;$sqlgroup = "MSSQL`$`PROD", "MSSQL`$`PRODDW"}

#Set maxmem to 50000 so we always leave ~14gb for the OS. Maxmem should be close to ($maxmem/$running_srv_cnt)

$maxmem = 50000

$running_srv_cnt = (Get-Service * | where {$_.Name -like '*MSSQL$*'} | where {$_.Status -eq 'Running'}).Length

#Here is where we determine if the # of running instances exceeds the expected # and change maxmem accordingly.

if ($running_srv_cnt -le $srvcount)

{echo "No change to memory settings required."; exit}

elseif ($running_srv_cnt -gt $srvcount)

{echo "Failover condition exists...";$new_maxmem = $maxmem / $running_srv_cnt}

else {echo "No change to memory settings required."; exit}

#The Get-Service cmdlet gets a list of running sql server service names and redirects them to a file.

Get-Service * | where {$_.Name -like '*MSSQL$*'} | where {$_.Status -eq 'Running'} > C:\temp\running_servs.txt

#We parse through the list of running services using each element of the $sqlgroup array as the search pattern.

#When a match is found, i.e. a member of the $sqlgroup is found in the file, we set the $servername variable

#and then output the $servername value to another file. This file will contain the sql server instance names

#for running instances.

foreach($x in $sqlgroup)

{$ret = Get-Content C:\temp\running_servs.txt | Select-string -SimpleMatch  -Pattern $x -quiet; if ($ret) {

Switch ($x)

{

MSSQL`$`DEV {$servname = "devappsqla\dev"}

MSSQL`$`DEVDW {$servname = "devappsqlb\devdw" }

MSSQL`$`SANDBOX {$servname = "devappsqlc\sandbox"}

MSSQL`$`SANDBOXDW {$servname = "devappsqld\sandboxdw"}

MSSQL`$`TESTAPPSQLVMA {$servname = "testappsqla\testappsqlvma"}

MSSQL`$`QADW {$servname = "testappsqlb\qadw"}

MSSQL`$`AIPTEST {$servname = "testappsqlc\aiptest"}

MSSQL`$`AIPTESTDW {$servname = "testappsqld\aiptestdw"}

MSSQL`$`UAT {$servname = "testappsqle\uat"}

MSSQL`$`UATDW {$servname = "testappsqlf\uatdw"}

MSSQL`$`PROD {$servname = "appsqla\prod"}

MSSQL`$`PRODDW {$servname = "appsqlb\proddw"}

default {$x = "undetermined"}

}

echo $servname >> C:\temp\running_servnames.txt}; $ret = $null}

$servlist = Get-Content  C:\temp\running_servnames.txt

#Once we have the $servlist built we can use it to change maxmem for each running server.

foreach ($x in $servlist)

{

$srv = new-object ("Microsoft.SqlServer.Management.Smo.Server") $x

$config = $srv.Configuration

$config.ShowAdvancedOptions.ConfigValue = 1

$config.MaxServerMemory.ConfigValue = $new_maxmem

$config.Alter()

}

$emailFrom = "SABDatabase@rma.usda.gov"

$emailTo = "SABDatabase@rma.usda.gov"

$subject = "Cluster Failover Memory Alert"

$body = "MAXMEM was adjusted on all SQL Server instances to $new_maxmem MB"

$smtpServer = "mail.rma.usda.gov"

$smtp = new-object Net.Mail.SmtpClient($smtpServer)

$smtp.Send($emailFrom, $emailTo, $subject, $body)

#cleanup

del C:\temp\running_servnames.txt

del C:\temp\running_servs.txt

November 10, 2010 11:09 AM
 

Shoeless_Mike said:

Updated version :-) The earlier one had a few bugs...

[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | Out-Null

function cleanup()

{

del C:\temp\running_servnames.txt

del C:\temp\running_servs.txt

exit

}

$local = hostname

#Set srvcount so we can compare it to $running_srv_cnt, the actual # of running sql instances. Also set $sqlgroup for each env, an array that holds all of the service names for each instance.  

if ($local -like "*dev*")

{$srvcount = 2;$sqlgroup = "MSSQL`$`DEV", "MSSQL`$`DEVDW", "MSSQL`$`SANDBOX", "MSSQL`$`SANDBOXDW";$expected_maxmem = 25000}

elseif ($local -like "*test*")

{$srvcount = 3;$sqlgroup = "MSSQL`$`AIPTESTDW","MSSQL`$`QADW","MSSQL`$`UATDW","MSSQL`$`AIPTEST","MSSQL`$`TESTAPPSQLVMA","MSSQL`$`UAT"; $expected_maxmem = 20000}

else {$srvcount = 1;$sqlgroup = "MSSQL`$`PROD", "MSSQL`$`PRODDW"; $expected_maxmem = 50000}

#Set maxmem to 50000 so we always leave ~14gb for the OS. Maxmem should be close to ($maxmem/$running_srv_cnt)

$maxmem = 50000

$running_srv_cnt = (Get-Service * | where {$_.Name -like '*MSSQL$*'} | where {$_.Status -eq 'Running'}).Length

#Here is where we determine if the # of running instances exceeds the expected # and change maxmem accordingly.

if ($running_srv_cnt -lt $srvcount)

{echo "Less than the expected number of sql server instances are running - leaving maxmem alone.";exit}

elseif ($running_srv_cnt -gt $srvcount)

{echo "Failover condition exists... will set maxmem appropriately"; $new_maxmem = $maxmem / $running_srv_cnt}

else {echo "Expected number of sql server instances are running - verifying maxmem..."}

#The Get-Service cmdlet gets a list of running sql server service names and redirects them to a file.

Get-Service * | where {$_.Name -like '*MSSQL$*'} | where {$_.Status -eq 'Running'} > C:\temp\running_servs.txt

#We parse through the list of running services using each element of the $sqlgroup array as the search pattern.

#When a match is found, i.e. a member of the $sqlgroup is found in the file, we set the $servername variable

#and then output the $servername value to another file. This file will contain the sql server instance names

#for running instances.

foreach($x in $sqlgroup)

{$ret = Get-Content C:\temp\running_servs.txt | Select-string -SimpleMatch  -Pattern $x -quiet; if ($ret) {

Switch ($x)

{

MSSQL`$`DEV {$servname = "devappsqla\dev"}

MSSQL`$`DEVDW {$servname = "devappsqlb\devdw" }

MSSQL`$`SANDBOX {$servname = "devappsqlc\sandbox"}

MSSQL`$`SANDBOXDW {$servname = "devappsqld\sandboxdw"}

MSSQL`$`TESTAPPSQLVMA {$servname = "testappsqla\testappsqlvma"}

MSSQL`$`QADW {$servname = "testappsqlb\qadw"}

MSSQL`$`AIPTEST {$servname = "testappsqlc\aiptest"}

MSSQL`$`AIPTESTDW {$servname = "testappsqld\aiptestdw"}

MSSQL`$`UAT {$servname = "testappsqle\uat"}

MSSQL`$`UATDW {$servname = "testappsqlf\uatdw"}

MSSQL`$`PROD {$servname = "appsqla\prod"}

MSSQL`$`PRODDW {$servname = "appsqlb\proddw"}

default {$x = "undetermined"}

}

echo $servname >> C:\temp\running_servnames.txt}; $ret = $null}

$servlist = Get-Content  C:\temp\running_servnames.txt

#Once we have the $servlist built we can use it to change maxmem for each running server.

if ($running_srv_cnt -gt $srvcount)

{

foreach ($x in $servlist)

{

$srv = new-object ("Microsoft.SqlServer.Management.Smo.Server") $x

$config = $srv.Configuration

$running_maxmem = $config.MaxServerMemory.ConfigValue

if ($running_maxmem -eq $maxmem/$running_srv_cnt)

{echo "Memory has already been adjusted on $x to $running_maxmem"}

else

{

$config.ShowAdvancedOptions.ConfigValue = 1

$config.MaxServerMemory.ConfigValue = $new_maxmem

$config.Alter()

}

}

}

elseif ($running_srv_cnt -eq $srvcount)

{

foreach ($x in $servlist)

{

$srv = new-object ("Microsoft.SqlServer.Management.Smo.Server") $x

$config = $srv.Configuration

$running_maxmem = $config.MaxServerMemory.ConfigValue

if ($running_maxmem -eq $expected_maxmem)

{echo "Memory setting on $x is correct at $running_maxmem"}

elseif ($running_maxmem -gt $expected_maxmem){echo "Memory setting of $running_maxmem is incorrect - setting to $expected_maxmem";

$config.ShowAdvancedOptions.ConfigValue = 1

$config.MaxServerMemory.ConfigValue = $expected_maxmem

$config.Alter(); echo "Maxmem successfully adjusted on $x to $expected_maxmem"}

}

}

else {echo "poop"}

cleanup

November 11, 2010 1:27 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement