SQL Server Automated Operations Series: Monitoring Performance Counters with a PowerShell Script
2015-05-04 Source: 易贤网

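Requirements

In production you often need to check counter values automatically and raise an early warning, for example by email, when something looks abnormal. This article shows how to implement that kind of status monitoring with PowerShell.

What to monitor

From experience, a DBA generally needs to watch the following system performance counters.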

CPU:

\Processor(_Total)\% Processor Time

\Processor(_Total)\% Privileged Time

\SQLServer:SQL Statistics\Batch Requests/sec

\SQLServer:SQL Statistics\SQL Compilations/sec

\SQLServer:SQL Statistics\SQL Re-Compilations/sec

\System\Processor Queue Length

\System\Context Switches/sec

Memory:

\Memory\Available Bytes

\Memory\Pages/sec

\Memory\Page Faults/sec

\Memory\Pages Input/sec

\Memory\Pages Output/sec

\Process(sqlservr)\Private Bytes

\SQLServer:Buffer Manager\Buffer cache hit ratio

\SQLServer:Buffer Manager\Page life expectancy

\SQLServer:Buffer Manager\Lazy writes/sec

\SQLServer:Memory Manager\Memory Grants Pending

\SQLServer:Memory Manager\Target Server Memory (KB)

\SQLServer:Memory Manager\Total Server Memory (KB)

Disk:

\PhysicalDisk(_Total)\% Disk Time

\PhysicalDisk(_Total)\Current Disk Queue Length

\PhysicalDisk(_Total)\Avg. Disk Queue Length

\PhysicalDisk(_Total)\Disk Transfers/sec

\PhysicalDisk(_Total)\Disk Bytes/sec

\PhysicalDisk(_Total)\Avg. Disk sec/Read

\PhysicalDisk(_Total)\Avg. Disk sec/Write

SQL Server:

\SQLServer:Access Methods\FreeSpace Scans/sec

\SQLServer:Access Methods\Full Scans/sec

\SQLServer:Access Methods\Table Lock Escalations/sec

\SQLServer:Access Methods\Worktables Created/sec

\SQLServer:General Statistics\Processes blocked

\SQLServer:General Statistics\User Connections

\SQLServer:Latches\Total Latch Wait Time (ms)

\SQLServer:Locks(_Total)\Lock Timeouts (timeout > 0)/sec

\SQLServer:Locks(_Total)\Lock Wait Time (ms)

\SQLServer:Locks(_Total)\Number of Deadlocks/sec

\SQLServer:SQL Statistics\Batch Requests/sec

\SQLServer:SQL Statistics\SQL Re-Compilations/sec

For what each of these counters means, see my previous article, "Which Counters to Monitor in SQL Server".

Monitoring script

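# Connection settings for the SQL Server instance that sends the alert mail
$server = "(local)"
$uid = "sa"
$db = "master"
$pwd = "password"

# Database Mail profile, recipient list and subject line
$mailprfname = "SendEmail"
$recipients = ""
$subject = "Database counter alert!"

# Configuration files: servers to check and counters with their thresholds
$computernamexml = "f:\computername.xml"
$alter_cpuxml = "f:\alter_cpu.xml"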

# Read the list of computer names from the XML file
function GetServerName($xmlpath)
{
    $xml = [xml] (Get-Content $xmlpath)
    $return = New-Object Collections.Generic.List[string]
    for ($i = 0; $i -lt $xml.computernames.ChildNodes.Count; $i++)
    {
        if ($xml.computernames.ChildNodes.Count -eq 1)
        {
            # A single <computername> element is returned as a scalar, not an array
            $cp = [string]$xml.computernames.computername
        }
        else
        {
            $cp = [string]$xml.computernames.computername[$i]
        }
        $return.Add($cp.Trim())
    }
    $return
}

# Read the <Counter> elements (path, alter threshold, operator) from the XML file
function GetAlterCounter($xmlpath)
{
    $xml = [xml] (Get-Content $xmlpath)
    $list = $xml.counters.Counter
    $list
}

# Send the alert text by email through Database Mail (msdb..sp_send_dbmail)
function CreateAlter($message)
{
    $SqlConnection = New-Object System.Data.SqlClient.SqlConnection
    $CnnString = "Server = $server; Database = $db;User Id = $uid; Password = $pwd"
    $SqlConnection.ConnectionString = $CnnString
    $cc = $SqlConnection.CreateCommand()
    if (-not ($SqlConnection.State -like "Open")) { $SqlConnection.Open() }

    # Double any single quotes so the report cannot break the T-SQL string literal
    $body = $message.Replace("'", "''")

    $cc.CommandText = " EXEC msdb..sp_send_dbmail
        @profile_name = '$mailprfname'
        ,@recipients = '$recipients'
        ,@body = '$body'
        ,@subject = '$subject'
    "
    $cc.ExecuteNonQuery() | Out-Null
    $SqlConnection.Close()
}

$names = GetServerName($computernamexml)
$pfcounters = GetAlterCounter($alter_cpuxml)

foreach ($cp in $names)
{
    # Build the full counter paths (\\computer\counter) for this server
    $p = New-Object Collections.Generic.List[string]
    $report = ""
    foreach ($pfc in $pfcounters)
    {
        $counter = "\\" + $cp + $pfc.get_InnerText().Trim()
        $p.Add($counter)
    }

    # Sample all counters for this server in one call
    $count = Get-Counter $p

    # Samples are matched to the configured counters by position
    for ($i = 0; $i -lt $count.CounterSamples.Count; $i++)
    {
        $v = $count.CounterSamples.Get($i).CookedValue
        $pfc = $pfcounters[$i]
        $b = ""
        $lg = ""

        if ($pfc.operator -eq "lt")
        {
            # Value is expected to stay below the threshold
            if ($v -ge [double]$pfc.alter)
            {
                $b = "alter"
                $lg = "Greater Than"
            }
        }
        elseif ($pfc.operator -eq "gt")
        {
            # Value is expected to stay above the threshold
            if ($v -le [double]$pfc.alter)
            {
                $b = "alter"
                $lg = "Less Than"
            }
        }

        if ($b -eq "alter")
        {
            $path = "\\" + $cp + $pfc.get_InnerText()
            $item = "{0}:{1};{2} Threshold:{3}" -f $path, $v.ToString(), $lg, $pfc.alter.Trim()
            $report += $item + "`n"
        }
    }

    if ($report -ne "")
    {
        # Raise the alert; the report lists counter, current value and threshold
        CreateAlter $report
    }
}

The script relies on two configuration files, referenced by $computernamexml and $alter_cpuxml. Their contents are shown below:

<computernames>

<computername>

wuxuelei-pc

</computername>

</computernames>

<Counters>

<Counter alter="10" operator="gt">\Processor(_Total)\% Processor Time</Counter>

<Counter alter="10" operator="gt">\Processor(_Total)\% Privileged Time</Counter>

<Counter alter="10" operator="gt">\SQLServer:SQL Statistics\Batch Requests/sec</Counter>

<Counter alter="10" operator="gt">\SQLServer:SQL Statistics\SQL Compilations/sec</Counter>

<Counter alter="10" operator="gt">\SQLServer:SQL Statistics\SQL Re-Compilations/sec</Counter>

<Counter alter="10" operator="lt">\System\Processor Queue Length</Counter>

<Counter alter="10" operator="lt">\System\Context Switches/sec</Counter>

</Counters>

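Here alter is the threshold, and operator states the relationship the counter is expected to keep with it. In the first entry the operator is "gt", so the counter should stay above the threshold, and a warning is sent once the threshold is greater than or equal to the counter value. With "lt" it is the reverse: a warning is sent once the counter value reaches or exceeds the threshold.

Driving the checks from a configuration file like this makes the monitoring rules flexible:

1. You can check things such as free disk space.

2. You can detect peak-load conditions.

3. You can periodically adjust the thresholds used in production based on historical values, which is what is usually called a running baseline.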
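The comparison rule can be made explicit with a small sketch. It is written in Python only so that it can be run in isolation; the article's script is PowerShell. The function names load_rules, check and build_report, and the sample values, are assumptions for illustration and not part of the original script. The rule itself mirrors the script: with operator "lt" a value at or above alter triggers a warning, and with "gt" a value at or below alter does.

```python
import xml.etree.ElementTree as ET

# Sample config in the same shape as alter_cpu.xml (entries chosen for illustration).
CONFIG = r"""
<Counters>
  <Counter alter="10" operator="gt">\Processor(_Total)\% Processor Time</Counter>
  <Counter alter="10" operator="lt">\System\Processor Queue Length</Counter>
</Counters>
"""


def load_rules(xml_text):
    """Return a list of (counter_path, operator, threshold) tuples."""
    root = ET.fromstring(xml_text.strip())
    return [
        (node.text.strip(), node.get("operator"), float(node.get("alter")))
        for node in root.findall("Counter")
    ]


def check(value, operator, threshold):
    """Return the violation label used in the report, or None if healthy."""
    if operator == "lt" and value >= threshold:
        return "Greater Than"  # value should have stayed below the threshold
    if operator == "gt" and value <= threshold:
        return "Less Than"  # value should have stayed above the threshold
    return None


def build_report(computer, samples, rules):
    """Format one line per violated rule, like the script's $report string."""
    lines = []
    for path, operator, threshold in rules:
        value = samples[path]
        violation = check(value, operator, threshold)
        if violation:
            lines.append(
                f"\\\\{computer}{path}:{value};{violation} Threshold:{threshold:g}"
            )
    return "\n".join(lines)
```

Calling build_report("wuxuelei-pc", samples, load_rules(CONFIG)) with a dictionary of sampled values keyed by counter path yields an empty string when every counter is healthy, which matches the script's check before it sends mail.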
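As a rough illustration of the third point, a threshold could be recomputed from historical samples. The article does not say how the baseline is derived, so the rule below (mean plus or minus k standard deviations) is purely an assumption, as are the function name suggest_threshold and the default k. It is sketched in Python rather than PowerShell so it can be run on its own.

```python
import statistics


def suggest_threshold(history, operator, k=3.0):
    """Suggest an 'alter' value from past samples (assumed rule: mean +/- k * stdev).

    operator "lt": the counter should stay below the threshold, so use an upper bound.
    operator "gt": the counter should stay above the threshold, so use a lower bound.
    """
    if len(history) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.mean(history)
    spread = k * statistics.stdev(history)
    if operator == "lt":
        return mean + spread
    if operator == "gt":
        return mean - spread
    raise ValueError(f"unknown operator: {operator!r}")
```

The result could then be written back into the alter attribute of the matching Counter element, so the next scheduled run uses the updated baseline.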

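How to schedule the alerts

1. A SQL Server Agent job

2. A Windows scheduled task

Either option works and both are simple to set up; if you have not done it before, a quick search will turn up the steps. If you would rather not touch the production system itself, you can add a dedicated server for automated monitoring and run the checks remotely.

Later articles will cover calling PowerShell remotely, including automatically taking a screenshot of the system at the moment an incident occurs and emailing it, so the DBA has first-hand evidence for diagnosing the problem.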

The result looks like this:

[Screenshot not available]

The above only shows one way to implement this; adapt the details to your own needs.
