Overview
In the previous post we covered cache; in this post we look at monitoring the CPU performance of the 3PAR array. Metrics are gathered using the statcpu command. The series covers the following areas and commands:
- Cache
  - statcmp
- CPU
  - statcpu
- Hosts
  - statvlun / statvlun -hostsum
- Ports
  - statport -host
- Volumes
  - statvlun -vvsum
Getting started
How do I run the statcpu command? The commands below show how to set up a password file so we don't need to enter a password for each command.
# PATH=$PATH:/opt/hp_3par_cli/bin/
# setpassword -saveonly -file mypass.your3par
system: your3par
user:
password:
CPU
CPU performance metrics are familiar from many system types, and the 3PAR is no different. For each node and CPU we get user time, system time and idle time, plus overall interrupts and context switches per node.
Statcpu command output
Below we have some example output which has been truncated to a single node.
EXAMPLES
The following example displays an iteration of CPU statistics for a single node:

%cli statcpu -iter 1 -d 15
21:00:00 11/21/2017
node,cpu user sys idle intr/s ctxt/s
     0,0    0  90   10
     0,1    0  54   46
     0,2    0  58   42
     0,3    0  49   51
     0,4    0  52   48
     0,5    0  44   56
     0,6    0  81   19
     0,7    0  42   58
     0,8    0  45   55
     0,9    0  51   49
    0,10    0  38   62
    0,11    0  36   64
    0,12    0  36   64
    0,13    0  66   34
    0,14    0  31   69
    0,15    0  88   12
    0,16    0  33   67
    0,17    0  31   69
    0,18    0  30   70
    0,19    0  30   70
    0,20    0  39   61
    0,21    0  31   69
    0,22    0  63   37
    0,23    0  32   68
    0,24    0  86   14
    0,25    0  28   72
    0,26    0  28   72
    0,27    0  44   56
    0,28    0  60   40
    0,29    0  38   62
    0,30    0  34   66
    0,31    0  74   26
 0,total    0  48   52 288010 298043

Great, so let's explain the output.
- node,cpu
  - The node in the array and each CPU available on it
- user
  - % of time spent in user processing
- sys
  - % of time spent in system processing (kernel/syscalls)
- idle
  - % of time the CPU was idle
- intr/s
  - Number of interrupts per second during the sample period
- ctxt/s
  - Number of context switches per second during the sample period
- total
  - Overall average of CPU usage in each column for the sample period
Looking at a node as a whole, the total row gives a feel for how busy its CPUs are, so it is worth tracking over time. If system time starts to increase, we need to look at what the array is doing; it could be a garbage collection task, for example.
The per-CPU metrics are also useful: we can look at each CPU and check whether some CPUs are running hot when we experience issues. In the above example, CPU 0 spent 90% of its time in system processing over the 15-second sample.
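As a rough illustration of the per-CPU check, the sketch below filters raw statcpu output for individual CPUs whose sys value is above a limit. The function name hot_cpus and the default limit of 80 are assumptions made for this example; they are not part of the 3PAR CLI.

```shell
# Sketch: list individual CPUs whose %sys exceeds a limit in raw statcpu output.
# hot_cpus and the default limit of 80 are illustrative, not part of the 3PAR CLI.
hot_cpus() {
    awk -v limit="${1:-80}" '
        $1 ~ /^[0-9][0-9]:/ { time = $1; next }            # timestamp line of each sample
        $1 ~ /^[0-9]+,[0-9]+$/ && ($3 + 0 > limit + 0) {   # per-CPU rows only, skips "N,total"
            split($1, ncpu, ",")
            printf("%s node=%d cpu=%d sys=%d\n", time, ncpu[1], ncpu[2], $3)
        }'
}

# Usage (reads statcpu output on stdin), for example:
#   hot_cpus 80 < statcpu.out
```

Run against the single-node example above with a limit of 80, this would report CPUs 0, 6, 15 and 24.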
** What to track: %sys > 60% on a node ** Also look for balance in the metrics: if one node is busier than the others, we may have a multipath issue.
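These two checks can be sketched against the per-node rows that the parsing script in the next section prints (columns Time, Node, Cpu, User, Sys, Util, intr/s, ctxt/s). The function name node_checks and the 20-point Util spread used to call nodes "unbalanced" are assumptions for illustration only; the 60% sys limit is the figure given above.

```shell
# Sketch: check parsed per-node rows for high %sys and for imbalance between nodes.
# node_checks and the default 20-point spread are illustrative values.
node_checks() {
    awk -v sys_limit="${1:-60}" -v spread_limit="${2:-20}" '
        function report() {
            if (seen && (max - min > spread_limit + 0))
                printf("%s IMBALANCE util min=%d max=%d\n", cur, min, max)
        }
        NF < 6        { next }                      # ignore blank or short lines
        $1 == "Time"  { next }                      # skip the header row
        $1 != cur     { report(); cur = $1; seen = 0 }   # new sample: report the previous one
        {
            if ($5 + 0 > sys_limit + 0)
                printf("%s HIGH_SYS node=%d sys=%d\n", $1, $2, $5)
            if (!seen || $6 + 0 < min) min = $6 + 0
            if (!seen || $6 + 0 > max) max = $6 + 0
            seen = 1
        }
        END { report() }'
}

# Usage: node_checks [sys_limit] [spread_limit], reading parsed rows on stdin.
```

The spread is measured on Util (100 minus idle) per timestamp; a different measure, such as sys alone, may suit some arrays better.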
The Parsing Script
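BEGIN {
    # Header row for the per-node summary
    printf("%8s %4s %4s %4s %4s %4s %12s %12s \n","Time","Node","Cpu","User","Sys","Util","intr/s","ctxt/s");
}
{
    # Remember the timestamp of the current sample
    if ($1 ~ /[0-9][0-9]:/) {time=$1}
    # Only the "node,total" rows are printed; Util is 100 minus idle
    if ($1 ~ /total/) {
        split($1,ncpu,",")
        # ncpu[2] is "total", which %d prints as 0 in the Cpu column
        printf("%8s %4d %4d %4d %4d %4d %12d %12d\n",time,ncpu[1],ncpu[2],$2,$3,100-$4,$5,$6);
    }
}

With the script saved as awk_cpu, we can run it against a file of saved statcpu output as follows.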
# awk -f ./awk_cpu statcpu.out |more
    Time Node  Cpu User  Sys Util       intr/s       ctxt/s
00:00:27    0    0    0   48   48       288010       298043
00:00:27    1    0    1   48   49       288948       297952
00:00:27    2    0    8   49   57       304743       295164
00:00:27    3    0    4   51   55       304167       313328
00:00:42    0    0    1   52   53       287327       280944
00:00:42    1    0    1   51   52       286144       270951
00:00:42    2    0    8   52   59       301698       280315
00:00:42    3    0    6   52   58       307004       288337
00:00:58    0    0    1   48   49       296359       306778

Or in real time:
# statcpu -pwf ./mypass.mypar -sys testpar -iter 5 -d 1 |awk -f ./awk_cpu
    Time Node  Cpu User  Sys Util       intr/s       ctxt/s
22:20:14    0    0    0   11   11       133028       149000
22:20:14    1    0    0   12   12       150715       164091
22:20:14    2    0    7   10   17       145678       167100
22:20:14    3    0    0   12   12       149124       165820
22:20:15    0    0    0   11   11       145646       160334
22:20:15    1    0    0   15   16       161879       189288
22:20:15    2    0    6   12   19       148670       177994
22:20:15    3    0    0   16   16       152063       169221
22:20:16    0    0    0   14   14       158968       174704
22:20:16    1    0    0   15   16       165782       206909
22:20:16    2    0    6   14   21       167643       206980
22:20:16    3    0    0   13   13       145107       172812
We mentioned above that we can look at each CPU on the system. I will cover that when we build the main Grafana view for the overall array. In the meantime, here is an example of why we want to gather this data.
In the graph, Node 3 (orange) and Node 1 (yellow) show a spike in %sys at 2pm. Let's take a look to see what is causing the spike.
Here we have selected just the ports on Node 3, and we can see a port that is busy servicing write IOPS at the time of the %sys spike.
In the next post I will cover the metrics for hosts and volumes. I will show two ways of gathering the metrics and the pros and cons of each.