In the previous post I showed how we can work out the QOS values from the average I/O size for a volume. In this post I will show a tool which was created to help monitor if a volume is at its QOS limit.
As a quick reminder from the previous post, the following table is for my test volume, which has QOS settings of 3000 (min), 20000 (max) and 30000 (burst) IOPS.
IOSIZE  COST  MINIOPS  MAXIOPS  BIOPS  MINMB  MAXMB  BURSTMB
4       100   3000     20000    30000  12     78     117
8       160   1875     12500    18750  15     98     146
16      270   1111     7407     11111  17     116    174
32      500   600      4000     6000   19     125    188
64      1000  300      2000     3000   19     125    188
128     1950  154      1026     1538   19     128    192
Reading the table above, if I were to issue 8KB I/Os then I should be able to push 146MB/sec in burst mode, and once we have no credits left the volume should be able to sustain 98MB/sec.
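The arithmetic behind a table row can be sketched in a few lines of Python. This is not the sf_table.py source; it is a minimal sketch that assumes the IOPS figures scale by the ratio of the 4KB cost (100) to the cost of the I/O size, that MB/sec is IOPS multiplied by the I/O size in KB and divided by 1024, and that values are rounded to the nearest whole number. The function names are hypothetical.

```python
def round_half_up(value):
    """Round to the nearest integer (assumed rounding rule, not confirmed)."""
    return int(value + 0.5)


def scale_qos(min_iops, max_iops, burst_iops, cost, io_kb):
    """Scale 4KB-based QOS settings to another I/O size using its cost."""
    base_cost = 100  # cost of a 4KB I/O in the table above
    row = {}
    for name, iops in (("min", min_iops), ("max", max_iops), ("burst", burst_iops)):
        scaled = round_half_up(iops * base_cost / cost)
        row[name + "_iops"] = scaled
        # Throughput in MB/sec at this I/O size
        row[name + "_mb"] = round_half_up(scaled * io_kb / 1024)
    return row


# The 8KB row: cost 160
print(scale_qos(3000, 20000, 30000, cost=160, io_kb=8))
```

With these assumptions the 8KB case gives 1875/12500/18750 IOPS and 15/98/146 MB/sec, which matches the table.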
If we look at the output of the tool which is named vtrace we should see the current statistics for the volume and what we believe the QOS BW values are for the average I/O size.
The tool is run as follows:
./vtrace ARRAY VOLUME_ID INTERVAL | [COUNT]

(solidfire) bash-4.2$ ./vtrace $ARRAY 28 1
Tracing... Output every 1 secs. Hit Ctrl-C to end
TIME SET MIN/MAX/BST CUR MIN/MAX/BST |IOSIZE(kb) IOPS IOPSQOS CURBW(MB) BW_QOS(MB) BURST_QOS(MB) VUTIL% BURSTCRED RLT(us) WLT(us) LAT(us) QDEPTH IOQOS BWQOS
20:47:34 3000/20000/30000 779/5194/7792 | 24 100 5194 2 122 183 2.50 600000 607 589 607 0 N N
20:47:35 3000/20000/30000 600/4000/6000 | 32 1290 4000 40 125 188 32.25 600000 550 0 550 1 N N
20:47:37 3000/20000/30000 617/4118/6177 | 31 1410 4118 43 125 187 33.57 600000 547 506 544 0 N N
20:47:38 3000/20000/30000 600/4000/6000 | 32 952 4000 30 125 188 23.80 600000 819 0 819 1 N N
20:47:39 3000/20000/30000 600/4000/6000 | 32 1170 4000 37 125 188 29.25 600000 620 516 619 0 N N
20:47:40 3000/20000/30000 600/4000/6000 | 32 1310 4000 41 125 188 32.75 600000 535 577 536 1 N N
20:47:42 3000/20000/30000 600/4000/6000 | 32 1314 4000 41 125 188 32.85 600000 538 581 538 1 N N
20:47:43 3000/20000/30000 600/4000/6000 | 32 1328 4000 42 125 188 33.20 600000 530 581 531 1 N N
20:47:44 3000/20000/30000 600/4000/6000 | 32 1324 4000 41 125 188 33.10 600000 524 0 524 1 N N
20:47:45 3000/20000/30000 600/4000/6000 | 32 1320 4000 41 125 188 33.00 600000 531 494 531 0 N N
20:47:47 3000/20000/30000 600/4000/6000 | 32 0 4000 0 125 188 0.00 600000 0 0 0 0 N N
20:47:48 3000/20000/30000 1600/10666/16000 | 10 2 10666 0 104 156 0.01 600000 0 595 595 0 N N
20:47:49 3000/20000/30000 3000/20000/30000 | 2 2 20000 0 39 59 0.01 600000 0 565 565 0 N N
^C20:47:50 3000/20000/30000 3000/20000/30000 | 2 0 20000 0 39 59 0.00 600000 0 0 0 0 N N
Detaching...
What we can see above is that we have an average I/O size of 32KB with QOS values set to 3000 (min), 20000 (max) and 30000 (burst). Running this through our table tool we get the following.
(solidfire) bash-4.2$ ./sf_table.py 3000 20000 30000 |egrep "IOPS|^32 "
IOSIZE  COST  MINIOPS  MAXIOPS  BIOPS  MINMB  MAXMB  BURSTMB
32      500   600      4000     6000   19     125    188
So our maximum BW would be 125MB/sec based on a 32KB I/O size, and while we are able to burst we could achieve 188MB/sec with a maximum of 6000 IOPS.
Some people are confused by this because they believe they can always get 20K max IOPS, but that figure is based on a 4KB I/O size.
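The trace also shows average I/O sizes that fall between the table rows, for example 24KB giving 779/5194/7792. Those figures are consistent with a linear interpolation of the cost between neighbouring table entries followed by truncation, so here is a minimal sketch under that assumption. It is inferred from the output, not taken from the vtrace source; the function names, the clamping below 4KB and above 128KB, and the truncation are all assumptions.

```python
from bisect import bisect_right

# (I/O size in KB, cost) pairs taken from the table in this post
COST_TABLE = [(4, 100), (8, 160), (16, 270), (32, 500), (64, 1000), (128, 1950)]


def cost_for(io_kb):
    """Linearly interpolate the cost for an I/O size (assumed behaviour)."""
    sizes = [size for size, _ in COST_TABLE]
    if io_kb <= sizes[0]:
        return float(COST_TABLE[0][1])   # assumed: small I/Os cost the same as 4KB
    if io_kb >= sizes[-1]:
        return float(COST_TABLE[-1][1])  # assumed: clamp at the largest entry
    i = bisect_right(sizes, io_kb)
    (s0, c0), (s1, c1) = COST_TABLE[i - 1], COST_TABLE[i]
    return c0 + (io_kb - s0) * (c1 - c0) / (s1 - s0)


def iops_at(iops_setting, io_kb):
    """Effective IOPS limit at a given average I/O size, truncated."""
    return int(iops_setting * 100 / cost_for(io_kb))


for size in (2, 10, 24, 32):
    print(size, [iops_at(v, size) for v in (3000, 20000, 30000)])
```

For 24KB this gives a cost of 385 and limits of 779/5194/7792, the same as the CUR MIN/MAX/BST column in the first trace line. The displayed I/O size is probably rounded, so other rows may not line up exactly.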
Let's look at the headers from the tool and explain what's going on.
- TIME – Timestamp.
- SET MIN/MAX/BST – The volume's QOS settings.
- CUR MIN/MAX/BST – What we really get for the current I/O size.
- IOSIZE(kb) – Average I/O size for the volume.
- IOPS – Current IOPS to the volume.
- IOPSQOS – The max IOPS we can get at the average I/O size.
- CURBW(MB) – Current BW to the volume.
- BW_QOS(MB) – BW QOS at max IOPS for the volume.
- BURST_QOS(MB) – The throughput we can burst up to.
- VUTIL% – Volume utilisation.
- BURSTCRED – Amount of credits we have to allow burst.
- RLT(us)/WLT(us)/LAT(us) – Read/Write and average latency for the volume.
- QDEPTH – Current queue depth for the volume.
- IOQOS – IOPS are >= MAX IOPS.
- BWQOS – BW >= MAX BW or BURST BW.
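Several of these columns can be derived from just the I/O size, the current IOPS and the limits at that I/O size. The sketch below shows that arithmetic; it is an illustration rather than the vtrace implementation. The function name is hypothetical, the rounding rule is assumed, and VUTIL is computed here as IOPS divided by max IOPS, whereas the tool may take it directly from the array (some trace lines suggest the reported value differs slightly).

```python
def derive_columns(io_kb, iops, max_iops_at_size, burst_iops_at_size):
    """Derive bandwidth, utilisation and QOS flags for one sample."""
    cur_bw = iops * io_kb / 1024                   # MB/sec currently in use
    bw_qos = max_iops_at_size * io_kb / 1024       # MB/sec at max IOPS
    burst_qos = burst_iops_at_size * io_kb / 1024  # MB/sec at burst IOPS
    return {
        "CURBW": int(cur_bw + 0.5),
        "BW_QOS": int(bw_qos + 0.5),
        "BURST_QOS": int(burst_qos + 0.5),
        # Assumption: utilisation is current IOPS relative to max IOPS
        "VUTIL": round(100.0 * iops / max_iops_at_size, 2),
        "IOQOS": iops >= max_iops_at_size,
        "BWQOS": cur_bw >= bw_qos,
    }


# A 32KB sample pushing 6004 IOPS against limits of 4000 (max) and 6000 (burst)
print(derive_columns(32, 6004, 4000, 6000))
```

For the sample shown this yields 188MB/sec current bandwidth, limits of 125 and 188MB/sec, 150.1% utilisation and both flags set.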
OK, so let's run a workload through the volume and see what happens when we hit QOS.
(solidfire) bash-4.2$ ./vtrace $ARRAY 28 1
Tracing... Output every 1 secs. Hit Ctrl-C to end
TIME SET MIN/MAX/BST CUR MIN/MAX/BST |IOSIZE(kb) IOPS IOPSQOS CURBW(MB) BW_QOS(MB) BURST_QOS(MB) VUTIL% BURSTCRED RLT(us) WLT(us) LAT(us) QDEPTH IOQOS BWQOS
21:00:05 3000/20000/30000 3000/20000/30000 | 1 0 20000 0 20 29 0.00 600000 0 0 0 0 N N
21:00:06 3000/20000/30000 3000/20000/30000 | 2 4 20000 0 39 59 0.02 600000 0 569 569 0 N N
21:00:07 3000/20000/30000 1600/10666/16000 | 10 14 10666 0 104 156 0.16 600000 0 547 547 0 N N
21:00:08 3000/20000/30000 3000/20000/30000 | 2 2 20000 0 39 59 0.01 600000 0 605 605 0 N N
21:00:10 3000/20000/30000 636/4244/6366 | 30 6000 4244 176 124 187 150.00 590060 663 552 663 17 Y Y
21:00:11 3000/20000/30000 600/4000/6000 | 32 6004 4000 188 125 188 150.10 575213 1231 682 1230 16 Y Y
21:00:12 3000/20000/30000 600/4000/6000 | 32 6002 4000 188 125 188 150.05 565269 775 1055 776 15 Y Y
21:00:14 3000/20000/30000 600/4000/6000 | 32 6002 4000 188 125 188 150.05 550431 661 1240 661 16 Y Y
21:00:15 3000/20000/30000 600/4000/6000 | 32 6000 4000 188 125 188 150.00 540431 779 0 779 16 Y Y
21:00:16 3000/20000/30000 600/4000/6000 | 32 6002 4000 188 125 188 150.05 525594 655 1355 655 17 Y Y
21:00:17 3000/20000/30000 600/4000/6000 | 32 6000 4000 188 125 188 150.00 515654 698 0 698 16 Y Y
. . .
21:01:10 3000/20000/30000 600/4000/6000 | 32 5000 4000 156 125 188 125.00 2501 611 0 611 17 Y Y
21:01:11 3000/20000/30000 600/4000/6000 | 32 4142 4000 129 125 188 103.55 372 625 825 626 17 Y Y
21:01:12 3000/20000/30000 600/4000/6000 | 32 4046 4000 126 125 188 101.15 144 605 1074 605 16 Y Y
21:01:13 3000/20000/30000 600/4000/6000 | 32 4018 4000 126 125 188 100.45 71 594 1000 594 18 Y Y
21:01:15 3000/20000/30000 600/4000/6000 | 32 4018 4000 126 125 188 100.45 48 683 0 683 17 Y Y
21:01:16 3000/20000/30000 600/4000/6000 | 32 4016 4000 126 125 188 100.40 66 655 902 655 17 Y Y
21:01:17 3000/20000/30000 600/4000/6000 | 32 4012 4000 125 125 188 100.30 37 602 0 602 15 Y Y
21:01:18 3000/20000/30000 600/4000/6000 | 32 4010 4000 125 125 188 100.25 54 566 724 566 17 Y Y
21:01:20 3000/20000/30000 600/4000/6000 | 32 4018 4000 126 125 188 100.45 61 566 696 567 16 Y Y
21:01:21 3000/20000/30000 600/4000/6000 | 32 4012 4000 125 125 188 100.30 63 598 1107 598 16 Y Y
21:01:22 3000/20000/30000 600/4000/6000 | 32 4008 4000 125 125 188 100.20 54 656 680 656 16 Y Y
21:01:23 3000/20000/30000 600/4000/6000 | 32 4010 4000 125 125 188 100.25 46 600 639 600 17 Y Y
21:01:25 3000/20000/30000 600/4000/6000 | 32 4018 4000 126 125 188 100.45 58 595 459 595 17 Y Y
21:01:26 3000/20000/30000 600/4000/6000 | 32 4012 4000 125 125 188 100.30 35 585 0 585 15 Y Y
21:01:27 3000/20000/30000 600/4000/6000 | 32 4004 4000 125 125 188 100.10 46 597 890 597 16 Y Y
21:01:28 3000/20000/30000 600/4000/6000 | 32 4018 4000 126 125 188 100.45 64 619 902 619 16 Y Y
21:01:30 3000/20000/30000 600/4000/6000 | 32 4008 4000 125 125 188 100.20 55 609 1030 610 17 Y Y
21:01:31 3000/20000/30000 600/4000/6000 | 32 4008 4000 125 125 188 100.20 52 604 762 605 17 Y Y
21:01:32 3000/20000/30000 600/4000/6000 | 32 2676 4000 84 125 188 66.90 3390 612 699 612 0 N N
21:01:34 3000/20000/30000 3000/20000/30000 | 4 2 20000 0 78 117 0.01 33384 0 558 558 0 N N
21:01:35 3000/20000/30000 3000/20000/30000 | 2 2 20000 0 39 59 0.01 53378 0 555 555 0 N N
Detaching...
What we can see is that at 21:00:10 we hit QOS. The IOPS are equal to the burst IOPS figure (CUR BST), which tells us we are in burst mode. We can also see that the CURBW(MB) being used reaches the BURST_QOS(MB) figure of 188MB/sec from 21:00:11 onwards.
Note also that the burst credits (BURSTCRED) start to drop while we are in burst mode, and at 21:01:10 we start to drop back down to the IOPSQOS figure, which is our max IOPS for the I/O size. When the workload completes at 21:01:32 we start to accrue credits again, which will allow us to burst in the future.
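The trace gives a rough way to estimate how long a volume can stay in burst. The credits start at 600000 and the burst lasts from 21:00:10 to about 21:01:10, which is consistent with credits draining at roughly the difference between the burst and max IOPS settings (in 4KB terms) each second. The sketch below works through that estimate; the drain rate is an inference from this trace rather than a documented formula, and the function name is hypothetical.

```python
def burst_seconds(credits, max_iops, burst_iops):
    """Estimate how long a full-rate burst can last with the given credits.

    Assumes credits drain at (burst_iops - max_iops) per second, using the
    volume's configured (4KB-based) QOS settings.
    """
    drain_per_second = burst_iops - max_iops
    if drain_per_second <= 0:
        raise ValueError("burst IOPS must be greater than max IOPS")
    return credits / drain_per_second


print(burst_seconds(600000, 20000, 30000))  # 60.0
```

Sixty seconds lines up with the one minute of burst seen in the trace before the IOPS fall back to 4000.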
Things to look out for
Volume utilisation
If the volume utilisation is > 100% then we are in burst and, in general, in a QOS situation. If the utilisation sits at 100% we are in hard QOS, i.e. at our maximum. If we run in this mode for a long period we will not accrue credits, because credits only accrue while our IOPS < MAX IOPS; as such we can never enter burst mode again until the workload drops off.
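As a sketch of how this rule could be applied to monitoring data, the function below classifies a utilisation sample. It is hypothetical tooling, not part of vtrace. The tolerance is an assumption: in the trace the volume sits at around 100.1% to 100.45% when pinned at its maximum, so an exact comparison with 100 would misreport those samples as burst.

```python
def qos_state(vutil_pct, tolerance=1.0):
    """Classify a volume utilisation percentage.

    tolerance is a hypothetical margin (in percentage points) around 100%
    within which the volume is treated as pinned at its maximum.
    """
    if vutil_pct > 100.0 + tolerance:
        return "burst"
    if vutil_pct >= 100.0 - tolerance:
        return "at maximum"
    return "below maximum"


for sample in (150.0, 100.45, 66.9):
    print(sample, qos_state(sample))
```

A sustained run of "at maximum" samples is the case to alert on, since the volume cannot rebuild credits until the workload drops off.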
I have noticed that the ActiveIQ product from NetApp does not show the historical volume utilisation, but the online monitoring does (point your browser at the array management IP). At my place of work we have written our own tooling to monitor array performance using the SolidFire Python API.
If you want to use the vtrace or sf_table tools, these are available here: SolidFire scripts.