SolidFire QOS Part 3 – Monitoring

In the previous post I showed how we can work out the QOS values from the average I/O size for a volume. In this post I will show a tool that was created to help monitor whether a volume is at its QOS limit.

As a quick reminder from the previous post, the following table is for my test volume, which has a QOS setup of 3000 (min), 20000 (max) and 30000 (burst) IOPS.

IOSIZE   COST     MINIOPS  MAXIOPS  BIOPS    MINMB    MAXMB    BURSTMB
4        100      3000     20000    30000    12       78       117
8        160      1875     12500    18750    15       98       146
16       270      1111     7407     11111    17       116      174
32       500      600      4000     6000     19       125      188
64       1000     300      2000     3000     19       125      188
128      1950     154      1026     1538     19       128      192
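
As a rough sketch (not the actual sf_table.py source), the rows above can be reproduced from the COST column alone. This assumes a 4KB I/O costs 100, that IOPS scale by 100/COST, and that values are rounded to the nearest whole number; the function names here are my own for illustration.

```python
# Cost per I/O size (KB), copied from the COST column of the table above.
COST = {4: 100, 8: 160, 16: 270, 32: 500, 64: 1000, 128: 1950}


def round_half_up(value):
    """Round to the nearest integer, with .5 rounding up."""
    return int(value + 0.5)


def table_row(io_kb, min_iops, max_iops, burst_iops):
    """Return (MINIOPS, MAXIOPS, BIOPS, MINMB, MAXMB, BURSTMB) for one I/O size."""
    # The QOS settings are expressed in 4KB I/Os (cost 100), so larger
    # I/Os get proportionally fewer IOPS.
    iops = [round_half_up(v * 100 / COST[io_kb])
            for v in (min_iops, max_iops, burst_iops)]
    # Bandwidth in MB/sec is IOPS multiplied by the I/O size in KB, over 1024.
    mb = [round_half_up(i * io_kb / 1024) for i in iops]
    return tuple(iops + mb)


for size in sorted(COST):
    print(size, COST[size], *table_row(size, 3000, 20000, 30000))
```

With 3000/20000/30000 as input this prints the same figures as the table, which is a useful sanity check when trying other QOS settings.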

Reading the above, if I were to issue 8KB I/Os then I should be able to push 146MB/sec in burst mode, and once we have no credits left the volume should be able to sustain 98MB/sec.
If we look at the output of the tool, which is named vtrace, we should see the current statistics for the volume and what we believe the QOS BW values are for the average I/O size.

The tool is run as follows:

./vtrace ARRAY VOLUME_ID INTERVAL | [COUNT] 

(solidfire) bash-4.2$ ./vtrace $ARRAY 28 1
Tracing... Output every 1 secs. Hit Ctrl-C to end
TIME         SET MIN/MAX/BST  CUR MIN/MAX/BST           |IOSIZE(kb) IOPS     IOPSQOS    CURBW(MB)  BW_QOS(MB)   BURST_QOS(MB) VUTIL% BURSTCRED  RLT(us)  WLT(us)  LAT(us)  QDEPTH   IOQOS BWQOS
20:47:34     3000/20000/30000 779/5194/7792             | 24         100      5194       2          122          183           2.50   600000     607      589      607      0        N     N
20:47:35     3000/20000/30000 600/4000/6000             | 32         1290     4000       40         125          188           32.25  600000     550      0        550      1        N     N
20:47:37     3000/20000/30000 617/4118/6177             | 31         1410     4118       43         125          187           33.57  600000     547      506      544      0        N     N
20:47:38     3000/20000/30000 600/4000/6000             | 32         952      4000       30         125          188           23.80  600000     819      0        819      1        N     N
20:47:39     3000/20000/30000 600/4000/6000             | 32         1170     4000       37         125          188           29.25  600000     620      516      619      0        N     N
20:47:40     3000/20000/30000 600/4000/6000             | 32         1310     4000       41         125          188           32.75  600000     535      577      536      1        N     N
20:47:42     3000/20000/30000 600/4000/6000             | 32         1314     4000       41         125          188           32.85  600000     538      581      538      1        N     N
20:47:43     3000/20000/30000 600/4000/6000             | 32         1328     4000       42         125          188           33.20  600000     530      581      531      1        N     N
20:47:44     3000/20000/30000 600/4000/6000             | 32         1324     4000       41         125          188           33.10  600000     524      0        524      1        N     N
20:47:45     3000/20000/30000 600/4000/6000             | 32         1320     4000       41         125          188           33.00  600000     531      494      531      0        N     N
20:47:47     3000/20000/30000 600/4000/6000             | 32         0        4000       0          125          188           0.00   600000     0        0        0        0        N     N
20:47:48     3000/20000/30000 1600/10666/16000          | 10         2        10666      0          104          156           0.01   600000     0        595      595      0        N     N
20:47:49     3000/20000/30000 3000/20000/30000          | 2          2        20000      0          39           59            0.01   600000     0        565      565      0        N     N
^C20:47:50     3000/20000/30000 3000/20000/30000        | 2          0        20000      0          39           59            0.00   600000     0        0        0        0        N     N
Detaching...

What we can see above is that we have an average I/O size of 32KB with QOS values set to 3000 (min), 20000 (max) and 30000 (burst). Running this through our table tool we get the following.

(solidfire) bash-4.2$ ./sf_table.py 3000 20000 30000 |egrep "IOPS|^32 "
IOSIZE   COST     MINIOPS  MAXIOPS  BIOPS    MINMB    MAXMB    BURSTMB
32       500      600      4000     6000     19       125      188

So our maximum BW would be 125MB/sec based on a 32KB I/O size, and when we are able to burst we could achieve 188MB/sec with a maximum of 6000 IOPS.

Some people get confused and believe they can always get the 20K max IOPS, but that figure is based on a 4KB I/O size.

Let's look at the headers from the tool and explain what's going on.

  • TIME – Timestamp.
  • SET MIN/MAX/BST – The volume's QOS settings.
  • CUR MIN/MAX/BST – What we really get for the current I/O size.
  • IOSIZE(kb) – Average I/O size for the volume.
  • IOPS – Current IOPS to the volume.
  • IOPSQOS – The Max IOPS we can get at the average I/O size.
  • CURBW(MB) – Current BW to the volume.
  • BW_QOS(MB) – BW we get at the Max IOPS for the average I/O size.
  • BURST_QOS(MB) – The throughput we can burst up to.
  • VUTIL% – Volume utilisation.
  • BURSTCRED – Amount of credits we have to allow burst.
  • RLT(us)/WLT(us)/LAT(us) – Read, write and average latency for the volume.
  • QDEPTH – Current queue depth for the volume.
  • IOQOS – IOPS are >= MAX IOPS.
  • BWQOS – BW is >= MAX BW or BURST BW.
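
To make the CUR MIN/MAX/BST column a little more concrete, here is a minimal sketch of how those figures could be derived for I/O sizes that fall between the rows of the table. It assumes the cost is linearly interpolated between the table points, clamped outside the 4KB to 128KB range, and that the result is truncated to whole IOPS. That is my inference from the traces in this post (it reproduces the CUR values shown for 10, 24, 30, 31 and 32KB), not necessarily how vtrace does it internally, and the function names are hypothetical.

```python
import bisect

# (I/O size in KB, cost) points taken from the COST column of the table
# earlier in this post.
COST_POINTS = [(4, 100), (8, 160), (16, 270), (32, 500), (64, 1000), (128, 1950)]
SIZES = [size for size, _ in COST_POINTS]


def cost_for(io_kb):
    """Linearly interpolate the cost; clamp outside 4KB-128KB (assumption)."""
    if io_kb <= SIZES[0]:
        return float(COST_POINTS[0][1])
    if io_kb >= SIZES[-1]:
        return float(COST_POINTS[-1][1])
    hi = bisect.bisect_right(SIZES, io_kb)
    (s0, c0), (s1, c1) = COST_POINTS[hi - 1], COST_POINTS[hi]
    return c0 + (io_kb - s0) * (c1 - c0) / (s1 - s0)


def current_limits(io_kb, min_iops, max_iops, burst_iops):
    """Return the CUR MIN/MAX/BST triple, truncated to whole IOPS."""
    cost = cost_for(io_kb)
    return tuple(int(v * 100 / cost) for v in (min_iops, max_iops, burst_iops))


print(current_limits(24, 3000, 20000, 30000))  # (779, 5194, 7792)
print(current_limits(10, 3000, 20000, 30000))  # (1600, 10666, 16000)
```

Note that vtrace reports the average I/O size as a whole number of KB, so feeding the displayed IOSIZE back in may differ slightly from whatever value the tool uses internally.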

OK, so let's run a workload through the volume and see what happens when we hit QOS.

(solidfire) bash-4.2$ ./vtrace $ARRAY 28 1
Tracing... Output every 1 secs. Hit Ctrl-C to end
TIME         SET MIN/MAX/BST  CUR MIN/MAX/BST           |IOSIZE(kb) IOPS     IOPSQOS    CURBW(MB)  BW_QOS(MB)   BURST_QOS(MB) VUTIL% BURSTCRED  RLT(us)  WLT(us)  LAT(us)  QDEPTH   IOQOS BWQOS
21:00:05     3000/20000/30000 3000/20000/30000          | 1          0        20000      0          20           29            0.00   600000     0        0        0        0        N     N
21:00:06     3000/20000/30000 3000/20000/30000          | 2          4        20000      0          39           59            0.02   600000     0        569      569      0        N     N
21:00:07     3000/20000/30000 1600/10666/16000          | 10         14       10666      0          104          156           0.16   600000     0        547      547      0        N     N
21:00:08     3000/20000/30000 3000/20000/30000          | 2          2        20000      0          39           59            0.01   600000     0        605      605      0        N     N
21:00:10     3000/20000/30000 636/4244/6366             | 30         6000     4244       176        124          187           150.00 590060     663      552      663      17       Y     Y
21:00:11     3000/20000/30000 600/4000/6000             | 32         6004     4000       188        125          188           150.10 575213     1231     682      1230     16       Y     Y
21:00:12     3000/20000/30000 600/4000/6000             | 32         6002     4000       188        125          188           150.05 565269     775      1055     776      15       Y     Y
21:00:14     3000/20000/30000 600/4000/6000             | 32         6002     4000       188        125          188           150.05 550431     661      1240     661      16       Y     Y
21:00:15     3000/20000/30000 600/4000/6000             | 32         6000     4000       188        125          188           150.00 540431     779      0        779      16       Y     Y
21:00:16     3000/20000/30000 600/4000/6000             | 32         6002     4000       188        125          188           150.05 525594     655      1355     655      17       Y     Y
21:00:17     3000/20000/30000 600/4000/6000             | 32         6000     4000       188        125          188           150.00 515654     698      0        698      16       Y     Y
.
.
.
21:01:10     3000/20000/30000 600/4000/6000             | 32         5000     4000       156        125          188           125.00 2501       611      0        611      17       Y     Y
21:01:11     3000/20000/30000 600/4000/6000             | 32         4142     4000       129        125          188           103.55 372        625      825      626      17       Y     Y
21:01:12     3000/20000/30000 600/4000/6000             | 32         4046     4000       126        125          188           101.15 144        605      1074     605      16       Y     Y
21:01:13     3000/20000/30000 600/4000/6000             | 32         4018     4000       126        125          188           100.45 71         594      1000     594      18       Y     Y
21:01:15     3000/20000/30000 600/4000/6000             | 32         4018     4000       126        125          188           100.45 48         683      0        683      17       Y     Y
21:01:16     3000/20000/30000 600/4000/6000             | 32         4016     4000       126        125          188           100.40 66         655      902      655      17       Y     Y
21:01:17     3000/20000/30000 600/4000/6000             | 32         4012     4000       125        125          188           100.30 37         602      0        602      15       Y     Y
21:01:18     3000/20000/30000 600/4000/6000             | 32         4010     4000       125        125          188           100.25 54         566      724      566      17       Y     Y
21:01:20     3000/20000/30000 600/4000/6000             | 32         4018     4000       126        125          188           100.45 61         566      696      567      16       Y     Y
21:01:21     3000/20000/30000 600/4000/6000             | 32         4012     4000       125        125          188           100.30 63         598      1107     598      16       Y     Y
21:01:22     3000/20000/30000 600/4000/6000             | 32         4008     4000       125        125          188           100.20 54         656      680      656      16       Y     Y
21:01:23     3000/20000/30000 600/4000/6000             | 32         4010     4000       125        125          188           100.25 46         600      639      600      17       Y     Y
21:01:25     3000/20000/30000 600/4000/6000             | 32         4018     4000       126        125          188           100.45 58         595      459      595      17       Y     Y
21:01:26     3000/20000/30000 600/4000/6000             | 32         4012     4000       125        125          188           100.30 35         585      0        585      15       Y     Y
21:01:27     3000/20000/30000 600/4000/6000             | 32         4004     4000       125        125          188           100.10 46         597      890      597      16       Y     Y
21:01:28     3000/20000/30000 600/4000/6000             | 32         4018     4000       126        125          188           100.45 64         619      902      619      16       Y     Y
21:01:30     3000/20000/30000 600/4000/6000             | 32         4008     4000       125        125          188           100.20 55         609      1030     610      17       Y     Y
21:01:31     3000/20000/30000 600/4000/6000             | 32         4008     4000       125        125          188           100.20 52         604      762      605      17       Y     Y
21:01:32     3000/20000/30000 600/4000/6000             | 32         2676     4000       84         125          188           66.90  3390       612      699      612      0        N     N
21:01:34     3000/20000/30000 3000/20000/30000          | 4          2        20000      0          78           117           0.01   33384      0        558      558      0        N     N
21:01:35     3000/20000/30000 3000/20000/30000          | 2          2        20000      0          39           59            0.01   53378      0        555      555      0        N     N

Detaching...

What we can see is that at 21:00:10 we hit QOS. From the next sample onwards the IOPS are equal to the burst IOPS figure (CUR BST), which tells us we are in burst mode. We can also see that the CURBW(MB) being used is >= the BURST_QOS(MB).
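
If you want to script this check rather than eyeball it, a vtrace data line can be split into named columns quite easily. The following is only a sketch that assumes the column layout shown in the output above; `parse_sample` and `at_burst_limit` are names I have made up for illustration and are not part of vtrace.

```python
# Column names in the order vtrace prints them ("|" is a visual separator).
FIELDS = ["TIME", "SET", "CUR", "SEP", "IOSIZE", "IOPS", "IOPSQOS", "CURBW",
          "BW_QOS", "BURST_QOS", "VUTIL", "BURSTCRED", "RLT", "WLT", "LAT",
          "QDEPTH", "IOQOS", "BWQOS"]
INT_FIELDS = ("IOSIZE", "IOPS", "IOPSQOS", "CURBW", "BW_QOS", "BURST_QOS",
              "BURSTCRED", "RLT", "WLT", "LAT", "QDEPTH")


def parse_sample(line):
    """Split one vtrace data line into a dict of named columns."""
    tokens = line.replace("^C", "").split()  # drop the echo of Ctrl-C, if any
    if len(tokens) != len(FIELDS):
        raise ValueError("unexpected number of columns: %d" % len(tokens))
    sample = dict(zip(FIELDS, tokens))
    del sample["SEP"]
    for key in ("SET", "CUR"):  # "600/4000/6000" -> (600, 4000, 6000)
        sample[key] = tuple(int(v) for v in sample[key].split("/"))
    for key in INT_FIELDS:
        sample[key] = int(sample[key])
    sample["VUTIL"] = float(sample["VUTIL"])
    return sample


def at_burst_limit(sample):
    """True when IOPS has reached CUR BST and CURBW has reached BURST_QOS."""
    cur_burst_iops = sample["CUR"][2]
    return (sample["IOPS"] >= cur_burst_iops
            and sample["CURBW"] >= sample["BURST_QOS"])


line = ("21:00:11     3000/20000/30000 600/4000/6000             | 32"
        "         6004     4000       188        125          188"
        "           150.10 575213     1231     682      1230     16"
        "       Y     Y")
print(at_burst_limit(parse_sample(line)))  # True
```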

Note also that the burst credits (BURSTCRED) start to drop once we are in burst mode, and at 21:01:10 we start to drop back down to the IOPSQOS figure, which is our Max IOPS for the I/O size. We also see that when the workload completes at 21:01:32 we start to accrue credits again, which will allow us to burst in the future.
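
As a back-of-the-envelope check on the credits, the trace suggests they drain at roughly the amount by which the 4KB-equivalent IOPS exceed the Max IOPS setting: 6000 IOPS at 32KB (cost 500) is 30000 in 4KB terms, which is 10000 over the 20000 max, and the 600000 credits lasted about a minute. The sketch below just encodes that observation. Treat the formula as my reading of this one trace rather than documented behaviour; the function name is made up.

```python
def seconds_of_burst_left(burst_credits, iops, cost, max_iops):
    """Estimate how long the current rate can continue before credits run out.

    iops is the actual IOPS at the current I/O size and cost is the cost of
    that size (100 for 4KB), so iops * cost / 100 is the 4KB-equivalent rate.
    Returns None when the rate is not above max_iops (no credits being spent).
    """
    normalised = iops * cost / 100
    excess = normalised - max_iops
    if excess <= 0:
        return None
    return burst_credits / excess


# Full credits, bursting at 6000 x 32KB IOPS against a 20000 IOPS max.
print(seconds_of_burst_left(600000, 6000, 500, 20000))  # 60.0
```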

Things to look out for

Volume utilisation
If the volume utilisation is > 100% then we are in burst and, in general, in a QOS situation. If the utilisation is sitting at around 100% we are in hard QOS, i.e. at our maximum. If we run in this mode for a long period we will not accrue credits, because credits only accrue when our IOPS < MAX IOPS; as such we can never enter burst mode again until the workload drops off.
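
The rule of thumb above could be turned into a simple check along these lines. This is a minimal sketch: the 5% tolerance used to separate "sitting at the maximum" from "bursting" is an arbitrary value I picked because the hard QOS samples in the trace hover between 100% and about 104%, so adjust it for your own workloads.

```python
def qos_state(vutil_pct, tolerance_pct=5.0):
    """Classify a VUTIL% reading; tolerance_pct is a hypothetical threshold."""
    if vutil_pct < 100:
        return "below max"
    if vutil_pct <= 100 + tolerance_pct:
        return "hard QOS"  # pinned at Max IOPS, no credits being accrued
    return "bursting"      # above Max IOPS, burst credits being spent


for vutil in (33.10, 150.00, 100.45):
    print(vutil, qos_state(vutil))
```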

I have noticed that the ActiveIQ product from NetApp does not show the historical volume utilisation, but the online monitoring does (point your browser at the array management IP). At my place of work we have written our own tooling to monitor array performance using the SolidFire Python API.

If you want to use the vtrace or sf_table tools, these are available here: SolidFire scripts.
