I have been looking a lot at ESX performance, and today I turned to network stats. I can get network data in two-second samples from esxtop, but parsing that output is cumbersome.
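For context, here is roughly what the batch-mode route looks like (a sketch using esxtop's standard -b/-d/-n batch flags; the grep pattern is just illustrative):

# capture two 2-second samples in batch (CSV) mode; 2s is esxtop's minimum delay
esxtop -b -d 2 -n 2 > esxtop.csv
# the CSV carries one column per counter for every object on the host, so even
# locating the NIC columns means picking through a header row that can run to
# thousands of fields -- hence "cumbersome"
head -n 1 esxtop.csv | tr ',' '\n' | grep -i vmnic | head -n 5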
I could use PowerShell, however with that method I need to take the delta between two sets of counters, which again pushes the sample time out to many seconds. What I would like is to run a tool similar to sar -n DEV 1; below we can see the output from a standard RedHat host.
21:29:41        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
21:29:42        bond0   8148.98   2310.20  11525.37   1933.02      0.00      0.00     28.57
21:29:42         eth0   8130.61   2310.20  11524.02   1933.02      0.00      0.00     14.29
21:29:42         eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00
21:29:42         eth2      0.00      0.00      0.00      0.00      0.00      0.00      0.00
21:29:42         eth3      0.00      0.00      0.00      0.00      0.00      0.00      0.00
21:29:42         eth4     18.37      0.00      1.36      0.00      0.00      0.00     14.29
21:29:42         eth5      0.00      0.00      0.00      0.00      0.00      0.00      0.00
21:29:42         eth6      0.00      0.00      0.00      0.00      0.00      0.00      0.00
21:29:42         eth7      0.00      0.00      0.00      0.00      0.00      0.00      0.00
21:29:42           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
As a starting point, here is what I have from the ESX command prompt.
TSTAMP       IFACE    rxpck/s  txpck/s  rxMB/s   txMB/s   rxsize   txsize   rxeps    txeps
10:14:46 AM  vmnic0   13       1        0.00     0.00     112      399      0        0
10:14:46 AM  vmnic4   9        0        0.00     0.00     86       0        0        0
10:14:46 AM  vmnic2   5        0        0.00     0.00     135      0        0        0
10:14:46 AM  vmnic6   4        0        0.00     0.00     94       0        0        0
10:14:46 AM  vmnic1   3        0        0.00     0.00     163      0        0        0
10:14:46 AM  vmnic5   2        0        0.00     0.00     94       0        0        0
And here is the same output with data being copied between two VMs on the same node over two different VLANs.
TSTAMP       IFACE    rxpck/s  txpck/s  rxMB/s   txMB/s   rxsize   txsize   rxeps    txeps
08:52:23 PM  vmnic0   11       5        0        0        78       233      0        0
08:52:23 PM  vmnic4   8        0        0        0        85       0        0        0
08:52:23 PM  vmnic2   23801    2572     34       0        1449     70       0        0
08:52:23 PM  vmnic6   2577     1784     0        33       70       18516    0        0
08:52:23 PM  vmnic1   2        0        0        0        94       0        0        0
08:52:23 PM  vmnic5   2        0        0        0        94       0        0        0
08:52:24 PM  vmnic0   11       6        0        0        78       214      0        0
08:52:24 PM  vmnic4   8        0        0        0        85       0        0        0
08:52:24 PM  vmnic2   44308    4182     64       0        1456     71       0        0
08:52:24 PM  vmnic6   4189     3005     0        61       71       20583    0        0
08:52:24 PM  vmnic1   2        0        0        0        94       0        0        0
08:52:24 PM  vmnic5   2        0        0        0        94       0        0        0
08:52:25 PM  vmnic0   12       6        0        0        79       209      0        0
08:52:25 PM  vmnic4   8        0        0        0        85       0        0        0
08:52:25 PM  vmnic2   46642    3814     68       0        1465     71       0        0
08:52:25 PM  vmnic6   3820     2810     0        65       71       23286    0        0
08:52:25 PM  vmnic1   2        0        0        0        94       0        0        0
08:52:25 PM  vmnic5   2        0        0        0        94       0        0        0
08:52:26 PM  vmnic0   12       15       0        0        92       227      0        0
08:52:26 PM  vmnic4   9        0        0        0        103      0        0        0
08:52:26 PM  vmnic2   42063    4161     61       0        1450     70       0        0
08:52:26 PM  vmnic6   4170     2994     0        58       71       19515    0        0
08:52:26 PM  vmnic1   2        0        0        0        94       0        0        0
08:52:26 PM  vmnic5   2        0        0        0        94       0        0        0
08:52:27 PM  vmnic0   14       9        0        0        126      239      0        0
08:52:27 PM  vmnic4   8        0        0        0        85       0        0        0
08:52:27 PM  vmnic2   47247    3996     69       0        1460     74       0        0
08:52:27 PM  vmnic6   4003     3028     0        66       74       21831    0        0
08:52:27 PM  vmnic1   3        0        0        0        94       0        0        0
08:52:27 PM  vmnic5   3        0        0        0        94       0        0        0
08:52:28 PM  vmnic0   10       61       0        0        80       238      0        0
08:52:28 PM  vmnic4   8        0        0        0        85       0        0        0
08:52:28 PM  vmnic2   39979    3471     58       0        1452     71       0        0
08:52:28 PM  vmnic6   3477     2784     0        55       71       19979    0        0
08:52:28 PM  vmnic1   2        0        0        0        94       0        0        0
08:52:28 PM  vmnic5   2        0        0        0        94       0        0        0
EDIT: I have updated the script to use floating point, as this better matches what sar shows.
And here is a test where we migrate a VM to this server.
TSTAMP       IFACE    rxpck/s  txpck/s  rxMB/s   txMB/s   rxsize   txsize   rxeps    txeps
10:15:58 AM  vmnic0   13       34       0.00     0.01     155      268      0        0
10:15:58 AM  vmnic4   8        0        0.00     0.00     85       0        0        0
10:15:58 AM  vmnic2   5        0        0.00     0.00     87       0        0        0
10:15:58 AM  vmnic6   5        0        0.00     0.00     87       0        0        0
10:15:58 AM  vmnic1   81109    9967     121.64   0.66     1499     66       0        0
10:15:58 AM  vmnic5   80496    9540     120.80   0.62     1500     66       0        0
10:15:59 AM  vmnic0   16       64       0.00     0.01     201      268      0        0
10:15:59 AM  vmnic4   9        0        0.00     0.00     82       0        0        0
10:15:59 AM  vmnic2   4        0        0.00     0.00     94       0        0        0
10:15:59 AM  vmnic6   4        0        0.00     0.00     94       0        0        0
10:15:59 AM  vmnic1   81842    9792     122.60   0.65     1498     66       0        0
10:15:59 AM  vmnic5   81698    9849     122.41   0.65     1498     66       0        0
In the first test we can see data being copied at up to 69 MB/sec between interfaces in the sample shown. In the second test, a short sample of the data shows we are at the maximum bandwidth for two vmnics at 1 Gb/s: 1 Gb/s works out to 125 MB/s of raw line rate, so ~121 MB/s of payload per vmnic is effectively saturation.
I thought I would also show the performance charts from the vSphere client.
We can see one spike in the chart, highlighted in red. As this is a 20-second average, we can see we have hit peak bandwidth, but we can't tell from this data how many times we were at peak utilisation.
In the above image we can see we are running at max utilisation for the NICs over the sample period. So where we might have assumed we only briefly touched max utilisation, we can see we were actually running at it for 30 seconds.
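Because the samples are one second apart, the question "how many seconds were we at peak?" can be answered directly from the captured output rather than eyeballed from a chart. A rough sketch, assuming the script output has been saved to a hypothetical file nstat.log, and treating anything over 110 MB/s received as "near 1GbE line rate" (note the AM/PM in the timestamp means IFACE is field 3 and rxMB/s is field 6):

# count one-second samples where any vmnic received over ~110 MB/s
awk '$3 ~ /vmnic/ && $6 > 110 { n++ } END { print n + 0, "samples at or near line rate" }' nstat.log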
So how do we get the data? I use the command net-stats, strip the punctuation from its output with sed, and format it with awk.
net-stats -A -tc -i 1 -n 300 | sed 's/["|":|:|/|,]//g' | awk -f ./nstat.awk
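If you want to keep a copy of each run, a minimal wrapper along these lines works (a sketch: the net-stats flags are the same as above, and the log file name is just an example):

#!/bin/sh
# sketch of a capture wrapper -- interval and count default to 1s x 300 samples
INTERVAL=${1:-1}
COUNT=${2:-300}
net-stats -A -tc -i "$INTERVAL" -n "$COUNT" \
  | sed 's/["|":|:|/|,]//g' \
  | awk -f ./nstat.awk \
  | tee "nstat-$(date +%Y%m%d-%H%M%S).log"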
BEGIN {
    # sar-style header for the formatted output
    printf("%-12s %-8s %-8s %-8s %-8s %-8s %-8s %-8s %-8s %-8s\n",
           "TSTAMP","IFACE","rxpck/s","txpck/s","rxMB/s","txMB/s",
           "rxsize","txsize","rxeps","txeps")
}

# each sample block starts with a "time" record carrying the epoch timestamp
$0 ~ /time / { ts = $2 }

# a vmnic record gives us the interface name for the counters that follow
$0 ~ /vmnic/ { vmnic = $2; found = 1 }

# the txpps record carries the rates; print one formatted row per interface
# ($12 and $4 are divided by 8, bits to bytes, to match the MB/s headers)
$0 ~ /txpps/ && found == 1 {
    printf("%-12s %-8s %-8d %-8d %-8.2f %-8.2f %-8d %-8d %-8d %-8d\n",
           strftime("%r", ts), vmnic, $10, $2, $12/8, $4/8, $14, $6, $NF, $8)
    found = 0
}
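One quirk worth noting if you post-process this output: strftime's %r prints the time as two whitespace-separated fields ("10:15:58 AM"), so IFACE lands in $3, rxMB/s in $6 and txMB/s in $7. With that in mind, a quick per-interface peak summary from a saved run might look like this (nstat.log is a hypothetical capture file, as above):

awk '$3 ~ /vmnic/ {
    seen[$3] = 1
    if ($6 > rx[$3]) rx[$3] = $6   # track peak rxMB/s per interface
    if ($7 > tx[$3]) tx[$3] = $7   # track peak txMB/s per interface
} END {
    for (i in seen)
        printf "%-8s peak rx %.2f MB/s  peak tx %.2f MB/s\n", i, rx[i], tx[i]
}' nstat.log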
Enjoy…