
Linux Troubleshooting tool – VMStat

Today we’re here to talk about vmstat, a performance tool for Unix/Linux operating systems. The version covered in this article is the one found on CentOS-based systems.

vmstat is used by system administrators and anyone who is interested in the overall health of the system. It gives a coarse view. To run it, type vmstat on the command line; you don’t have to log in as root or as a system administrator. It takes two optional arguments: interval and count.
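As a small illustration of the two optional arguments, here is a minimal Python sketch that builds and runs the command. The helper names `vmstat_command` and `run_vmstat` are made up for this example, and the sketch assumes vmstat is on the PATH (it returns None when it is not).

```python
import shutil
import subprocess


def vmstat_command(interval=None, count=None):
    """Build the argv list for vmstat with its optional interval and count."""
    argv = ["vmstat"]
    if interval is not None:
        argv.append(str(interval))
        # A count is only accepted after an interval: vmstat [interval [count]]
        if count is not None:
            argv.append(str(count))
    return argv


def run_vmstat(interval=1, count=5):
    """Run vmstat and return its text output, or None if it is not installed."""
    if shutil.which("vmstat") is None:
        return None
    result = subprocess.run(
        vmstat_command(interval, count),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_vmstat(1, 5)` corresponds to typing `vmstat 1 5`: one report per second, five reports in total.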

image

If you run the command on your machine, you’ll see that the first line of output is printed immediately, without waiting for the one-second interval. That first line reports averages since the last reboot rather than a fresh sample. The results are ragged and don’t always line up with the headers, so sometimes you need to count columns or use your eyeballs.
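Because the columns are ragged and the first report covers the time since boot, it can help to parse the output by position and drop that first row. The following is a rough sketch, not a complete parser; `SAMPLE` is hand-written illustrative text in the usual Linux vmstat layout (not a real measurement), and the function names are hypothetical.

```python
# Illustrative, hand-written output in the Linux vmstat layout.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812345  10240 204800    0    0    12     8  150  300  5  2 93  0  0
 3  0      0 811200  10240 204900    0    0     0    24  900 2100 34 10 56  0  0
 0  0      0 811000  10240 204900    0    0     0     0  400  800 20  5 75  0  0
"""


def parse_vmstat(text):
    """Turn vmstat text into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # The column-name line; vmstat may repeat it on long runs.
            header = fields
        elif header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, (int(f) for f in fields))))
    return rows


def interval_samples(rows):
    """Drop the first row, which is the since-boot summary."""
    return rows[1:]
```

Splitting on whitespace sidesteps the alignment problem: each value is matched to its header by position rather than by where it happens to sit on the screen.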

What I’d like to do is talk about the key columns you read to understand CPU and memory usage, and then go through all the columns one by one and explain them.

From the picture, you can see there are six groups of columns. We are going to start with the CPU part, where we’ve got ‘us’, ‘sy’, ‘id’, ‘wa’, and ‘st’. Together these columns should add up to one hundred percent; when ‘wa’ and ‘st’ are zero, as in this example, the first three (us, sy, id) account for all of it. They are percentage averages for the entire system over that interval, across all CPUs.

image 2

In this sample we have 34% in user time, 10% in system time and 56% idle. That’s the average across all CPUs for the interval, which was one second.
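The arithmetic for this sample can be checked with a few lines of Python. This is only a sketch: the us, sy and id values come from the example above, while the zero values for wa and st are an assumption (they are implied by the other three summing to 100). The function name `cpu_breakdown` is made up for illustration.

```python
CPU_COLUMNS = ("us", "sy", "id", "wa", "st")


def cpu_breakdown(sample):
    """Summarise one vmstat sample's CPU columns (all values in percent)."""
    total = sum(sample.get(col, 0) for col in CPU_COLUMNS)
    busy = sample.get("us", 0) + sample.get("sy", 0)
    return {"total": total, "busy": busy, "idle": sample.get("id", 0)}


# us/sy/id are from the example; wa and st are assumed to be zero.
example = {"us": 34, "sy": 10, "id": 56, "wa": 0, "st": 0}
summary = cpu_breakdown(example)
```

Here `summary["total"]` is 100 and `summary["busy"]` is 44. On a real system the total can be off by a point or so, because vmstat rounds each column to a whole number.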

us: user time, meaning time spent running process code: your own code, or applications like Apache and MySQL, and things you write in PHP, Python, Java and so on.

sy: system time, the percentage of CPU time spent running inside the kernel. That includes system calls, asynchronous kernel threads that do various housekeeping tasks, and time spent servicing interrupts.

id: idle, the percentage of time that the CPUs weren’t doing work. In this example, I’ve got 56% idle for that interval, which means I’ve got plenty of headroom. I do need to bear in mind, though, that this is a very coarse way of understanding CPU usage, because I can’t see what’s happening per CPU. I may actually have a CPU problem where one CPU is hot and the load is not balanced. I need to use other tools like mpstat to see how the load is spread across the individual CPUs.

The first column, r, can be seen as part of the CPU picture. It is the number of threads in the ready-to-run state on the CPU dispatch queues. I’ve drawn a small picture to explain.
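To see how a system-wide average can hide a hot CPU, consider this small sketch. The per-CPU idle percentages are invented for illustration (chosen so that they average to the 56% from the example); on a real machine you would take them from a per-CPU tool such as mpstat.

```python
from statistics import mean


def average_idle(idle_by_cpu):
    """System-wide idle percentage, as a plain average over CPUs."""
    return mean(idle_by_cpu)


def hot_cpus(idle_by_cpu, threshold=10):
    """Indexes of CPUs whose idle percentage is below the threshold."""
    return [i for i, idle in enumerate(idle_by_cpu) if idle < threshold]


# Hypothetical four-CPU machine: CPU 0 is pegged, the others are mostly idle.
idle_by_cpu = [0, 74, 75, 75]
```

With these numbers `average_idle(idle_by_cpu)` is 56, which looks like plenty of headroom, while `hot_cpus(idle_by_cpu)` still reports CPU 0 as saturated.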

CPUs can be running threads while other threads are backed up waiting for their turn on the CPU. These waiting lines are called CPU dispatch queues. I’ve drawn three threads here, queued up waiting for their turn on the CPU, so if I were to take a snapshot right now I would say the number of threads that are ready to run and waiting their turn is three. This is in some ways a measure of CPU saturation: how much extra work is being asked of the CPUs that they can’t perform because they’re already busy. The us, sy and id columns help us understand CPU utilization, and the r column helps us understand CPU saturation.

How can we have saturation at the same time as idle time? What we need to remember is that this is a one-second average, and a lot can happen within a second. I can have a burst of events where a number of threads wake up wanting to do work at the same time, and they need to queue. They then get dispatched and executed, and for the remainder of that second you have idle time. This is why you can have what appears to be saturation and yet idle time during the same interval.

The user, system and idle columns are very accurate measurements. The r column is only sampled once per second, so it’s fairly coarse and you’ll see a lot of fluctuation. To get a better understanding, you can use a tracing tool such as DTrace, where it is available, and measure it much more accurately. If you know how to interpret uptime’s load averages, that’s another, much more accurate way of understanding CPU saturation.

So these are the four columns for understanding CPU usage in vmstat. It’s just a very broad view before you dive into deeper tools.
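One rough way to read the r column is to compare it with the number of CPUs. The sketch below assumes the Linux meaning of r, which counts runnable processes that are either running or waiting, so a value above the CPU count suggests queuing. The function names are hypothetical, and averaging several samples is just one way to smooth the once-per-second fluctuation.

```python
import os


def saturation_ratio(r, ncpu=None):
    """Runnable threads per CPU; values above 1.0 suggest queuing."""
    if ncpu is None:
        ncpu = os.cpu_count() or 1
    return r / ncpu


def smoothed_ratio(r_values, ncpu=None):
    """Average the ratio over several samples to damp the fluctuation."""
    ratios = [saturation_ratio(r, ncpu) for r in r_values]
    return sum(ratios) / len(ratios)


def load_average_per_cpu():
    """1-minute load average divided by CPU count, or None if unavailable."""
    if not hasattr(os, "getloadavg"):
        return None
    try:
        one_minute = os.getloadavg()[0]
    except OSError:
        return None
    return one_minute / (os.cpu_count() or 1)
```

For example, `saturation_ratio(3, ncpu=1)` gives 3.0 (three threads competing for one CPU), while the same r on four CPUs gives 0.75.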

For memory usage, there are four key columns you can consider before you get into deeper things. They are under the memory section.

swap: this field could be renamed virtual memory. When the available virtual memory goes down to zero, malloc() fails: an application requests memory by calling malloc() and the call returns an error. Sometimes you will get an error message like “out of memory”, and it causes pain for the application. As a system administrator, what I need to do is either tune the applications or add more disk-based swap devices, which expands virtual memory. Note that on Linux this column is labelled swpd and reports the amount of swap space in use, so there it grows, rather than shrinks, as swap fills up.
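On Linux, one way to check how much swap is left is to read the SwapTotal and SwapFree lines from /proc/meminfo. Here is a minimal sketch that works on the text of that file; `SAMPLE_MEMINFO` is invented for illustration and the function names are hypothetical.

```python
# Invented excerpt in the /proc/meminfo format; not a real measurement.
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:         1200000 kB
SwapTotal:       2097148 kB
SwapFree:        1048574 kB
"""


def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def swap_free_fraction(values):
    """Fraction of swap still free, or None when no swap is configured."""
    total = values.get("SwapTotal", 0)
    if total == 0:
        return None
    return values.get("SwapFree", 0) / total
```

On a real machine you could pass in the contents of /proc/meminfo instead of the sample text; a fraction approaching zero is the situation where adding swap or tuning the applications becomes urgent.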

free: a measure of DRAM, the actual main memory in the system. When that
