danny_b85 n00b
Joined: 16 Jun 2008 Posts: 10
Posted: Mon Sep 01, 2008 10:09 am Post subject: High load average caused by high %iowait
Hi everyone,
I'm having a problem with a couple of servers on which the load average climbs to values such as 17 and even 22. I've checked and rechecked, and the problem wasn't being caused by high CPU usage from programs (that's what surprised me) or by the RAM being too full and pushing the system into swap, but by high %iowait values in the output of iostat.
From what I know (and please correct me if I'm wrong), %iowait goes up when information needs to be written to disk and either there's too much of it at once and the disk bandwidth gets eaten up, or the information is scattered across different parts of the disk and the writes make the HDD's actuators move the read/write heads frantically, which causes the same problem.
I've noticed that %iowait goes up when one of the HDDs (for some reason) drops out of the software RAID 1 array and the rebuild process is running, or when the RAM is full and swap starts getting used, but neither is the case now. Because the problem has manifested itself on several servers, I'm thinking it could be some kind of exploit of one or several of the programs running on them. It may be a long shot and I may just be paranoid, but being paranoid is part of the daily life of a sysadmin.
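For reference, a quick way to confirm that no array is currently degraded or rebuilding (a rough sketch; the md device name is just an example):

Code:
# Show the state of all software RAID arrays; a rebuild in progress
# appears as a "recovery = ..." progress line, a dropped disk as [U_]:
cat /proc/mdstat

# Per-array detail (md0 is only an example):
mdadm --detail /dev/md0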
Now for my question: how can I track which process is causing the high disk usage? I need something similar to the output of iostat, but at the process level, since iostat reports disk usage for the whole system.
Please also advise on any other possibilities that could lead to solving this problem.
Thank you in advance.
Later edit: this is what I mean by high %iowait:
Code:
avg-cpu:  %user   %nice    %sys  %iowait   %idle
           3.74    0.00    2.99    82.04   11.22

Device:  rrqm/s  wrqm/s    r/s     w/s   rsec/s   wsec/s   rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm  %util
hda        0.00   79.21  15.84  132.67   110.89  1063.37   55.45   531.68      7.91    188.89  1308.77   6.67  99.11
hdb        0.00    0.00   0.00    0.00     0.00     0.00    0.00     0.00      0.00      0.00     0.00   0.00   0.00
hdc        0.00   79.21  12.87  174.26   102.97  1071.29   51.49   535.64      6.28    119.43  1100.41   5.30  99.11
md0        0.00    0.00   3.96   59.41    31.68   475.25   15.84   237.62      8.00      0.00     0.00   0.00   0.00
md4        0.00    0.00   0.00   24.75     0.00    49.50    0.00    24.75      2.00      0.00     0.00   0.00   0.00
md1        0.00    0.00   0.00    0.00     0.00     0.00    0.00     0.00      0.00      0.00     0.00   0.00   0.00
md3        0.00    0.00  22.77   64.36   182.18   514.85   91.09   257.43      8.00      0.00     0.00   0.00   0.00
md2        0.00    0.00   0.00    0.00     0.00     0.00    0.00     0.00      0.00      0.00     0.00   0.00   0.00
md5        0.00    0.00   0.00    0.00     0.00     0.00    0.00     0.00      0.00      0.00     0.00   0.00   0.00
ibins n00b
Joined: 27 Jul 2007 Posts: 27
Posted: Mon Sep 01, 2008 8:36 pm Post subject:
Hi,
you can try "sys-process/iotop", but you will need kernel 2.6.20 or higher with TASK_IO_ACCOUNTING support enabled.
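For what it's worth, a quick way to check whether the running kernel qualifies before emerging it (just a sketch; it assumes the kernel exports its config via /proc/config.gz, i.e. CONFIG_IKCONFIG_PROC is set; otherwise grep the .config in the kernel source tree):

Code:
# The running kernel must be 2.6.20 or newer:
uname -r
# Per-task I/O accounting must be compiled in:
zgrep TASK_IO_ACCOUNTING /proc/config.gz
# If both check out, install and run it; -o limits the display
# to processes that are actually doing I/O at that moment:
emerge sys-process/iotop
iotop -o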
HeissFuss Guru
Joined: 11 Jan 2005 Posts: 414
Posted: Tue Sep 02, 2008 10:13 pm Post subject:
You may also be able to tell by looking at process states, e.g. with 'ps aux' or htop sorted by the S (state) column.
A state of D indicates uninterruptible sleep. If there's a non-filesystem process consistently in that state, that's probably the culprit.
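For example, a rough way to keep sampling for D-state processes (just a sketch; any polling interval will do):

Code:
# Print a timestamp plus every process currently in uninterruptible
# sleep (state D); names that keep reappearing are the suspects:
while true; do
    ps -eo state,pid,comm | awk -v t="$(date +%T)" '$1 ~ /^D/ {print t, $0}'
    sleep 1
done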
danny_b85 n00b
Joined: 16 Jun 2008 Posts: 10
Posted: Wed Sep 03, 2008 2:01 pm Post subject:
@ibins
Great suggestion with iotop; unfortunately it doesn't help me, because I'm not running a 2.6.20 or newer kernel.
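One possible alternative on older kernels is the kernel's block_dump facility: writing 1 to /proc/sys/vm/block_dump makes the kernel log every block read/write (and inode dirtying) together with the name of the responsible process, readable via dmesg. A rough sketch; the syslog init script name is just an example, and syslog should be stopped first or its own writes will flood the log:

Code:
# Stop syslog first, otherwise it ends up logging its own log writes:
/etc/init.d/sysklogd stop
echo 1 > /proc/sys/vm/block_dump    # start logging block I/O
sleep 30                            # let some disk activity accumulate
dmesg | grep -E 'READ|WRITE|dirtied'
echo 0 > /proc/sys/vm/block_dump    # turn it off again
/etc/init.d/sysklogd start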
Quote: If there's a non-filesystem process consistently in that state, that's probably the culprit.
A non-filesystem process such as Apache?
HeissFuss Guru
Joined: 11 Jan 2005 Posts: 414
Posted: Wed Sep 03, 2008 2:22 pm Post subject:
danny_b85 wrote: A non-filesystem process such as Apache?
Even on a webserver, Apache shouldn't be consistently in that state; if it is, that's probably the cause of the I/O wait.
Is Apache serving pages/files over NFS or another network filesystem, by chance?
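If you're not sure offhand, listing network mounts from the server is one quick check (a sketch):

Code:
# List NFS mounts, if any; an empty result suggests the docroot
# lives on local disk:
mount -t nfs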