Gentoo Forums
High load average caused by high %iowait
danny_b85
n00b


Joined: 16 Jun 2008
Posts: 10

Posted: Mon Sep 01, 2008 10:09 am    Post subject: High load average caused by high %iowait

Hi everyone,

I'm having a problem with a couple of servers on which the load average climbs to values such as 17 and even 22. I've checked and rechecked, and the problem isn't being caused by high CPU usage from programs (that's what surprised me) or by the RAM being too full and spilling into swap, but by high %iowait values in the output of iostat.

From what I know (and please correct me if I'm wrong), %iowait goes up when information needs to be written to disk and either there is too much of it at once, so the disk bandwidth gets eaten up, or the information is scattered across different parts of the disk and the writes make the HDD's actuator move the read/write heads frantically, which causes the same problem.

I've noticed that %iowait goes up when one of the HDDs (for some reason) drops out of the software RAID 1 array and goes through the rebuild process, or when the RAM is full and swap starts getting used. That isn't the case now, and because the problem has shown up on several servers, I'm thinking it could be some kind of exploit of one or several programs running on the servers. It could be a long shot, and I could just be paranoid about it, but being paranoid is part of the daily life of a sysadmin.

Now my question: how can I track which process is causing the high disk usage? I need something similar to the output of iostat, but at the process level, since iostat only reports usage for the disk as a whole.

Please also advise on other possibilities that could lead to solving this problem.
Thank you in advance.

Later edit: this is what I mean by high %iowait:
Code:
avg-cpu:  %user   %nice    %sys %iowait   %idle
           3.74    0.00    2.99   82.04   11.22

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
hda          0.00  79.21 15.84 132.67  110.89 1063.37    55.45   531.68     7.91   188.89 1308.77   6.67  99.11
hdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
hdc          0.00  79.21 12.87 174.26  102.97 1071.29    51.49   535.64     6.28   119.43 1100.41   5.30  99.11
md0          0.00   0.00  3.96 59.41   31.68  475.25    15.84   237.62     8.00     0.00    0.00   0.00   0.00
md4          0.00   0.00  0.00 24.75    0.00   49.50     0.00    24.75     2.00     0.00    0.00   0.00   0.00
md1          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3          0.00   0.00 22.77 64.36  182.18  514.85    91.09   257.43     8.00     0.00    0.00   0.00   0.00
md2          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md5          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
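For anyone reading along, the device rows above can be checked mechanically. What follows is only a minimal Python sketch: it assumes the column layout shown in the header of this post (other sysstat versions order the columns differently), and the function names and the 90% threshold are made up for illustration.

```python
# Sketch: flag devices that look saturated in `iostat -x` style output.
# ASSUMPTION: columns match the header printed in the post above.

HEADER = ("Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s "
          "avgrq-sz avgqu-sz await svctm %util").split()


def parse_device_lines(text):
    """Return {device: {column: float}} for every data row found in text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and anything with the wrong width.
        if len(fields) != len(HEADER) or fields[0] == "Device:":
            continue
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue  # not a numeric data row
        stats[fields[0]] = dict(zip(HEADER[1:], values))
    return stats


def saturated(stats, util_threshold=90.0):
    """Devices whose %util is at or above the (arbitrary) threshold."""
    return sorted(dev for dev, s in stats.items()
                  if s["%util"] >= util_threshold)


# Rows copied from the output in the post.
SAMPLE = """\
hda 0.00 79.21 15.84 132.67 110.89 1063.37 55.45 531.68 7.91 188.89 1308.77 6.67 99.11
hdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
hdc 0.00 79.21 12.87 174.26 102.97 1071.29 51.49 535.64 6.28 119.43 1100.41 5.30 99.11
md0 0.00 0.00 3.96 59.41 31.68 475.25 15.84 237.62 8.00 0.00 0.00 0.00 0.00
"""

if __name__ == "__main__":
    stats = parse_device_lines(SAMPLE)
    for dev in saturated(stats):
        s = stats[dev]
        print(dev, "await=%.2f ms" % s["await"], "svctm=%.2f ms" % s["svctm"])
```

On the sample rows this picks out hda and hdc, the two RAID members. Their await (over a second) is far larger than their svctm (a few milliseconds), which suggests requests spend most of their time waiting in the queue rather than being serviced.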
ibins
n00b


Joined: 27 Jul 2007
Posts: 27

Posted: Mon Sep 01, 2008 8:36 pm

Hi
you can try "sys-process/iotop", but you will need kernel 2.6.20 or higher with support for TASK_IO_ACCOUNTING enabled.
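As far as I know, on kernels built with that option the per-process counters also show up in /proc/<pid>/io, so a rough per-process view can be had without iotop. The Python sketch below is hedged accordingly: it assumes /proc/<pid>/io and /proc/<pid>/comm exist and are readable (normally that means running as root on a reasonably recent kernel), and the helper names and the 5-second interval are hypothetical choices, not part of any tool.

```python
import os
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of a /proc/<pid>/io file into ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def snapshot(proc_root="/proc"):
    """Return {pid: (comm, read_bytes, write_bytes)} for readable processes."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "io")) as f:
                io = parse_proc_io(f.read())
            with open(os.path.join(base, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited, no permission, or no I/O accounting
        result[int(entry)] = (comm,
                              io.get("read_bytes", 0),
                              io.get("write_bytes", 0))
    return result


def top_writers(before, after, limit=5):
    """Rank processes by bytes written between two snapshots."""
    deltas = []
    for pid, (comm, rd, wr) in after.items():
        if pid in before:
            _, rd0, wr0 = before[pid]
            deltas.append((wr - wr0, rd - rd0, pid, comm))
    deltas.sort(reverse=True)
    return deltas[:limit]


if __name__ == "__main__":
    first = snapshot()
    time.sleep(5)  # arbitrary sampling interval
    for written, read, pid, comm in top_writers(first, snapshot()):
        print("%6d %-16s wrote %d B, read %d B" % (pid, comm, written, read))
```

Two snapshots are compared because the counters are cumulative since process start; only the difference says who is busy right now.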
HeissFuss
Guru


Joined: 11 Jan 2005
Posts: 414

Posted: Tue Sep 02, 2008 10:13 pm

You may also be able to tell by looking at process states.

Use 'ps aux', or htop sorted by the S (state) column.

A state of D indicates uninterruptible sleep. If there's a non filesystem process consistently in that state, that's probably the culprit.
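Since "consistently" is the key word, one snapshot of ps is not much to go on; sampling a few times helps. Here is a minimal Python sketch of that idea. It reads the state field from /proc/<pid>/stat, which is where ps gets it as far as I know; the function names, the ten samples, and the one-second interval are all arbitrary assumptions.

```python
import os
import time
from collections import Counter


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' instead of splitting on whitespace.
    left, _, right = line.rpartition(")")
    pid, _, comm = left.partition("(")
    return int(pid), comm, right.split()[0]


def d_state_tasks(proc_root="/proc"):
    """Return [(pid, comm)] for processes currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    samples = 10  # arbitrary; more samples give a steadier picture
    seen = Counter()
    for _ in range(samples):
        seen.update(d_state_tasks())
        time.sleep(1)
    for (pid, comm), hits in seen.most_common():
        print("%6d %-16s in D state for %d of %d samples"
              % (pid, comm, hits, samples))
```

A process that shows up in most of the samples is a better suspect than one caught in D state once.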
danny_b85
n00b


Joined: 16 Jun 2008
Posts: 10

Posted: Wed Sep 03, 2008 2:01 pm

@ibins

Great suggestion with iotop. Unfortunately it doesn't help me, because I don't have a 2.6.20 kernel.

Quote:
If there's a non filesystem process consistently in that state, that's probably the culprit.


A non filesystem process such as Apache?
HeissFuss
Guru


Joined: 11 Jan 2005
Posts: 414

Posted: Wed Sep 03, 2008 2:22 pm

danny_b85 wrote:
@ibins

A non filesystem process such as Apache?


Even on a webserver, I don't think Apache should be consistently in that state. That's probably the cause of the IO wait.

Is Apache serving pages/files over NFS or another network filesystem, by chance?
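One quick way to answer that is to look at what is mounted. The Python sketch below scans /proc/mounts for network filesystem types; the list of types is an assumption and certainly not exhaustive, and the function name is made up for illustration.

```python
# ASSUMPTION: a hand-picked, non-exhaustive set of network filesystem types.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs", "ncpfs", "afs", "coda",
              "9p", "fuse.sshfs"}


def network_mounts(mounts_text):
    """Return [(source, mountpoint, fstype)] for network filesystems.

    mounts_text is the content of /proc/mounts, whose fields are:
    device, mount point, filesystem type, options, dump, pass.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in NETWORK_FS:
            hits.append((fields[0], fields[1], fields[2]))
    return hits


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        mounts = network_mounts(f.read())
    if not mounts:
        print("no network filesystems mounted")
    for source, mountpoint, fstype in mounts:
        print("%s on %s type %s" % (source, mountpoint, fstype))
```

If Apache's document root sits under one of the listed mount points, a slow or unreachable server on the other end could leave its processes stuck in D state.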
All times are GMT