planet-admin Apprentice
Joined: 27 Mar 2004 Posts: 213 Location: Boise, ID
Posted: Sun Mar 11, 2007 6:07 am Post subject: [solved] OCFS2 Causing super high load (1200+)
I use ocfs2 in a production environment with Gentoo 2006.1, kernel 2.6.20, and QLogic Fibre Channel cards attached to a 16-drive RAID array. I have ocfs2-tools 1.2.1.
These machines are a cluster of web servers, all sharing the same data; all are attached to the same volume, formatted with ocfs2.
Randomly, with no immediately discernible reason, all the machines will have a load average of 1200, and all Apache processes will be stuck in uninterruptible sleep. (This of course causes them to be taken out of the server pool, as I use ldirectord with health checks.)
I have looked at everything I know to look at: there is no I/O load and no CPU load, yet the load averages are 1200+. I can run lsof and grep for my ocfs2-mounted directory (/mnt/www) and see only a few (maybe 5) open files in that directory (let's say they're all JPGs).
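For reference, these are the kinds of checks I run when it happens (the /mnt/www path is from my setup); none of them shows anything that would explain the load:
Code: | # processes stuck in uninterruptible sleep (D state) -- these drive the load average
ps axo pid,stat,wchan:30,comm | awk '$2 ~ /D/'

# open files under the ocfs2 mount
lsof | grep /mnt/www

# confirm there is no real disk or CPU load
vmstat 2 5 |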
This is a 3-node cluster, not a 2-node one.
Strangely, when I type mount, all the ocfs2 partitions are listed with heartbeat=local on all 3 machines, but I know they are talking to each other, since dmesg reports which nodes are in the cluster (nodes 0, 1, 2, etc.).
OCFS2 is great for sharing storage, but when it locks up the machines (not fencing, just unable to serve files anymore), it defeats the purpose of high availability. I'm sure it's something I'm doing wrong, as other people are using it successfully.
Any help that can be provided is great. I'd even be willing to give the ocfs2 developers access to a machine to help figure it out.
Thanks,
Michael
_________________
Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Web: http://www.GlobalSystemsConsulting.com

Last edited by planet-admin on Mon Mar 19, 2007 11:05 pm; edited 1 time in total
Janne Pikkarainen Veteran
Joined: 29 Jul 2003 Posts: 1143 Location: Helsinki, Finland
Posted: Mon Mar 19, 2007 12:35 pm
Anything suspicious in the logs / dmesg output?
_________________
Yes, I'm the man. Now it's your turn to decide if I meant "Yes, I'm the male." or "Yes, I am the Unix Manual Page.".
planet-admin Apprentice
Joined: 27 Mar 2004 Posts: 213 Location: Boise, ID
Posted: Mon Mar 19, 2007 11:05 pm
Janne Pikkarainen wrote: | Anything suspicious in the logs / dmesg output? |
No, nothing, but I believe I have it figured out. It turns out it's not a good idea to set O2CB_HEARTBEAT_THRESHOLD to 900.
900 certainly solved my self-fencing problem, but with a threshold that high, if a machine in the cluster goes down, the cluster waits O2CB_HEARTBEAT_THRESHOLD * 2 + 1 seconds (1801 seconds, about 30 minutes, at 900) before new file locks can be established.
15 turned out to be a much more reasonable value, and it seems to have solved it.
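For anyone else who hits this, the setting lives in /etc/default/o2cb (the file that /etc/init.d/o2cb configure writes). Roughly, the relevant bit of mine now looks like this; 15 is simply the value that worked for me:
Code: | # /etc/default/o2cb (excerpt)
# O2CB_HEARTBEAT_THRESHOLD: iterations before a node is considered dead.
# New locks are blocked for roughly threshold * 2 + 1 seconds after a node
# dies, so 900 -> ~30 minutes and 15 -> ~31 seconds.
O2CB_HEARTBEAT_THRESHOLD=15 |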
Thanks,
Michael
_________________
Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Web: http://www.GlobalSystemsConsulting.com
olli.bo Apprentice
Joined: 16 Jul 2003 Posts: 208 Location: Germany
Posted: Wed Jul 25, 2007 6:56 am
Hi,
I have a similar problem. In which file can I set the O2CB_HEARTBEAT_THRESHOLD variable?
Is it possible to check whether ocfs2 is actually using it?
thx
olli
planet-admin Apprentice
Joined: 27 Mar 2004 Posts: 213 Location: Boise, ID
Posted: Wed Jul 25, 2007 7:01 am
olli.bo wrote: | Hi,
I have a similar problem. In which file can I set the O2CB_HEARTBEAT_THRESHOLD variable?
Is it possible to check whether ocfs2 is actually using it?
thx
olli |
You set it in /etc/default/o2cb.
Also, the latest kernel sources (2.6.21) and ocfs2-tools 1.2.6 are a good idea; they fix a few bugs and bad defaults, and make the default timeouts and other settings more realistic.
I don't know of a way to check what the timeout currently is (and no, you can't change it in a rolling fashion: all nodes need to be shut down, the value changed, then o2cb started back up and the filesystem remounted).
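To spell out that sequence, it is roughly the following on every node (the mount point is from my setup, and I'm assuming the o2cb init script shipped with ocfs2-tools plus an fstab entry for the volume):
Code: | # do the stop on ALL nodes first, before bringing any of them back up
umount /mnt/www          # unmount the ocfs2 filesystem
/etc/init.d/o2cb stop    # stop the o2cb cluster stack

# edit O2CB_HEARTBEAT_THRESHOLD in /etc/default/o2cb on every node

/etc/init.d/o2cb start   # then start o2cb again everywhere
mount /mnt/www           # and remount the filesystem |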
We've been running ocfs2 stably for months in a very high-traffic situation (3.1 million hits per day), serving everything from images to videos to PHP/HTML.
Hope this helps.
Michael
_________________
Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Web: http://www.GlobalSystemsConsulting.com
planet-admin Apprentice
Joined: 27 Mar 2004 Posts: 213 Location: Boise, ID
Posted: Wed Jul 25, 2007 7:03 am
olli.bo wrote: | Hi,
I have a similar problem. In which file can I set the O2CB_HEARTBEAT_THRESHOLD variable?
Is it possible to check whether ocfs2 is actually using it?
thx
olli |
Also, if you give me more details and changing the heartbeat doesn't solve it, maybe I can help further, as I'm probably one of the most experienced ocfs2 users on the planet (I've been through it inside and out, through the code, through everything, and have tested different schedulers, different hardware, etc.).
Michael
_________________
Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Web: http://www.GlobalSystemsConsulting.com
olli.bo Apprentice
Joined: 16 Jul 2003 Posts: 208 Location: Germany
Posted: Wed Jul 25, 2007 8:09 am
Hi, thx for the fast answer...
planet-admin wrote: |
Also, if you give me more details and changing the heartbeat doesn't solve it, maybe I can help further, as I'm probably one of the most experienced ocfs2 users on the planet (I've been through it inside and out, through the code, through everything, and have tested different schedulers, different hardware, etc.).
Michael |
I have a three-node cluster with ocfs2. On every node I have the same fs mounted. If I change a file on the mounted ocfs2 fs, it is changed on every other node too, so I think the cluster itself works perfectly.
In the background I have a working gnbd on top of drbd.
The problem is the following:
If I am copying files into the ocfs2 fs on one node and, while the copy is running, I power off another node (to simulate a power cut), the remaining two nodes get a kernel panic after roughly 13 seconds.
If no copy or anything like that is running, the power cut is handled perfectly: after a few seconds I can write files on the remaining two nodes, and when the third node is restarted and mounts the ocfs2 fs again, the files are there too.
I think I can solve this by setting O2CB_HEARTBEAT_THRESHOLD to something like 121 for waiting 60 seconds, but I don't know in which file to set it. I tried /etc/conf.d/ocfs2 and /etc/conf.d/o2cb, but I don't know whether ocfs2 is actually using the variable. Can I see this anywhere?
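(I also looked for a place where the kernel shows the value it is actually using. I'm only guessing here, but since configfs is mounted on /config in my fstab, maybe something like this works once the cluster is online:)
Code: | # guess: read the heartbeat dead threshold of the running cluster
cat /config/cluster/xencluster/heartbeat/dead_threshold |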
Here are some configuration files and command outputs (same on every node):
/etc/ocfs2/cluster.conf
Code: | node:
        ip_port = 7777
        ip_address = 184.1.72.201
        number = 0
        name = xentest01
        cluster = xencluster

node:
        ip_port = 7777
        ip_address = 184.1.72.202
        number = 1
        name = xentest02
        cluster = xencluster

node:
        ip_port = 7777
        ip_address = 184.1.72.203
        number = 2
        name = xentest03
        cluster = xencluster

cluster:
        node_count = 3
        name = xencluster |
/sbin/mounted.ocfs2 /dev/gnbd/xen
Code: | Device FS Nodes
/dev/gnbd/xen ocfs2 xentest01, xentest02, xentest03 |
/etc/fstab
Code: | none /config configfs defaults
none /dlm ocfs2_dlmfs defaults
/dev/gnbd/xen /xen ocfs2 noauto 0 0 |
/etc/default/o2cb
Code: | #
# This is a configuration file for automatic startup of the O2CB
# driver. It is generated by running /etc/init.d/o2cb configure.
# Please use that method to modify this file
#
# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true
# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=xencluster
# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=121 |
/etc/conf.d/ocfs2
Code: | # Copyright 1999-2006 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/sys-fs/ocfs2-tools/files/ocfs2.conf,v 1.1 2006/07/20 05:13:14 dberkholz Exp $
# Put your cluster names here, separated by space, ie.
OCFS2_CLUSTER="xencluster"
O2CB_HEARTBEAT_THRESHOLD=121 |
/etc/cluster/cluster.conf
Code: | <?xml version="1.0"?>
<cluster name="xencluster" config_version="5">
  <cman>
  </cman>
  <clusternodes>
    <clusternode name="xentest01">
      <fence>
        <method name="single">
          <device name="human" nodename="xentest01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="xentest02">
      <fence>
        <method name="single">
          <device name="human" nodename="xentest02"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="xentest03">
      <fence>
        <method name="single">
          <device name="human" nodename="xentest03"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster> |
/etc/drbd.conf
Code: | global { usage-count yes; }
common { syncer { rate 10M; } }

resource xencluster {
  protocol C;
  net {
    # cram-hmac-alg sha1;
    shared-secret "XXXXXXXX";
    allow-two-primaries;
    after-sb-0pri discard-least-changes;
  }
  on xentest01 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   184.1.72.201:7789;
    meta-disk internal;
  }
  on xentest02 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   184.1.72.202:7789;
    meta-disk internal;
  }
} |
uname -r
olli.bo Apprentice
Joined: 16 Jul 2003 Posts: 208 Location: Germany
Posted: Wed Jul 25, 2007 8:16 am
Oh, I forgot:
The scheduler I'm using is deadline:
see http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html - point 74 for details.
dmesg | grep scheduler
Code: | io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered (default)
io scheduler cfq registered |
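For completeness, this is how I check (and could switch) the scheduler on the backing disk, /dev/sdb in my case since that is what drbd sits on; I'm not sure switching it at runtime is a good idea while the fs is mounted:
Code: | # the scheduler shown in [brackets] is the active one
cat /sys/block/sdb/queue/scheduler

# switch at runtime, or pass elevator=deadline on the kernel command line
echo deadline > /sys/block/sdb/queue/scheduler |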
I formatted the ocfs2 with
Code: | mkfs.ocfs2 -N 3 /dev/gnbd/xen |