to_kallon Tux's lil' helper
Joined: 27 Oct 2004 Posts: 89
Posted: Sun Apr 03, 2005 4:23 pm Post subject: setiathome causes system hang?
hi, hope this is an okay place for this.
my problem, firstly, is that i don't know what the problem is yet. i am trying to run setiathome on an openmosix cluster. i modified the /etc/conf.d/setiathome file to launch multiple instances, which are picked up by process migration and distributed across the nodes. everything seems to go well for a while, and sometimes it even returns results, but inevitably, if i am running more than 5 concurrent processes (i have 10 nodes), some of the nodes disappear. openmosix sees them as configured but unreachable; i can't ssh into them, but they still answer a ping. the setiathome process seems to keep running and does not go down after a while, which seemed odd to me, but without access i don't really know what it is doing. similarly, Code: | /etc/init.d/setiathome stop | does not terminate the process on the missing nodes. also, restarting seti only starts, in this case, 7-n processes, where n is the number of nodes that are apparently down.
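to make the symptom above easier to pin down, here is a minimal sketch of a check that separates "answers ping but ssh is dead" from "fully down". it is only an illustration, not one of the scripts i actually ran: the names `port_open` and `classify` are made up for this example, ssh is assumed to listen on port 22, and the ping result is assumed to come from somewhere else (for example the exit status of the ping command).

```python
import socket


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 22 is an assumption (default sshd port); adjust if sshd listens elsewhere.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def classify(answers_ping, ssh_reachable):
    """Map the two observations onto a rough node state (hypothetical labels)."""
    if answers_ping and ssh_reachable:
        return "up"
    if answers_ping:
        # Network stack still responds, but sshd cannot be reached.
        return "hung"
    return "down"
```

run from cron against each node, something like `classify(True, port_open("node3"))` (with `node3` as a placeholder hostname) would at least log when a node goes from "up" to "hung", which narrows down how long it takes and how many processes were running at the time.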
i've tried several different scripts, run from cron, to try to at least bring the nodes back to a usable state, if not recover the data. first i wrote a perl script to restart the openmosix process. this would also bring the setiathome process back up, but it was my hope that it would get migrated to another node so i could try to troubleshoot. this failed. next i tried a bash script to simply stop openmosix, hoping that i'd then be able to ssh in and, again, troubleshoot. as near as i can tell, nothing happened. the node is listed as going down cleanly, but i still can't ssh in and it won't restart (i've recently added a restart to the script).
part of my problem is diagnosing the problem, since as far as i can tell it could be my gentoo configuration, openmosix, or setiathome, all equally likely. if anyone has any suggestions, i'm at my wit's end over this one. thanks.