jakob-andreas n00b
Joined: 19 Aug 2005 Posts: 53 Location: East-Germany
Posted: Fri Jan 22, 2010 11:36 am Post subject: torque and maui: RMFailure 15041
Hi,
I have to set up a cluster with open source software, so I decided to install torque and maui. Everything works fine, except running jobs.
After starting pbs_mom on the nodes and pbs_server on the head machine, pbs_nodes shows all the nodes as free, so I tried to run a first job:
Code:
echo "sleep 30" | qsub
The maui scheduler is also running at this point, but I always get the same error:
Code:
cannot start job - RM Failure, rc: 15041, msg: 'Execution server rejected request MSG=cannot send job to mom, state=PRERUN'
Reading the admin guide and two days of googling did not help, and asking on the [torqueusers] mailing list has not helped either.
I ran tcpdump and began to wonder: normal communication such as ssh and ping works fine, and the communication from port 1023 to 15001 works as well. But when my head node tries to send the job on port 15002, the node machine is not reachable, even though netstat shows 15002 open! The firewall is disabled, so what should I check and do now?
I hope someone can help me!
Thanks
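For anyone retracing the port check described above, here is a minimal sketch, assuming bash and the coreutils timeout command are available on the head node. The helper name port_open and the script name check_ports.sh are made up for illustration; the port number to test (15002 in this thread) is passed on the command line.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port on a node accepts connections.
# port_open is a hypothetical helper name, not part of torque or maui.
port_open() {
    local host="$1" port="$2"
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout gives up after 3 seconds in case packets are silently dropped.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage: ./check_ports.sh NODE PORT [PORT...]
# NODE is whatever your compute node is called, e.g. ./check_ports.sh NODE 15002
if [ "$#" -ge 2 ]; then
    host="$1"
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
        fi
    done
fi
```

If netstat on the node shows the port listening but this check fails from the head node, something between the two machines is filtering the connection.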
jakob-andreas n00b
Joined: 19 Aug 2005 Posts: 53 Location: East-Germany
Posted: Fri Jan 22, 2010 3:59 pm
HEY I FOUND THE SOLUTION!!!
I did not know that iptables rules can still be active even when the firewall has not been started. I had to manually delete some firewall rules that blocked all incoming packets. Now it works quite well! So all I can tell you is: if you run into the same problem, double-check the iptables -L output!
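To make that double-check a bit quicker, here is a small sketch of a filter for the listing, assuming bash and awk. The function name blocking_rules is made up; it only prints chains whose default policy is DROP and rules whose target is DROP or REJECT, so jumps into custom chains still have to be read by hand.

```shell
#!/usr/bin/env bash
# Sketch: read an `iptables -L -n` listing on stdin and print the lines
# that can block traffic. blocking_rules is a hypothetical helper name.
blocking_rules() {
    awk '
        # Chain header with a default policy of DROP
        /^Chain / && /policy DROP/ { print; next }
        # Rule lines whose target (first column) is DROP or REJECT
        $1 == "DROP" || $1 == "REJECT" { print }
    '
}

# Usage (as root on the compute node):
#   iptables -L -n | blocking_rules
```

To remove a rule by hand, iptables -L INPUT -n --line-numbers shows the rule numbers that iptables -D INPUT <number> expects. Which rules are safe to delete depends on your setup.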