Author |
Message |
cantao Apprentice
Joined: 07 Jan 2004 Posts: 166
|
Posted: Tue Aug 29, 2006 1:52 pm Post subject: dbus and ldap == hangs at boot |
|
|
Hi Friends!
I have a laptop that is used as a desktop most of the time, connected to a LAN. Users authenticate through LDAP and their /home directories are mounted using NFS.
Everything works fine, but if I try to use the laptop disconnected from the LAN, the boot hangs when starting DBUS. According to the few sparse references I have found, the problem seems to be that DBUS is trying to look something up through LDAP (via nss_ldap). As the laptop is not connected, there is no LDAP server to answer, so it hangs forever.
Setting bind_policy to soft in /etc/ldap.conf did not help either.
Any hints?
Thanks a lot in advance, Cantão! |
|
|
|
ellingsw n00b
Joined: 31 May 2004 Posts: 40 Location: Kansas, USA
|
Posted: Sun Sep 03, 2006 3:56 am Post subject: |
|
|
I have a desktop machine that is configured to authenticate users via LDAP, and /home is NFS mounted. My machine is connected to the network all the time; however, no ethernet devices are up by the time D-BUS attempts to start.
During boot my machine hangs for a bit after it displays "Cleaning /tmp directory", then it hangs indefinitely when it gets to "Starting D-BUS system messagebus". And no, my problem with cleaning /tmp is not due to the tpm devices line in /etc/udev/rules.d/50-udev.rules, because I already commented it out.
As of right now, I've been waiting for my desktop to boot for about 5 hours. I can't tell you which version of D-BUS is installed because I cannot get my machine booted far enough to look.
Also, one other thing that really bothers me is that if the LDAP server is unavailable I cannot even log in as root on my desktop. |
|
|
|
ellingsw n00b
Joined: 31 May 2004 Posts: 40 Location: Kansas, USA
|
Posted: Sun Sep 03, 2006 5:22 am Post subject: |
|
|
I was finally able to boot my system to a point where I could disable dbus and reboot. Some services failed to start because they were unable to find the dbus service, but at least I can log in now.
I currently have dbus-0.60-r4 installed, which was installed on March 19. I performed an emerge sync, updated to dbus-0.61-r1, then rebooted; however, this did not fix the issue. |
|
|
|
cantao Apprentice
Joined: 07 Jan 2004 Posts: 166
|
Posted: Sun Sep 03, 2006 9:46 pm Post subject: |
|
|
Hi ellingsw!
Sorry for the delay in response! It's Sunday *and* I was in the middle of a power shortage.
I'm currently physically far from the mentioned laptop, but what I can tell you now is that the "Cleaning /tmp" issue simply vanished with, I guess, the last baselayout update. But the D-BUS issue is really annoying.
I created a local (non-LDAP) user to be used when disconnected from the net, and took the machine for a presentation at a client's office. My surprise? The laptop simply got stuck at the D-BUS thing. Very uncomfortable to have 5 pairs of eyes staring blankly at you.
Anyway, I have to fix this @*#($@%&$ for a new presentation on Wednesday. I'll try the latest ~x86 dbus to see what happens. I'll keep in touch, in case of success.
It's worth telling you that my /etc/nsswitch.conf is set to "files ldap". To my understanding, the system should look *first* in local files, then in LDAP if the information (users, groups, whatever) is not available locally. Am I correct about this?
Best regards and good luck, Cantão! |
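On the "files ldap" question: for a keyed lookup (one user name or uid), glibc consults the sources in the listed order and by default stops at the first one that returns a result, so a name present in /etc/passwd should not need LDAP at all. The sketch below only imitates the "files" step, to check whether a given name would be satisfied locally. The function name is made up for illustration, and it does not go through NSS itself.

```shell
# Hypothetical helper: succeeds if the user name appears in a passwd-format
# file (default /etc/passwd). This mimics only the "files" source.
in_local_passwd() {
    awk -F: -v u="$1" '$1 == u { found = 1 } END { exit !found }' "${2:-/etc/passwd}"
}

# Example use:
#   in_local_passwd messagebus && echo "messagebus is a local account"
```

For the real thing, `getent passwd messagebus` performs a keyed lookup through NSS, whereas a bare `getent passwd` enumerates every configured source, LDAP included.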
|
|
|
.:chrome:. Advocate
Joined: 19 Feb 2005 Posts: 4588 Location: Brescia, Italy
|
Posted: Sun Sep 03, 2006 10:58 pm Post subject: Re: dbus and ldap == hangs at boot |
|
|
set in /etc/ldap.conf
|
|
|
|
cantao Apprentice
Joined: 07 Jan 2004 Posts: 166
|
Posted: Mon Sep 04, 2006 10:58 am Post subject: |
|
|
Hi k.gothmog!
Yes, I tried that (it's in the original post). It didn't work and had a side effect: it prevented me from using ssh to get into the machine. I had to revert it.
Thanks a lot, Cantão! |
|
|
|
UberLord Retired Dev
Joined: 18 Sep 2003 Posts: 6835 Location: Blighty
|
|
|
|
cantao Apprentice
Joined: 07 Jan 2004 Posts: 166
|
Posted: Mon Sep 04, 2006 11:47 am Post subject: |
|
|
Hi UberLord!
Quote: | It's not hanging, it's working out the order to start services. Due to doing a complex topological sort in bash it's a bit slow. We're trying to rectify this for baselayout-1.13 (tsort is not an option for the curious btw) |
Yep, the problem here seems to be the D-BUS thing. The question is, if I have "files ldap" in nsswitch.conf, and the messagebus user is local (/etc/passwd, /etc/group), why should the service look into LDAP?
Worse, why can't ellingsw log in as root without LDAP (presuming root is local too)?
Cheers, Cantão! |
|
|
|
UberLord Retired Dev
Joined: 18 Sep 2003 Posts: 6835 Location: Blighty
|
Posted: Mon Sep 04, 2006 1:10 pm Post subject: |
|
|
cantao wrote: | Yep, the problem here seems to be the D-BUS thing. The question is, if I have "files ldap" in nsswitch.conf, and the messagebus user is local (/etc/passwd, /etc/group), why should the service look into LDAP? |
Depends on how dbus works. If it's set to enumerate users/groups then it will look in LDAP regardless of order in nsswitch.conf
Quote: | Worse, why can't ellingsw log in as root without LDAP (presuming root is local too)? |
I would assume that the rc process hasn't completed, which disables login until it has. |
|
|
|
cantao Apprentice
Joined: 07 Jan 2004 Posts: 166
|
Posted: Mon Sep 04, 2006 1:52 pm Post subject: |
|
|
Hi UberLord!
Quote: | Depends on how dbus works. If it's set to enumerate users/groups then it will look in LDAP regardless of order in nsswitch.conf |
Hmm... Very good guess. Upstream problem, perhaps? I'm going to do some research.
Cheers, Cantão! |
|
|
|
UberLord Retired Dev
Joined: 18 Sep 2003 Posts: 6835 Location: Blighty
|
|
|
|
cantao Apprentice
Joined: 07 Jan 2004 Posts: 166
|
Posted: Mon Sep 04, 2006 2:06 pm Post subject: |
|
|
UberLord wrote: | It will also do an LDAP lookup if it searches for a user/uid not in /etc/passwd |
Exactly. But, AFAIK, messagebus is the user D-BUS runs as, and it's a local user.
Cheers, Cantão! |
|
|
|
ellingsw n00b
Joined: 31 May 2004 Posts: 40 Location: Kansas, USA
|
Posted: Thu Sep 07, 2006 6:26 am Post subject: |
|
|
I found the command "chown 0:0 /tmp/.{ICE,X11}-unix" was re-introduced into /etc/init.d/bootmisc after I had already changed it to "chown root:root /tmp/.{ICE,X11}-unix"---I can't remember the bug number. I have since updated to baselayout-1.12.4-r7 and the command is now commented out by default in the init script. I have not rebooted yet to see if my problem with "Cleaning /tmp" is fixed or at least does not delay as long.
According to the documentation on nsswitch.conf you are correct, cantao, but it does not appear to work that way. Of course, this does apply to calls to the getpwent functions. I don't know if there are other functions that implement password `db' lookups and which programs would use them; however, I doubt /bin/login would use them.
I have the following in /etc/nsswitch.conf:
Code: | passwd: files ldap
shadow: files ldap
group: files ldap |
I do not have the bind_policy option in /etc/ldap.conf, and according to nss_ldap(5) the default policy is hard_open. I can try the "bind_policy soft" option but I doubt it will help. For one, it doesn't work for cantao. And two, IIRC, login reports an invalid user id and password for root instead of waiting for the LDAP timeout. root is a local account in /etc/passwd.
UberLord, I am presented with a login prompt, so I know the rc process has completed. I can sit in front of the screen and watch my system go through the boot process but still not be able to log in as root at the login prompt if the ethernet interface fails to come up. |
|
|
|
UberLord Retired Dev
Joined: 18 Sep 2003 Posts: 6835 Location: Blighty
|
|
|
|
ellingsw n00b
Joined: 31 May 2004 Posts: 40 Location: Kansas, USA
|
Posted: Sun Sep 10, 2006 11:17 pm Post subject: |
|
|
UberLord wrote: | Is that because the root password on LDAP is different from how it is locally or something? |
What?! What difference would root's password in LDAP make if LDAP is unavailable because the network interface is down? I AM typing in the correct password for the local root account.
As for "Cleaning /tmp": after updating to baselayout-1.12.4-r7 and rebooting, I do not see a noticeable delay during boot when "Cleaning /tmp".
The "bind_policy soft" option has an unfortunate consequence that makes it unusable. If bind_policy is set to soft, no user that exists in LDAP can ssh into the box. When a user tries to ssh in, sshd authenticates the user (and displays /etc/issue in my case), then immediately disconnects the session.
Here is proof for reference:
~ $> ssh hostname
This is a private system. If you do not have
an account on this system please disconnect
now. Connections are logged and any attempt
to hack into this system will be reported to
the appropriate authorities.
Connection to hostname closed by remote host.
Connection to hostname closed.
==> /var/log/auth <==
Sep 10 18:03:28 hostname sshd[24836]: Accepted publickey for username from 192.168.2.101 port 3489 ssh2
Sep 10 18:03:28 hostname sshd(pam_unix)[24838]: session opened for user username by (uid=0)
Sep 10 18:03:28 hostname sshd[24836]: nss_ldap: could not search LDAP server - Server is unavailable
Sep 10 18:03:28 hostname sshd[24836]: fatal: login_get_lastlog: Cannot find account for uid 1001
Sep 10 18:03:28 hostname sshd[24836]: syslogin_perform_logout: logout() returned an error
Sep 10 18:03:28 hostname sshd(pam_unix)[24838]: session closed for user username
==> /var/log/error <==
Sep 10 18:03:28 hostname sshd[24836]: nss_ldap: could not search LDAP server - Server is unavailable
Sep 10 18:03:28 hostname sshd[24836]: fatal: login_get_lastlog: Cannot find account for uid 1001
==> /var/log/messages <==
Sep 10 18:03:28 hostname sshd[24836]: Accepted publickey for username from 192.168.2.101 port 3489 ssh2
Sep 10 18:03:28 hostname sshd(pam_unix)[24838]: session opened for user username by (uid=0)
Sep 10 18:03:28 hostname sshd[24836]: nss_ldap: could not search LDAP server - Server is unavailable
Sep 10 18:03:28 hostname sshd[24836]: fatal: login_get_lastlog: Cannot find account for uid 1001
Sep 10 18:03:28 hostname sshd[24836]: syslogin_perform_logout: logout() returned an error
Sep 10 18:03:28 hostname sshd(pam_unix)[24838]: session closed for user username |
|
|
|
cantao Apprentice
Joined: 07 Jan 2004 Posts: 166
|
Posted: Sun Sep 17, 2006 10:53 pm Post subject: |
|
|
Well, as ellingsw said, bind_policy soft is a shot in the foot, at least for remote admin.
I have just given up. I'll duplicate accounts locally on the laptop and free it from LDAP.
Thanks to everybody, Cantão! |
|
|
|
mattsk n00b
Joined: 11 Apr 2003 Posts: 46 Location: Newcastle, Australia
|
Posted: Mon Sep 25, 2006 2:17 pm Post subject: |
|
|
I'm having similar problems. Have you tried using the timelimit or bind_timelimit in /etc/ldap.conf? Or have you tried downgrading versions of nss_ldap?
In my case, if the ldap service isn't started yet (or has been stopped for some reason) then the system is *really* slow. I've worked out that it's the system trying to connect to the ldap server for a user or group search, and I also have
set for the appropriate services in /etc/nsswitch.conf. When this happens, I can't even log in as root at the console - it times out after 60 seconds.
I am reasonably certain that the problem lies with nss_ldap. I recently updated it, and the problems started after that. It's been a while since I've been forced to, but I'm fairly sure I was previously able to log in as root at the console when the ldap server (which resides on the same computer, as it happens) was down for whatever reason. In addition to this, commands run at the command line sometimes took a long time to complete (even ls), but these symptoms would go away if I removed the 'ldap' keyword from the entries in /etc/nsswitch.conf. In fact I have to do this to be able to restart the ldap server (since it seems to do an ldap search for the ldap user). In one case so far I even had to reboot the machine into single user mode (done by editing the grub boot parameters on the fly) to edit the nsswitch.conf file just so I could boot. I haven't tried rebooting since, but I suspect I'll have to do it again.
During my struggles so far, I've tried setting the soft bind option, and it stopped ssh logins from happening for me too (I'd like to know why that is; the debug slapd logs I pored over didn't offer up any clue that I could see). And setting:
Code: | timelimit 30
bind_timelimit 30 |
in /etc/ldap.conf doesn't seem to have made a difference. Part of the problem seems to stem from the fact that the default behaviour of nss_ldap is to never stop trying to connect to the ldap server.
The only real solution I have when
So with this in mind, you may want to try the above settings (maybe with smaller timeouts), or downgrade your nss_ldap and see if either fixes the problem. I can't remember which version I upgraded from, but the emerge.log seems to indicate that it was 226. I'm currently using 249. I also just realised that I'm running version 239-r1 on one of my other servers, so I'll do some tests and see how that computer behaves when the ldap service is down.
I'd like to know *why* it insists on still checking the ldap service. Fortunately, in my case, it's quite rare for the ldap server to be offline when that computer is online. |
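The manual workaround described in this post (removing the 'ldap' keyword from the account databases in /etc/nsswitch.conf) could be scripted roughly as follows. This is only a sketch: the function name is invented, it prints the filtered file to standard output rather than editing anything in place, and it assumes the passwd/shadow/group lines look like the ones quoted in this thread.

```shell
# Hypothetical helper: print an nsswitch.conf-style file with the "ldap"
# source removed from the passwd, shadow and group lines only.
strip_ldap() {
    sed -E '/^(passwd|shadow|group):/ s/[[:space:]]+ldap([[:space:]]|$)/\1/g' "$1"
}

# Example use (review the result before replacing the real file):
#   strip_ldap /etc/nsswitch.conf > /tmp/nsswitch.conf.noldap
```

Keeping both variants of the file around makes it easier to switch back once the LDAP server is reachable again.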
|
|
|
mattsk n00b
Joined: 11 Apr 2003 Posts: 46 Location: Newcastle, Australia
|
Posted: Mon Sep 25, 2006 2:22 pm Post subject: |
|
|
Update: my other server running version 239-r1 doesn't skip a beat when the ldap server is down. If I call groups <user> when the ldap server is up I get the groups for that user, and if I call it when the server is down, I get an "unknown user" error.
The servers have identical /etc/nsswitch.conf entries for "passwd:", "group:", and "shadow:".
I just downgraded to that version on the main server and all seems to be well. No pauses, and no need to edit nsswitch.conf to restart the ldap server.
I've temporarily solved the problem by putting
Code: | >=sys-auth/nss_ldap-249 |
in /etc/portage/package.mask
I know this is all a month and a half after you gave up, but I hope this is still helpful.
This seems to be an nss_ldap bug, but I can't find anything else about it on the net so far. Does anybody know if it goes away in later versions? (I notice that versions 250, 250-r1, 252, and 253 are all masked with the ~x86 keyword.) |
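For anyone repeating the mask above on several machines, a small sketch that adds the atom only if it is not already present. The function name is hypothetical; the atom and the file path are the ones given in this post.

```shell
# Hypothetical helper: append an atom to a mask file unless the exact
# line is already there, so repeated runs do not create duplicates.
mask_atom() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Example use (as root):
#   mask_atom '>=sys-auth/nss_ldap-249' /etc/portage/package.mask
```

After masking, re-emerging nss_ldap should pick the highest unmasked version, which is how the downgrade described above was pinned.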
|
|
|
cantao Apprentice
Joined: 07 Jan 2004 Posts: 166
|
Posted: Mon Sep 25, 2006 3:01 pm Post subject: |
|
|
Hi Mattsk!
Thanks a lot for your reply.
Tomorrow I'll have the laptop in hands, so I can give the several nss_ldap versions a try. I'll see what happens with the unstable versions and I'll post the results here.
mattsk wrote: | I can't remember which version I upgraded from - but the emerge.log seems to indicate that it was 226. I'm currently using 249 |
There goes a great package:
and then:
It should tell you the history behind your versions of nss_ldap (or any other package, indeed).
Thanks a lot, Cantão! |
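Independently of whichever tool was meant above, the merge history can also be pulled straight out of emerge.log. The sketch below assumes log lines of the form `<timestamp>:  ::: completed emerge (1 of 1) sys-auth/nss_ldap-249 to /`; that format, the function name and the log path in the usage comment are assumptions, so check them against your own log first.

```shell
# Hypothetical helper: list "timestamp category/package-version" for every
# completed merge of a package recorded in an emerge.log-style file.
pkg_history() {
    awk -v p="$1" '
        $2 == ":::" && $3 == "completed" && $4 == "emerge" && $8 ~ ("/" p "-[0-9]") {
            sub(/:$/, "", $1)   # drop the colon after the epoch timestamp
            print $1, $8
        }' "$2"
}

# Example use:
#   pkg_history nss_ldap /var/log/emerge.log
```

The first column is seconds since the epoch, which GNU `date -d @<timestamp>` can turn into a readable date.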
|
|
|
cantao Apprentice
Joined: 07 Jan 2004 Posts: 166
|
Posted: Fri Sep 29, 2006 2:06 pm Post subject: |
|
|
It only gets worse...
I could not check the laptop (it was travelling), but I performed several updates on some machines (monolithic Xorg -> non-monolithic Xorg, openssl and gnutls), all by the book, with several revdep-rebuild runs to make sure everything was ok.
And now these updated machines are also hanging at DBUS, even when connected to the LAN and with the LDAP server working fine.
I'm trying to boot them to re-emerge dbus and nss_ldap. Let's see what happens.
Cheers, Cantão! |
|
|
|
cantao Apprentice
Joined: 07 Jan 2004 Posts: 166
|
Posted: Fri Sep 29, 2006 2:45 pm Post subject: |
|
|
Quote: | I'm trying to boot them to re-emerge dbus and nss_ldap. Let's see what happens. |
Nope, that's not the problem. Checking /var/log/messages and the init scripts, I discovered that DBUS is trying to connect to LDAP before net.eth0 is up. After a zillion attempts, it gives up and continues.
Updating baselayout as an attempt...
Cheers, Cantão! |
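If the start order really is the cause, one thing to try is declaring the dependency in the init script itself. A minimal sketch, assuming the Gentoo init script `depend()` convention; this is not the dbus script as shipped by the package, and local edits to it may be overwritten on update.

```shell
# Hypothetical fragment for /etc/init.d/dbus (not the packaged script).
# "need net" asks the rc system to bring up a network service first.
depend() {
    need net
}
```

The trade-off is that with `need net` the service is not started at all when no interface comes up, which may not suit a laptop that is often used offline.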
|
|
|
blubbi Guru
Joined: 27 Apr 2003 Posts: 564 Location: Halle (Saale), Germany
|
Posted: Fri Nov 10, 2006 1:29 pm Post subject: |
|
|
Now I ran into the same problem.
System is stable and up-to-date.
The only thing that is ~x86 is udev, to solve the nss_ldap and udev problem (a user called tss causes an LDAP lookup) described here:
https://bugs.gentoo.org/show_bug.cgi?id=99564
I worked around this problem in the following way.
I don't have any local users so I added all system users to /etc/ldap.conf:
Code: | echo "nss_initgroups_ignoreusers $(cut -d: -f1 /etc/passwd | xargs | sed -e 's/ /,/g')" >> /etc/ldap.conf |
My nsswitch.conf looks like this:
Code: | passwd: files compat ldap
shadow: files compat ldap
group: files compat ldap |
Now the system boots and I can log in as root if there is no access to the LDAP server.
But still, there are famd, nscd and kdm_config, which try to connect to LDAP. I have no clue what they are trying to look up there. Okay, nscd tries to cache the users from LDAP, but what about kdm_config and famd?
regards
blubbi |
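A variant of the workaround above, as a sketch: build the `nss_initgroups_ignoreusers` line from a passwd-format file with a helper, so it can be regenerated when system users change instead of appending a second copy. The function name and the `.new` file in the usage comment are made up for illustration.

```shell
# Hypothetical helper: print an nss_initgroups_ignoreusers line listing
# every user name found in the given passwd-format file, comma separated.
ignoreusers_line() {
    printf 'nss_initgroups_ignoreusers %s\n' "$(cut -d: -f1 "$1" | paste -sd, -)"
}

# Example use: drop any old line, add a fresh one, then review the result.
#   { grep -v '^nss_initgroups_ignoreusers' /etc/ldap.conf
#     ignoreusers_line /etc/passwd; } > /etc/ldap.conf.new
```

Writing to a separate file first leaves the working /etc/ldap.conf untouched until the new one has been checked.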
|
|
|
cantao Apprentice
Joined: 07 Jan 2004 Posts: 166
|
Posted: Fri Nov 10, 2006 4:59 pm Post subject: |
|
|
Hi Friends!
In fact, I gave up on this... I recreated all users locally on the laptop, without any mention of LDAP in PAM or nsswitch.conf. This way I can use the NFS-mounted /home if the LAN is up, or a local user if the laptop is disconnected.
Not the most elegant solution, but it worked anyway.
Thanks to all, Cantão! |
|
|
|
bunder Bodhisattva
Joined: 10 Apr 2004 Posts: 5947
|
Posted: Fri Nov 10, 2006 10:18 pm Post subject: |
|
|
i currently have this problem as well. i believe i'm just going to hardmask the new openldap/nss_ldap/udev versions until they fix these bootup/login bugs. tbh, this has been a problem for quite some time, and i can't believe that it hasn't been fixed properly. bind_policy soft and nss_reconnect only make this problem worse.
cheers |
|
|
|
ellingsw n00b
Joined: 31 May 2004 Posts: 40 Location: Kansas, USA
|
Posted: Mon Jan 08, 2007 4:00 am Post subject: |
|
|
Heads Up!
I had to reboot today because my system was not responding to user input from the console even though I could log in remotely. Upon boot, the problem with my machine hanging indefinitely at "Starting D-BUS system messagebus" is back.
I was able to get the system booted by restarting and performing an interactive boot... skipping dbus of course when it asked if I wanted to start it. After the system and NIC were up, I was able to start dbus without a problem.
I have not determined the cause of it this time but I'll see if I can figure it out.
The latest version of nss_ldap on my machine is 249 and was installed on Oct 14, 2006. My last reboot was Nov 12, 2006, which I did not have a problem with.
I have synced 4 times since then and performed at least 2 updates, with an emerge world in there somewhere after upgrading to gcc 4.1.1-r1.
Since Nov 12, the following packages, which are used during boot, have been updated:
baselayout (1.12.6)
udev (103)
dbus (0.62-r2)
There are other packages that have been updated but I doubt they are related to this issue. |
|
|
|
|