srd n00b
Joined: 13 Apr 2010 Posts: 38
|
Posted: Thu Nov 20, 2014 3:42 am Post subject: PXE + TFTP boot problem |
|
|
I'm having trouble PXE booting a diskless node from another host. I can fetch files over TFTP from the command line, both on localhost and from another machine, but the diskless node does not seem able to download the kernel at boot, even though it gets its IP address just fine. Any ideas?
This is the error that appears on the diskless node's console.
Code: | PXE-E32: TFTP open time-out |
Here are the master node's configs.
/etc/conf.d/in.tftpd
Code: | INTFTPD_PATH="/diskless"
INTFTPD_OPTS="-R 4096:32767 -v -s ${INTFTPD_PATH}"
|
/diskless/pxelinux.cfg/default
Code: | DEFAULT /gentoo-x86_64/boot/kernel-3.14.14-gentoo
APPEND ip=dhcp ro rootfstype=nfs root=/dev/nfs nfsroot=10.0.0.100:/diskless/gentoo-x86_64 init=/linuxrc
|
Master node /var/log/messages. The diskless node seems to be getting through to the master node to get its IP.
Code: | Nov 6 18:38:30 winky dhcpd: DHCPDISCOVER from 00:AA:73:D9:24:22 via eno1
Nov 6 18:38:30 winky dhcpd: DHCPOFFER on 10.0.1.1 to 00:AA:73:D9:24:22 via eno1
Nov 6 18:38:34 winky dhcpd: DHCPDISCOVER from 00:AA:73:D9:24:22 via eno1
Nov 6 18:38:34 winky dhcpd: DHCPOFFER on 10.0.1.1 to 00:AA:73:D9:24:22 via eno1
Nov 6 18:38:38 winky dhcpd: DHCPREQUEST for 10.0.1.1 (10.0.0.100) from 00:13:72:f9:54:41 via eno1
Nov 6 18:38:38 winky dhcpd: DHCPACK on 10.0.1.1 to 00:AA:73:D9:24:22 via eno1
Nov 7 00:38:38 winky in.tftpd[4481]: RRQ from 10.0.1.1 filename pxelinux.0
Nov 7 00:38:40 winky in.tftpd[4483]: RRQ from 10.0.1.1 filename pxelinux.0
Nov 7 00:38:44 winky in.tftpd[4484]: RRQ from 10.0.1.1 filename pxelinux.0
Nov 7 00:38:50 winky in.tftpd[4485]: RRQ from 10.0.1.1 filename pxelinux.0
Nov 7 00:38:58 winky in.tftpd[4487]: RRQ from 10.0.1.1 filename pxelinux.0
Nov 7 00:39:08 winky in.tftpd[4488]: RRQ from 10.0.1.1 filename pxelinux.0
|
Here's the master node's DHCP config.
/etc/dhcp/dhcpd.conf
Code: | ddns-update-style none;
# If this DHCP server is the official DHCP server for the local
# network, the authoritative directive should be uncommented.
authoritative;
# Use this to send dhcp log messages to a different log file (you also
# have to hack syslog.conf to complete the redirection).
log-facility local7;
allow bootp;
subnet 10.0.0.0 netmask 255.255.0.0 {
default-lease-time 86400;
max-lease-time 86400;
option routers 10.0.0.1;
option broadcast-address 10.0.0.255;
option subnet-mask 255.255.0.0;
option domain-name-servers xxx.xxx.xxx.xxx, xxx.xxx.xxx.xxx;
option domain-name "mydomain.com";
}
group {
filename "pxelinux.0";
next-server 10.0.0.100;
host node-1 {
hardware ethernet 00:AA:73:D9:24:22;
fixed-address 10.0.1.1;
}
host node-2 {
hardware ethernet 00:DA:43:B2:44:33;
fixed-address 10.0.1.2;
}
}
|
Last edited by srd on Tue Feb 03, 2015 12:58 am; edited 2 times in total |
|
|
|
machspeed n00b
Joined: 21 Aug 2012 Posts: 11
|
Posted: Thu Nov 20, 2014 6:45 am Post subject: |
|
|
hey srd
I literally had this issue 3 days ago. Check that tftpd is actually running.
I got the same error message, "PXE-E32: TFTP open time-out" and I had foolishly uninstalled my tftpd at some stage. Reinstalled, made sure it was running (simple "ps ajfx") and then my PXE booted systems worked. |
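A quick way to confirm that something is actually bound to the TFTP port, as an alternative to scanning "ps" output, is sketched below. It is a minimal, Linux-only sketch that reads /proc/net/udp and /proc/net/udp6; the function names are made up for illustration. It only shows that a socket is bound to UDP port 69 (which may be inetd or xinetd rather than tftpd itself), not that the daemon serves files correctly.

```python
from pathlib import Path

TFTP_PORT = 69  # standard TFTP port


def udp_ports_in_use(proc_net_udp_text):
    """Return the set of local UDP ports listed in /proc/net/udp-style text."""
    ports = set()
    for line in proc_net_udp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        # local_address looks like "00000000:0045" (hex address : hex port)
        _, _, port_hex = fields[1].rpartition(":")
        ports.add(int(port_hex, 16))
    return ports


def tftp_socket_bound(paths=("/proc/net/udp", "/proc/net/udp6")):
    """True if any readable socket table shows a socket bound to UDP port 69."""
    for path in paths:
        p = Path(path)
        if p.exists() and TFTP_PORT in udp_ports_in_use(p.read_text()):
            return True
    return False


if __name__ == "__main__":
    print("UDP/69 bound:", tftp_socket_bound())
```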
|
|
|
srd n00b
Joined: 13 Apr 2010 Posts: 38
|
Posted: Fri Nov 21, 2014 3:22 am Post subject: |
|
|
Thanks, but my TFTP server is running. I was able to test it locally and from another machine and download the files just fine. But when booting from the diskless node, I receive the following error.
Code: |
PXE-E32: TFTP open time-out
PXE-E32: TFTP open time-out
PXE-E32: TFTP open time-out
PXE-M0F: Exiting Broadcom PXE ROM.
|
Last edited by srd on Wed Jan 21, 2015 12:22 am; edited 1 time in total |
|
|
|
machspeed n00b
Joined: 21 Aug 2012 Posts: 11
|
Posted: Wed Nov 26, 2014 7:47 am Post subject: |
|
|
Edit: On re-reading everything, maybe you only need to move the "next-server" line into the subnet section. The PXE loader on the node may be looking to your router at 10.0.0.1 instead of your PXE server at 10.0.0.100.
Ah ok, maybe this is an issue of your node locating the file on the server.
I'd assume you have pxelinux.0 in the /diskless directory along with your pxelinux.cfg directory. Pays to double check.
Something else I've noticed is in your dhcpd config, you have a group specified with the filename option there. Can you try adding the filename option directly into the host section? Also, your next-server probably should go into the subnet part.
I'll add part of mine for reference. It's nothing glamorous but it works.
Code: | allow booting;
ddns-update-style none;
subnet 172.16.1.0 netmask 255.255.255.0 {
range 172.16.1.251 172.16.1.253; # IP addresses for servicing
default-lease-time 3600;
max-lease-time 36000;
option domain-name-servers 172.16.1.254,192.168.1.1;
option routers 172.16.1.254;
option broadcast-address 172.16.1.255;
next-server 172.16.1.254;
}
host twilight {
hardware ethernet 00:1e:0b:2d:d7:d9;
fixed-address 172.16.1.1;
server-name "twilight-sparkle";
filename "pxelinux.0";
} |
|
|
|
|
krinn Watchman
Joined: 02 May 2003 Posts: 7470
|
|
|
|
srd n00b
Joined: 13 Apr 2010 Posts: 38
|
Posted: Wed Jan 21, 2015 12:23 am Post subject: |
|
|
I've still been unable to get these diskless nodes to boot.
The following is from running "tcpdump port 69" on the master.
Code: |
19:25:39.387642 IP 10.0.1.1.2070 > 10.0.0.11.tftp: 27 RRQ "pxelinux.0" octet tsize 0
19:25:41.412420 IP 10.0.1.1.2071 > 10.0.0.11.tftp: 27 RRQ "pxelinux.0" octet tsize 0
19:25:45.421847 IP 10.0.1.1.2072 > 10.0.0.11.tftp: 27 RRQ "pxelinux.0" octet tsize 0
19:25:51.408511 IP 10.0.1.1.2073 > 10.0.0.11.tftp: 27 RRQ "pxelinux.0" octet tsize 0
19:25:59.372434 IP 10.0.1.1.2074 > 10.0.0.11.tftp: 27 RRQ "pxelinux.0" octet tsize 0
19:26:09.317169 IP 10.0.1.1.2075 > 10.0.0.11.tftp: 32 RRQ "pxelinux.0" octet blksize 1456
19:26:45.343473 IP 10.0.1.1.2076 > 10.0.0.11.tftp: 32 RRQ "pxelinux.0" octet blksize 1456
19:27:57.348274 IP 10.0.1.1.2077 > 10.0.0.11.tftp: 32 RRQ "pxelinux.0" octet blksize 1456
19:29:45.328035 IP 10.0.1.1.2078 > 10.0.0.11.tftp: 32 RRQ "pxelinux.0" octet blksize 1456
19:32:09.282724 IP 10.0.1.1.2079 > 10.0.0.11.tftp: 32 RRQ "pxelinux.0" octet blksize 1456
|
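The two request shapes in the capture above can be reproduced byte for byte, which helps when comparing what the PXE ROM sends against what a command-line tftp client sends (many command-line clients send a plain RRQ with no options). The following is a minimal sketch built from RFC 1350 and RFC 2347 packet layouts; `build_rrq` is a hypothetical helper name, not part of any TFTP library.

```python
import struct

OP_RRQ = 1  # TFTP read-request opcode (RFC 1350)


def build_rrq(filename, mode="octet", options=None):
    """Build a TFTP RRQ packet, optionally followed by RFC 2347 option pairs."""
    parts = [
        struct.pack("!H", OP_RRQ),
        filename.encode("ascii"), b"\0",
        mode.encode("ascii"), b"\0",
    ]
    for name, value in (options or {}).items():
        parts += [name.encode("ascii"), b"\0", str(value).encode("ascii"), b"\0"]
    return b"".join(parts)


# The PXE ROM first asks for the file size, then retries with a block size.
first = build_rrq("pxelinux.0", options={"tsize": 0})
second = build_rrq("pxelinux.0", options={"blksize": 1456})
print(len(first), len(second))  # prints: 27 32
```

The lengths 27 and 32 match the payload sizes tcpdump reports for the "tsize 0" and "blksize 1456" requests, so a test client that sends these exact packets exercises the same option negotiation the diskless node does.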
@krinn - I followed your link, but I'm not even at the point of trying to get NFS to boot (though I do plan to use NFSv4). My issue is that I cannot get the client to download the kernel, so the kernel hasn't even started loading yet.
This is a new master node. I have done the same thing on an older node that's been running for a while, but I'm having trouble with this new master node, which has a fresh install configured exactly the same way.
@machspeed - I don't see that I have an issue with the dhcpd config yet. I believe this is strictly related to syslinux, PXE, and TFTP. Also, the diskless node is getting the correct IP and is at the point of trying to download the kernel.
Note that I can grab files manually from another machine using the commands below, such as pxelinux.0 or even the kernel by its path. This works fine, so I know the TFTP server is running correctly.
Code: | $ tftp 10.0.0.11
tftp> get pxelinux.0
tftp> quit
|
|
|
|
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3404
|
Posted: Wed Jan 21, 2015 8:00 pm Post subject: |
|
|
My working dhcpd config, with comments and the non-PXE zone stripped; the key settings are set in bold.
Quote: | ddns-update-style none;
option domain-name "local";
default-lease-time 3000;
max-lease-time 7200;
log-facility local7;
option subnet-mask 255.255.255.128;
option domain-name-servers W.X.Y.Z;
authoritative;
ping-check = 1;
subnet 10.0.1.0 netmask 255.255.255.128 { # PXE zone
range 10.0.1.2 10.0.1.100;
option routers 10.0.1.1;
next-server 10.0.1.1;
filename "pxelinux.0";
} |
Also, show us your tftp server's config and the absolute path to pxelinux.0 on your host. It's easy to make a little mistake there that will break pxe but will not break other tftp clients.
It's also pretty important to know which tftp daemon you use. They might have different variables. |
|
|
|
srd n00b
Joined: 13 Apr 2010 Posts: 38
|
Posted: Wed Jan 21, 2015 9:27 pm Post subject: |
|
|
Here is /etc/conf.d/in.tftpd for tftp-hpa.
Code: |
INTFTPD_PATH="/diskless"
INTFTPD_OPTS="-R 4096:32767 -v -s ${INTFTPD_PATH}"
|
Here is a listing of the contents of /diskless. There exists another copy of the file pxelinux.0 within the directory gentoo-x86_64 in case I had that path misconfigured.
Code: |
drwxr-xr-x 20 root root 4096 Jan 20 19:26 gentoo-x86_64
-rw-r--r-- 1 root root 115304 Jan 20 18:59 ldlinux.c32
drwx------ 2 root root 16384 Sep 6 10:12 lost+found
-rw-r--r-- 1 root root 40577 Jan 12 19:12 pxelinux.0
drwxr-xr-x 3 root root 4096 Jan 20 20:16 pxelinux.cfg
|
Here is dhcpd.conf. The nodes are given static addresses, and the master's IP has been changed to 10.0.0.10 here as well as in the pxelinux.cfg/default file.
Code: |
ddns-update-style none;
authoritative;
log-facility local7;
allow booting;
allow bootp;
subnet 10.0.0.0 netmask 255.255.0.0 {
default-lease-time 86400;
max-lease-time 86400;
option routers 10.0.0.1;
option broadcast-address 10.0.0.255;
option subnet-mask 255.255.0.0;
option domain-name-servers v.x.y.z, w.x.y.z;
option domain-name "x.y.z";
}
group {
next-server 10.0.0.10;
filename "pxelinux.0";
# also tried filename as "/gentoo-x86_64/pxelinux.0"
host node-1 {
hardware ethernet 00:13:72:F9:54:41;
fixed-address 10.0.1.1;
}
# ...
}
|
I'm thinking that DHCP is working because the client shows info like the following when booting, just before the TFTP request times out (as shown above). Also, the fact that the TFTP server shows activity on the correct master node seems to say it has gotten past the DHCP config.
Code: |
CLIENT MAC ADDR: xxxxxxxxxxxxxx GUID: xxxxxxxxxxxxxxxxxxxx
CLIENT IP: 10.0.1.1 MASK 255.255.0.0 DHCP IP: 10.0.0.11
GATEWAY IP 10.0.0.1
|
|
|
|
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3404
|
Posted: Thu Jan 22, 2015 10:28 pm Post subject: |
|
|
I have been using wireshark to sniff the network traffic during DHCP negotiation. This lets you see at what point things go wrong.
My setup is very different
Quote: | ddns-update-style none;
option domain-name "local";
default-lease-time 3000;
max-lease-time 7200;
log-facility local7;
option subnet-mask 255.255.255.128;
option domain-name-servers X.X.X.X;
authoritative;
ping-check = 1;
filename "pxelinux.0";
subnet 10.0.1.0 netmask 255.255.255.128 {
range 10.0.1.2 10.0.1.100;
option routers 10.0.1.1;
next-server 10.0.1.1;
}
|
I marked the settings that seem most important in bold. I am not completely sure about ping-check, but this config does work. Your config should work as well, but we can't be sure until you manage to boot with it.
So, let's do some tuning here:
Quote: | ddns-update-style none;
authoritative;
log-facility local7;
authoritative;
ping-check = 1;
option subnet-mask 255.255.0.0;
subnet 10.0.0.0 netmask 255.255.0.0 {
default-lease-time 86400;
max-lease-time 86400;
option routers 10.0.0.1;
option broadcast-address 10.0.255.255;
option subnet-mask 255.255.0.0;
option domain-name-servers v.x.y.z, w.x.y.z;
option domain-name "x.y.z";
}
host node-1 {
hardware ethernet 00:13:72:F9:54:41;
fixed-address 10.0.1.1;
next-server 10.0.0.10;
filename "pxelinux.0";
}
| Broadcast address seems to be bad. I think with network IP 10.0/16 it should be 10.0.255.255 rather than 10.0.0.255.
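The broadcast address for a given network and netmask can be checked with Python's standard ipaddress module. This is a small sketch using the subnet and host addresses quoted in this thread:

```python
import ipaddress

# Subnet as declared in dhcpd.conf: "subnet 10.0.0.0 netmask 255.255.0.0"
net = ipaddress.ip_network("10.0.0.0/255.255.0.0")

print(net.broadcast_address)  # prints: 10.0.255.255

# The router, the master, and the diskless node should all sit inside it.
for addr in ("10.0.0.1", "10.0.0.10", "10.0.1.1"):
    print(addr, ipaddress.ip_address(addr) in net)  # each prints True
```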
I skipped the group block. If you only want to allow a single host to boot with PXE, so be it. You might consider putting these settings in the global section though; hosts that are not interested will simply ignore them.
Paths are fine.
AFAIR I had some problems getting tftp to work. It was a long time ago and I'm not sure what was wrong, but I finally installed atftp and put this in the config:
Quote: | # cat /etc/conf.d/atftp
# Config file for tftp server
rc_net_vn0_need="!net lan" # custom option due to my unusual network setup. If in doubt, you don't want this line.
TFTPD_ROOT="/mnt/linux.images/tftp"
TFTPD_OPTS="--daemon --user nobody --group nobody"
|
You might give it a shot or have a look at https://wiki.debian.org/PXEBootInstall#Set_up_TFTP_server No idea if it's up to date.
Quote: | configuration file, /etc/default/tftpd-hpa. There should be no need to modify the following default contents:
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/srv/tftp"
TFTP_ADDRESS="0.0.0.0:69"
TFTP_OPTIONS="--secure" |
Either way, I suggest you launch a sniffer and have a look at the traffic. Look into those DHCP offers and see where they point the client for the pxelinux.0 file. |
|
|
|
srd n00b
Joined: 13 Apr 2010 Posts: 38
|
Posted: Mon Feb 02, 2015 8:59 pm Post subject: |
|
|
Ugh ... so I replaced the bad broadcast address and made other changes like what you had, but nothing has changed, still seeing the TFTP server timeout.
I also put wireshark on, and it looks as if the DHCP server is working fine: 1) the host is showing the correct DHCP IP address on its boot screen, 2) wireshark is showing the Discover, Offer, Request, and Ack between the server and the client node, and 3) /var/log/messages on the server side is showing an RRQ from the proper node IP (as given by DHCP) for the filename /pxelinux.0.
From what I understand, this line in /var/log/messages is telling me that the client is doing a read request from the tftp server, and it looks correct. The file is there, has read perms ...
Code: |
in.tftpd[5130]: RRQ from 10.0.1.1 filename pxelinux.0
|
Also, I did give atftp a try, but same thing. So I went back to in.tftpd because I have this same config working on a much older server.
What it looks like to me is that the client makes a read request (RRQ), but the server is not sending the data packets. |
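One way to tell "the server never answers" apart from "the answer never reaches the client" is to send a request from a third machine and look at the first packet that comes back. The sketch below is a minimal illustration using only the standard library; `describe_reply` and `first_reply` are hypothetical names, and the opcode table comes from RFC 1350 and RFC 2347. Note that a TFTP server replies from a fresh port rather than port 69 (with tftp-hpa, the `-R 4096:32767` option above constrains that port range), so a firewall that only permits port 69 could, as an assumption worth checking, drop the reply.

```python
import socket
import struct

OPCODES = {3: "DATA", 4: "ACK", 5: "ERROR", 6: "OACK"}


def describe_reply(packet):
    """Summarise the first packet a TFTP server sends back."""
    if len(packet) < 2:
        return "too short to be TFTP"
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode == 3 and len(packet) >= 4:  # DATA: block number + payload
        (block,) = struct.unpack("!H", packet[2:4])
        return f"DATA block {block}, {len(packet) - 4} payload bytes"
    if opcode == 5 and len(packet) >= 4:  # ERROR: code + NUL-terminated text
        (code,) = struct.unpack("!H", packet[2:4])
        text = packet[4:].split(b"\0", 1)[0].decode("ascii", "replace")
        return f"ERROR {code}: {text}"
    if opcode == 6:  # OACK: NUL-separated name/value pairs
        fields = packet[2:].split(b"\0")[:-1]
        pairs = zip(fields[0::2], fields[1::2])
        return "OACK " + ", ".join(f"{k.decode()}={v.decode()}" for k, v in pairs)
    return OPCODES.get(opcode, f"unknown opcode {opcode}")


def first_reply(server, request, timeout=5.0):
    """Send a raw RRQ to server:69 and return (peer, summary), or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, 69))
        try:
            packet, peer = sock.recvfrom(65535)
        except socket.timeout:
            return None
    return peer, describe_reply(packet)
```

Calling `first_reply("10.0.0.11", b"\x00\x01pxelinux.0\x00octet\x00blksize\x001456\x00")` from another host on the same segment would show whether the server answers an option-bearing request at all; `None` means nothing arrived within the timeout.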
|
|
|
apecksec n00b
Joined: 11 Mar 2024 Posts: 1
|
Posted: Mon Mar 11, 2024 8:53 pm Post subject: tftp timeout = network mtu issue |
|
|
I have struggled with the same problem for a LONG time. This was very difficult to figure out...
For me it was an MTU issue. The network MTU was 1476, but TFTP was sending packets sized for a 1500-byte MTU, and some TFTP clients didn't cope with that while others were fine.
tftp will likely be trying to send packets sized for a 1500-byte MTU, and your network will fragment those packets if the MTU is below 1500 (as when going through a tunnel). So you have to tell the tftp server to use a block size smaller than the MTU of your traffic path. In my case I just used 1300, knowing it's well below my MTU of 1450 or so.
So, try adding '-B 1300' (the maximum block size the server will agree to) to your tftpd config and restarting tftpd. Mine looks like this:
/usr/sbin/in.tftpd -B 1300 -l -v -v -v -m /etc/tftpd.conf -s /var/public/tftproot
hope this helps someone out there in the future |
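The arithmetic behind picking a '-B' value can be written out. This is a rough sketch assuming plain IPv4 with no IP options: each TFTP DATA packet carries 20 bytes of IP header, 8 bytes of UDP header, and 4 bytes of TFTP header on top of the block itself. The function names are illustrative only.

```python
IP_HEADER = 20    # IPv4 header without options (assumption)
UDP_HEADER = 8
TFTP_HEADER = 4   # opcode + block number on a DATA packet
OVERHEAD = IP_HEADER + UDP_HEADER + TFTP_HEADER


def max_blksize(path_mtu):
    """Largest TFTP block size that fits in one unfragmented packet."""
    return path_mtu - OVERHEAD


def would_fragment(blksize, path_mtu):
    """True if a full DATA packet with this block size exceeds the path MTU."""
    return blksize + OVERHEAD > path_mtu


print(max_blksize(1500))           # prints: 1468
print(max_blksize(1476))           # prints: 1444
print(would_fragment(1456, 1476))  # prints: True
print(would_fragment(1300, 1476))  # prints: False
```

The block size of 1456 that the PXE ROM requests in the tcpdump earlier in this thread fits a 1500-byte MTU but not a 1476-byte one, whereas 1300 fits both.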
|
|
|
|