jesnow l33t
Joined: 26 Apr 2006 Posts: 896
|
Posted: Sun Dec 03, 2023 7:22 pm Post subject: wireguard won't connect [solved] |
|
|
I have three remote machines connecting to my server via wireguard. After an update (not sure what caused this issue) they all quit talking to wireguard. Two of them after a reboot resumed their happy participation in wireguard, but the third one did not.
So this means: My server keys are OK, my server connection is OK, firewall, ports, everything OK, it's ready to accept connections.
The misconfig must be on the client side. But where?
I turned on debug info with
Code: | echo module wireguard +p > /sys/kernel/debug/dynamic_debug/control |
This is very useful.
Now I get kernel messages like
Code: | Dec 3 12:59:08 vanaert kernel: wireguard: wg0: Sending handshake initiation to peer 1 (104.176.81.55:51820)
Dec 3 12:59:14 vanaert kernel: wireguard: wg0: Handshake for peer 1 (104.176.81.55:51820) did not complete after 5 seconds, retrying (try 5)
|
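For anyone repeating this, a minimal sketch of toggling the debug output and counting failed handshakes could look like the following. The helper names `wg_debug` and `handshake_retries` are made up for illustration. The control file path is the one used above, and writing to it assumes root, a mounted debugfs, and a kernel built with CONFIG_DYNAMIC_DEBUG.

```shell
# Sketch only: helper names are hypothetical; the path is the one from the post.
DYNDBG=/sys/kernel/debug/dynamic_debug/control

# Turn WireGuard debug messages on (+p) or off (-p).
# Needs root, debugfs mounted, and CONFIG_DYNAMIC_DEBUG in the kernel.
wg_debug() {
    case "${1:-}" in
        on)  echo 'module wireguard +p' > "$DYNDBG" ;;
        off) echo 'module wireguard -p' > "$DYNDBG" ;;
        *)   echo "usage: wg_debug on|off" >&2; return 1 ;;
    esac
}

# Count "did not complete" handshake retries in kernel log text read from stdin.
handshake_retries() {
    grep -c 'did not complete after'
}

# Example:
#   wg_debug on; sleep 60; dmesg | handshake_retries; wg_debug off
```

A count that keeps growing suggests initiations are leaving the client but no response is coming back.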
Over on the server, there is no record of any packets arriving from this client; I checked a couple of different ways, including tcpdump. I can see keepalive packets coming in from all the other machines, but not this one. It seems I've compiled wg into the kernel on the server, so I can't turn on debug messages there, but tcpdump does show packets arriving and leaving for the machines that work.
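A server-side capture like the one described can be narrowed to WireGuard traffic with a small filter helper. This is a sketch, not the exact command used in the post: `wg_filter` is a hypothetical name, 51820 is the listen port shown in this thread, and 203.0.113.7 is a placeholder address.

```shell
# Hypothetical helper: build a tcpdump filter expression for WireGuard traffic.
# $1 = UDP port, $2 = optional peer address to narrow the capture.
wg_filter() {
    port=$1
    host=${2:-}
    if [ -n "$host" ]; then
        printf 'udp port %s and host %s\n' "$port" "$host"
    else
        printf 'udp port %s\n' "$port"
    fi
}

# Example (as root on the server; the address is a placeholder):
#   tcpdump -ni any "$(wg_filter 51820 203.0.113.7)"
```

If the client sits behind NAT, the address to filter on is the network's public address, not the client's LAN address.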
The other machines connect just fine. I verified all the keys, those are correct, but it doesn't seem like the packets are arriving. I can ssh just fine to the endpoint, but I can't send any wireguard packets. It did this a week ago and then it just started working again. In many of the forums it says to regenerate all the keys, but I don't want to do that when I can *see* that they all match. And they *did* work. They just stopped.
I'm a little perplexed, but it sounds like I need to talk to my network people. Any ideas how to go about debugging greatly appreciated.
Cheers,
Jon.
Last edited by jesnow on Wed Feb 07, 2024 7:41 pm; edited 2 times in total |
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5390 Location: Bavaria
|
Posted: Sun Dec 03, 2023 9:37 pm Post subject: |
|
|
With what you have written, my first question would be: WHICH machines have been updated? Server or the three machines (or all 4)?
And my second question: WHAT was updated? Kernel or world (or both)?
IF "world", which applications?
IF kernel: Have you checked the .config of the three clients against each other? _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
jesnow l33t
Joined: 26 Apr 2006 Posts: 896
|
Posted: Mon Dec 04, 2023 7:54 pm Post subject: |
|
|
pietinger wrote: | With what you have written, my first question would be: WHICH machines have been updated ? Server or the three machines (or all 4) ?
|
All 4, it was emerge -DNua world day.
Quote: | And my second question: WHAT was updated ? Kernel or world (or both) ?
|
Just world, but I later (after this problem manifested) recompiled the kernel on the machine in question to solve a different problem with nvidia-drivers.
Quote: | IF "world" which applications ? |
Ah, interesting, good question, I am checking the emerge logs now.
Quote: | IF kernel: Have you checked the .config of the three clients against each other ? |
No, but I will do that. WG "just worked" without a kernel compile when I installed the client machines.
They should all be the same.
Thanks for the ideas, will report back.
Cheers,
Jon. |
jesnow l33t
Joined: 26 Apr 2006 Posts: 896
|
Posted: Mon Dec 04, 2023 8:15 pm Post subject: |
|
|
Interesting that of the two machines where wg is still working, one doesn't have KDE installed (it is headless) and the other hadn't yet received the latest kde-frameworks update from a few days ago. We'll see if that changes anything. Why would that make a difference? Who knows.
[update] kde-frameworks (and everything) updated on the *working* client. Still works.
[update2] everything else on that machine works, just not wg. So I went back (for the moment) to mounting samba over an ssh reverse tunnel, this works perfectly, as it did before. I'll bet that if I make a reverse tunnel for nfs, that will work too.
So I'm not impressed for the moment with my ability to debug wireguard. Or wireguard's amenability to being debugged. Samba is a nightmare to get working at all times, it has misleading and infuriating error messages (such as my favorite: "Unknown Error"). Ssh is way too much like a swiss army knife, it should not be acting like a router or a jump host. It should be telnet plus encryption, but thank heaven for it is all I can say. It *works*!
Of course I'm experiencing this issue while I'm in the countdown to a major deadline plus travel. I won't be able to debug it any more until possibly January. -j |
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5390 Location: Bavaria
|
Posted: Tue Dec 05, 2023 1:41 pm Post subject: |
|
|
jesnow,
I don't think KDE has any influence on your wg problem. So, if the server and two clients are working, only this third client has a problem. If there was no kernel change (config + version), because it was "only" a world update, then it would be very interesting to know what was updated on this client ... glibc?
Do you have any error log entries in dmesg on this client?
Usually a world update does not change any configuration ... so maybe you have a problem with an executable binary ... maybe think about rebuilding everything with
Code: | # emerge -ev -X gcc -X glibc -X gentoo-sources -X linux-firmware -X intel-microcode -X linux-headers -X baselayout @world |
? _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
jesnow l33t
Joined: 26 Apr 2006 Posts: 896
|
Posted: Tue Dec 05, 2023 2:19 pm Post subject: |
|
|
Thanks! It's going to turn out to be something stupid. I just know it. I'm back in business now using an ssh tunnel temporarily.
Here's the formerly working setup on the broken client (vanaert):
Code: |
vanaert jesnow # wg
interface: wg0
public key: RtHD8gI5snDv9RXaM97FFkA6GuO8lUl/WKILbPXPFmg=
private key: (hidden)
listening port: 51820
peer: bFLHMoAXDQY+Hh/k+HTNnLAB8QYx4JDfiqc8JdmF+n4=
endpoint: 104.176.81.55:51820
allowed ips: 10.0.17.1/32
transfer: 0 B received, 3.32 MiB sent
persistent keepalive: every 30 seconds
|
Here's the setup on the working client (in the same network):
Code: |
interface: wg0
public key: ha8KoQAPFIUy1xGun7CLZyN6O0HcZVuV0LUk0SoSDk4=
private key: (hidden)
listening port: 51820
peer: bFLHMoAXDQY+Hh/k+HTNnLAB8QYx4JDfiqc8JdmF+n4=
endpoint: 104.176.81.55:51820
allowed ips: 10.0.17.1/32
latest handshake: 36 seconds ago
transfer: 57.89 MiB received, 5.34 MiB sent
persistent keepalive: every 30 seconds
|
And here's the setup on the server (in my home):
Code: |
merckx /home/jesnow # wg
interface: wg0
public key: bFLHMoAXDQY+Hh/k+HTNnLAB8QYx4JDfiqc8JdmF+n4=
private key: (hidden)
listening port: 51820
peer: ha8KoQAPFIUy1xGun7CLZyN6O0HcZVuV0LUk0SoSDk4=
endpoint: 130.39.188.145:51820
allowed ips: 10.0.17.2/32
latest handshake: 8 seconds ago
transfer: 5.62 MiB received, 58.23 MiB sent
peer: 9MxmU1PRwv2K2OoR66KXh5SHi+y73ujxxBR9hJW9tzI=
endpoint: 192.168.1.254:51820
allowed ips: 10.0.17.3/32
latest handshake: 1 minute, 35 seconds ago
transfer: 1.10 MiB received, 6.61 MiB sent
peer: 05DE17WTa1mf7dXraMy9ZxMuFI13lRlIwBWOKuZPERE=
endpoint: 130.39.190.3:56536
allowed ips: 10.0.17.6/32
latest handshake: 1 minute, 36 seconds ago
transfer: 483.91 KiB received, 222.79 KiB sent
peer: RtHD8gI5snDv9RXaM97FFkA6GuO8lUl/WKILbPXPFmg=
allowed ips: 10.0.17.4/32
peer: qsqPdMVqEvF7+wTlEdZs6BbA48uVNhNprWq7erEIRzo=
allowed ips: 10.0.17.5/32
merckx /home/jesnow #
|
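The peers with no "latest handshake" line in the server output above can also be picked out mechanically. This sketch assumes the machine-readable form `wg show wg0 latest-handshakes`, which prints one public key and one Unix timestamp per line, with 0 meaning no handshake yet. The helper name `peers_without_handshake` is hypothetical.

```shell
# Hypothetical helper: read "public-key<TAB>unix-timestamp" lines, as printed
# by `wg show wg0 latest-handshakes`, and print the keys whose timestamp is 0.
peers_without_handshake() {
    awk '$2 == 0 { print $1 }'
}

# Example (as root on the server):
#   wg show wg0 latest-handshakes | peers_without_handshake
```

Run on the server from this thread, it would be expected to list the keys for 10.0.17.4 and 10.0.17.5, the two peers shown without a handshake.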
The only messages I've been able to generate are the ones in my OP: "Did not complete". It's almost as if IT were filtering that port on that machine. I don't have time to follow up with them until the start of next year. But both other machines in that network can connect hunky dory. Very perplexing.
In order to get the kernel messages on the server machine I think I have to recompile the kernel with wg as a module so I can turn on debugging.
Cheers,
Jon. |
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5390 Location: Bavaria
|
Posted: Tue Dec 05, 2023 2:42 pm Post subject: |
|
|
jesnow wrote: | [...] It's almost as if IT were filtering that port on that machine. |
It would of course be a great coincidence if your world update coincided with a change to a firewall ... but you can't rule it out.
Many greetings,
Peter _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
jesnow l33t
Joined: 26 Apr 2006 Posts: 896
|
Posted: Tue Dec 05, 2023 11:14 pm Post subject: |
|
|
I checked: that ain't it. (Pardon the pun). They do have policies for weird internet behavior, but they will shut down a MAC address completely, not turn off a port.
Both clients have in /usr/src/linux/.config:
Code: |
pogacar jesnow # grep WIREG /usr/src/linux/.config
CONFIG_WIREGUARD=m
# CONFIG_WIREGUARD_DEBUG is not set
pogacar jesnow #
|
And on the server it's:
Code: |
merckx /home/jesnow # grep WIREG /usr/src/linux/.config
CONFIG_WIREGUARD=y
# CONFIG_WIREGUARD_DEBUG is not set
|
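Note that /usr/src/linux/.config describes the source tree the symlink points at, which is not necessarily the kernel that is currently running. A small classifier makes it easy to compare the two. This is a sketch: `wg_config_state` is a hypothetical name, and reading /proc/config.gz assumes the running kernel was built with CONFIG_IKCONFIG_PROC.

```shell
# Hypothetical helper: classify CONFIG_WIREGUARD from kernel config text on stdin.
# Prints "builtin" (=y), "module" (=m), or "disabled" (option not set).
wg_config_state() {
    awk -F= '
        $1 == "CONFIG_WIREGUARD" { state = ($2 == "y") ? "builtin" : "module" }
        END { print (state == "" ? "disabled" : state) }
    '
}

# Examples:
#   wg_config_state < /usr/src/linux/.config   # config of the source tree
#   zcat /proc/config.gz | wg_config_state     # config of the running kernel
```

If the two commands disagree, the running kernel was not built from the config being inspected.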
|
jesnow l33t
Joined: 26 Apr 2006 Posts: 896
|
Posted: Fri Jan 05, 2024 7:10 pm Post subject: |
|
|
I had a month of professional travel, and over Christmas they took the power down for a week, rebooting the entire network. It was a mess. And when everything came back up, none of the machines reconnected -- a separate issue.
So once I got in and restarted networking and wireguard on all three machines, everything worked perfectly as before all this craziness. I had not changed the wg keys, or changed any kernel settings. It all just worked for no good reason. Very dissatisfying, but nothing I can do about that now.
So that's that. |
jesnow l33t
Joined: 26 Apr 2006 Posts: 896
|
Posted: Wed Feb 07, 2024 7:40 pm Post subject: |
|
|
I finally solved this problem. net-vpn/wireguard-tools is installed for only the current kernel, so it must be reinstalled for any new kernel. Unloading and reloading the modules doesn't seem to be enough; it requires a reboot.
Cheers,
Jon. |
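Independent of which package is responsible, one thing that can be checked by hand after a kernel update is whether a wireguard module file exists for the kernel that is actually running. The sketch below is an assumption about what is worth checking, not a confirmed diagnosis of this thread's problem. `wg_module_present` is a hypothetical helper, and it reports nothing useful when WireGuard is built into the kernel (CONFIG_WIREGUARD=y), since no module file exists in that case.

```shell
# Hypothetical helper: succeed if a wireguard module file exists for a kernel release.
# $1 = kernel release (default: the running kernel, from uname -r)
# $2 = modules root   (default: /lib/modules)
wg_module_present() {
    release=${1:-$(uname -r)}
    root=${2:-/lib/modules}
    # Matches wireguard.ko as well as compressed variants such as wireguard.ko.xz.
    find "$root/$release" -name 'wireguard.ko*' 2>/dev/null | grep -q .
}

# Example:
#   wg_module_present || echo "no wireguard module for $(uname -r)"
```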
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5390 Location: Bavaria
|
Posted: Wed Feb 07, 2024 10:17 pm Post subject: |
|
|
I can't believe that the cause was the wireguard-tools (because they only contain user space programs and the current version 1.0.20210424 has not been updated for a long time). But I have seen the new thread and will follow it with interest.
Cheers,
Peter _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
jesnow l33t
Joined: 26 Apr 2006 Posts: 896
|
Posted: Wed Feb 07, 2024 11:32 pm Post subject: |
|
|
I'll comment on the newer thread. Short answer, I think it's a component of the install that does the needed thing, which could also be done by hand. The standard solution is to regenerate all the keys, which works, but is also not the problem.
pietinger wrote: | I can't believe that the cause was the wireguard-tools (because they only contain user space programs and the current version 1.0.20210424 has not been updated for a long time). But I have seen the new thread and will follow it with interest.
Cheers,
Peter |
|