Author |
Message |
mauricev Apprentice

Joined: 22 Mar 2004 Posts: 205
|
Posted: Fri Nov 19, 2021 10:30 pm Post subject: serious server can be keyboard sluggish |
|
|
PowerEdge R820 here with two 6-core E5-4607s and 96 GB RAM, a reasonably fast machine. Its primary purpose is as a CIFS server. While that's being utilized (by just a handful of users), typing on an ssh terminal can become sluggish, subject to hangups and non-responsiveness.
Load averages barely top 2 when this happens, and a mere 2 GB of RAM is in use.
Gentoo is installed with keyword ~amd64 and everything is mostly up to date with kernel 5.14.13-gentoo-x86_64. Kernel includes ZFS modules.
What might be causing this? |
|
|
 |
szatox Advocate

Joined: 27 Aug 2013 Posts: 3583
|
Posted: Fri Nov 19, 2021 11:00 pm Post subject: |
|
|
Quote: | typing on an ssh terminal can become sluggish, subject to hangups and non-responsiveness. [...]
What might be causing this?
| Poor internet connection.
It's not a 100% reliable way to test it, but it's quick and easy: run ping in parallel with your ssh session and watch for lags and packet losses. |
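For anyone who would rather log the ping check than watch it by eye, here is a rough Python sketch of the idea. It assumes the Linux iputils `ping` output format (`time=0.412 ms`); the address `192.0.2.10` is only a placeholder, and the "5 times the median" threshold is an arbitrary choice for illustration, not a rule.

```python
import re
import statistics
import subprocess

# Matches the "time=0.412 ms" field printed by iputils ping on Linux (assumed format).
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtts(ping_output: str) -> list[float]:
    """Extract round-trip times in milliseconds from ping's stdout."""
    return [float(m.group(1)) for m in RTT_PATTERN.finditer(ping_output)]


def find_spikes(rtts: list[float], factor: float = 5.0) -> list[float]:
    """Return samples slower than `factor` times the median round-trip time."""
    if not rtts:
        return []
    baseline = statistics.median(rtts)
    return [r for r in rtts if r > factor * baseline]


def main(host: str = "192.0.2.10", count: int = 60) -> None:
    """Ping `host` (placeholder address) and report lag spikes and missing replies."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, check=False,
    )
    rtts = parse_rtts(result.stdout)
    print(f"replies: {len(rtts)} of {count} requests")
    print(f"spikes (ms): {find_spikes(rtts)}")
```

Calling `main("your.server.address")` on the ssh client while the session is sluggish should show whether replies go missing or slow down at the same moments.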
|
|
 |
Hu Administrator

Joined: 06 Mar 2007 Posts: 23277
|
Posted: Sat Nov 20, 2021 1:49 am Post subject: |
|
|
If you halt all CIFS service, does ssh responsiveness improve?
When ssh responsiveness is poor, what percentage of the server's uplink and downlink are in use? |
|
|
 |
mauricev Apprentice

Joined: 22 Mar 2004 Posts: 205
|
Posted: Sat Nov 20, 2021 2:32 am Post subject: |
|
|
szatox wrote: | Quote: | typing on an ssh terminal can become sluggish, subject to hangups and non-responsiveness. [...]
What might be causing this?
| Poor internet connection.
It's not a 100% reliable way to test it, but it's quick and easy: run ping in parallel with your ssh session and watch for lags and packet losses. |
All devices are connected to a local Cisco 6513. |
|
|
 |
mauricev Apprentice

Joined: 22 Mar 2004 Posts: 205
|
Posted: Sat Nov 20, 2021 2:39 am Post subject: |
|
|
Hu wrote: | If you halt all CIFS service, does ssh responsiveness improve?
When ssh responsiveness is poor, what percentage of the server's uplink and downlink are in use? |
Not quite sure what you mean by the "uplink/downlink".  |
|
|
 |
Hu Administrator

Joined: 06 Mar 2007 Posts: 23277
|
Posted: Sat Nov 20, 2021 5:56 pm Post subject: |
|
|
Your network card is rated for some number of MB/sec. Actual throughput will be limited by the negotiated speed with the switch, which may in turn be further limited in how fast it can pass traffic to its upstream. My question was, supposing you have a card rated for 1 GB/sec, and negotiated same with the switch, how many MB/sec upload is the CIFS server doing? How many MB/sec download? From that, we could compute that you're using your entire 1 GB/sec capacity, or only half, or only 10%.
On the CIFS server, what is the output of ethtool eth0 (or such other network device, as appropriate)? Your response to szatox suggests this is all local. Is that right? Your server, your CIFS clients, and your ssh client are all on the same LAN, and the Internet is not involved in any way? Is the ssh client also running CIFS traffic to the server at the time you notice the sluggish ssh response? |
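A minimal sketch of how that percentage could be measured on the server itself, assuming a Linux box that exposes the usual sysfs files under `/sys/class/net/<iface>/`. The interface name `eth0` is a placeholder, and the `speed` file is in Mbit/s (the same unit ethtool prints), so the byte counters are converted to bits before comparing.

```python
import time
from pathlib import Path


def read_counter(iface: str, name: str) -> int:
    """Read a cumulative counter such as rx_bytes or tx_bytes from sysfs."""
    return int(Path(f"/sys/class/net/{iface}/statistics/{name}").read_text())


def read_link_speed_mbit(iface: str) -> int:
    """Negotiated link speed in Mbit/s; may be -1 or unreadable on some devices."""
    return int(Path(f"/sys/class/net/{iface}/speed").read_text())


def utilization_percent(byte_delta: int, seconds: float, link_mbit: int) -> float:
    """Convert bytes moved over an interval into a percentage of link capacity."""
    bits_per_second = byte_delta * 8 / seconds
    return 100.0 * bits_per_second / (link_mbit * 1_000_000)


def sample(iface: str = "eth0", seconds: float = 5.0) -> tuple[float, float]:
    """Return (download %, upload %) for `iface` over a `seconds`-long window."""
    speed = read_link_speed_mbit(iface)
    rx0, tx0 = read_counter(iface, "rx_bytes"), read_counter(iface, "tx_bytes")
    time.sleep(seconds)
    rx1, tx1 = read_counter(iface, "rx_bytes"), read_counter(iface, "tx_bytes")
    return (
        utilization_percent(rx1 - rx0, seconds, speed),
        utilization_percent(tx1 - tx0, seconds, speed),
    )
```

Running `sample("eth0")` during a sluggish period gives the two numbers asked about above; values near 100% would point at a saturated link.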
|
|
 |
mauricev Apprentice

Joined: 22 Mar 2004 Posts: 205
|
Posted: Thu Jan 20, 2022 10:36 pm Post subject: |
|
|
The server is copying files to a second CIFS server, so the copy script depends upon that server's shares being properly mounted. Apparently a bug in the script allowed files to be copied when the share wasn't mounted, causing the copied files to land in the ZFS root and fill it. When the ZFS root pool fills up, the system acts in this sluggish manner. |
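The actual script isn't shown in this thread, so the following is only a sketch of the kind of guard that avoids this failure: refuse to copy unless the destination really is a mount point. The function name `copy_if_mounted` and its parameters are made up for illustration.

```python
import os
import shutil
import sys


def copy_if_mounted(source: str, mount_point: str, subdir: str = "") -> bool:
    """Copy `source` into `mount_point` only if something is mounted there.

    Returns True if the copy happened, False if it was skipped.
    """
    # os.path.ismount() is False for a plain directory, which is what an
    # unmounted share looks like; writing there would fill the local pool.
    if not os.path.ismount(mount_point):
        print(f"{mount_point} is not mounted, skipping copy", file=sys.stderr)
        return False
    destination = os.path.join(mount_point, subdir, os.path.basename(source))
    shutil.copy2(source, destination)
    return True
```

With a check like this the script fails loudly when the share is missing instead of quietly writing into the local filesystem underneath the mount point.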
|
|
 |
mauricev Apprentice

Joined: 22 Mar 2004 Posts: 205
|
Posted: Fri Jan 21, 2022 6:30 pm Post subject: |
|
|
I take it back. The disk-space problem just happened to occur at the same time. Something else is amiss.
Indeed, if I run a copy task from the main ZFS store to another CIFS server, the system periodically grinds to a halt. Ordinarily, pinging this server from another on the same switch takes 0.4 ms. But during the hangups, it takes about 7 ms. |
|
|
 |
mauricev Apprentice

Joined: 22 Mar 2004 Posts: 205
|
Posted: Fri Jan 21, 2022 6:31 pm Post subject: |
|
|
I even get disconnected from ssh sessions during the periods of slowdown:
Code: | client_loop: send disconnect: Broken pipe |
|
|
|
 |
spica Guru

Joined: 04 Jun 2021 Posts: 351
|
Posted: Fri Jan 21, 2022 10:38 pm Post subject: |
|
|
Is there a second network card installed on board? My idea is to test the ssh connection via the second NIC while the first one is under load. If there is no problem, this is a network bandwidth issue, and you will need to configure traffic priorities on your switch so that ssh traffic gets, say, 10% of the bandwidth guaranteed. If the problem persists, then it is more likely something inside the server, i.e. not a network issue (maybe %wa is too high in top?).
If the nature of the problem is unknown, the general advice is to localize it by eliminating possible causes one by one (e.g. external vs. internal, disk vs. CPU, etc.). |
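To put a number on the %wa question without staring at top, here is a small sketch that computes the iowait share from two readings of `/proc/stat`. It assumes the standard Linux field order on the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal, ...) and is an approximation of what top shows, not top's exact code.

```python
import time


def parse_cpu_line(stat_text: str) -> list[int]:
    """Return the numeric fields of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Share of CPU time spent waiting on I/O between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Sum user..steal only; guest time is already counted inside user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait column


def sample_iowait(seconds: float = 5.0) -> float:
    """Measure iowait percentage over a `seconds`-long window."""
    with open("/proc/stat") as handle:
        before = parse_cpu_line(handle.read())
    time.sleep(seconds)
    with open("/proc/stat") as handle:
        after = parse_cpu_line(handle.read())
    return iowait_percent(before, after)
```

A high figure from `sample_iowait()` during a hang would suggest the disks rather than the network.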
|
|
 |
|