Gentoo Forums
Server getting unresponsive when swapping
shimitar
Guru
Joined: 23 Nov 2003
Posts: 334
Location: Italy, Torino

Posted: Mon Nov 28, 2022 10:48 am    Post subject: Server getting unresponsive when swapping

I have Gentoo on a server.
Hardware is not too beefy (6 GB RAM, oldish Xeon, mechanical HDD, but quite a nice one).

When the system starts swapping, it gets totally unresponsive. It's impossible to log in over SSH, and even the local console is basically unresponsive. Given some 30 minutes I could probably log in, who knows.

I have a console only installation, kernel generated with genkernel.

I have a meager 2GB of swap, since actually swap is NOT needed at all.

Usually I get the swap issues when building some updates (I need to have a few huge packages installed that take lots of RAM to emerge) or the occasional process that has a bug in it (we use the server to test some software we write).

The system is not "dead" when this happens... but it is very, very slow to respond to any command from already logged-in sessions, even just pressing Return in a shell.

There does not seem to be any defective RAM or hard drive.

Is there anything I could check or change, maybe in the kernel?
_________________
Willy Gardiol
willy@gardiol.org

Banana
Moderator
Joined: 21 May 2004
Posts: 1893
Location: Germany

Posted: Mon Nov 28, 2022 11:08 am

Well, if you have processes which need more RAM than you provide, swapping will happen. Either do not run those processes or put some more RAM in the server.
_________________
Forum Guidelines

PFL - Portage file list - find which package a file or command belongs to.
My delta-labs.org snippets do expire

shimitar
Guru
Joined: 23 Nov 2003
Posts: 334
Location: Italy, Torino

Posted: Mon Nov 28, 2022 11:33 am

Swapping is not the issue here.
I know swapping will happen, and it's the only way to go; I cannot add RAM.

But swap should not bog the machine down to the point you cannot even log in locally.

Something else is going on, maybe with disk scheduling or such, so I was hoping somebody could give some insights on where to look and what to try.

szatox
Advocate
Joined: 27 Aug 2013
Posts: 3504

Posted: Mon Nov 28, 2022 12:22 pm

Quote:
Usually I get the swap issues when building some updates (I need to have a few huge packages installed that take lots of RAM to emerge) or the occasional process that has a bug in it (we use the server to test some software we write).

OK, so, since you seem to know the culprit, why not just lower portage's priority?
You can run it with a nice value of 19 and ionice class 3. If anything causes your system to stall, this process will be stalled more than anything else, leaving time for interactive tasks before it makes additional requests and hogs more resources.
With this your system might still feel a bit sluggish at times, but it shouldn't get completely unresponsive anymore.

shimitar
Guru
Joined: 23 Nov 2003
Posts: 334
Location: Italy, Torino

Posted: Mon Nov 28, 2022 12:23 pm

I will try that... How do I set nice and ionice in portage?

szatox
Advocate
Joined: 27 Aug 2013
Posts: 3504

Posted: Mon Nov 28, 2022 12:39 pm

From man make.conf:
Code:

       PORTAGE_IONICE_COMMAND = [ionice command string]
              This  variable  should contain a command for portage to call in order to adjust the io priority of portage and its subprocesses. The command string should contain a \${PID} place-holder that will be substituted with an
              integer pid. For example, a value of "ionice -c 3 -p \${PID}" will set idle io priority. For more information about ionice, see ionice(1). This variable is unset by default.
              Portage will also set the autogroup-nice value (see sched(7)), if FEATURES="pid-sandbox" is enabled.

       PORTAGE_NICENESS = [number]
              The  value  of  this variable will be added to the current nice level that emerge is running at.  In other words, this will not set the nice level, it will increment it.  For more information about nice levels and what
              are acceptable ranges, see nice(1).

Also, I forgot to mention this one:
Code:
       MAKEOPTS
              Use  this  variable if you want to use parallel make.  For example, if you have a dual-processor system, set this variable to "-j2" or "-j3" for enhanced build performance with many packages. Suggested settings are between
              CPUs+1 and 2*CPUs+1. In order to avoid excess load, the --load-average option is recommended.  For more information, see make(1). Also see emerge(1) for information about analogous --jobs and  --load-average  options.
              Defaults to the number of processors if left unset.

Disk I/O counts towards load AND makes other processes wait for resources, which also counts towards load, so excessive swapping sends the measured load off the charts. That is very convenient: just tell make not to push it past a threshold. The exact value is to be determined experimentally by you, since it depends on the machine's workload, but something in the range of twice the core count should be a good start.
If you use parallel emerge, you can add --load-average to EMERGE_DEFAULT_OPTS too, to prevent it from launching additional builds on an already overloaded machine. Make sure this load-average value is lower than the MAKEOPTS load value though; if portage starts too many jobs, make can't fix it on its end by not forking.
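
Put together, the settings discussed in this post might look like the fragment below. It is only a sketch: the job counts and load thresholds assume a hypothetical 4-core machine and have to be tuned by experiment. EMERGE_DEFAULT_OPTS is the make.conf variable that carries default emerge options such as --jobs and --load-average.

```shell
# /etc/portage/make.conf (fragment; all numbers are assumptions for a 4-core box)

# Added to emerge's current nice level; 19 is the lowest CPU priority.
PORTAGE_NICENESS="19"

# Put portage and its subprocesses into the idle I/O class.
# The backslash keeps ${PID} literal so portage can substitute it later.
PORTAGE_IONICE_COMMAND="ionice -c 3 -p \${PID}"

# make: up to 4 parallel jobs, but start no new job once load reaches 8.
MAKEOPTS="-j4 --load-average=8"

# emerge: build up to 2 packages at once, with a lower load threshold than
# MAKEOPTS so that emerge backs off before make has to.
EMERGE_DEFAULT_OPTS="--jobs=2 --load-average=6"
```

Lowering the two load thresholds is the first thing to try if the machine still stalls during big builds.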

shimitar
Guru
Joined: 23 Nov 2003
Posts: 334
Location: Italy, Torino

Posted: Mon Nov 28, 2022 1:03 pm

Well, it's not that.
Swap doesn't seem to matter: when emerging NodeJS the machine hangs anyway, even with 0 swap...

shimitar
Guru
Joined: 23 Nov 2003
Posts: 334
Location: Italy, Torino

Posted: Mon Nov 28, 2022 1:10 pm

I have tried setting the portage scheduling to "idle" but the problem is not solved. I am now running memtest again to check for bad hardware...
Nothing about the disk in the logs.

mike155
Advocate
Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Mon Nov 28, 2022 2:56 pm

What about dmesg and the system log files? Are there any error messages? Hanging tasks, kernel oopses, interrupt errors, hardware errors?
Quote:
But swap should not bog the machine down to the point you cannot even log in locally.

That's not true. Excessive swapping can lead to unresponsive machines.

Hu
Administrator
Joined: 06 Mar 2007
Posts: 23109

Posted: Mon Nov 28, 2022 5:01 pm

The lack of a swap area does not prevent all use of disk I/O to cover memory shortage. It only limits the kernel's options. One option the kernel always has available is to discard from memory the unmodified code and read-only-data pages of loaded processes, which it can recover by re-reading them from the backing file later. This option is especially bad for performance, since it can cause the code that handles your shell to be swapped out, requiring the kernel to re-read the shell's executable back into memory when next you want service. Consider how the responsiveness will feel when reacting to your carriage return requires loading the shell, part of sshd, and maybe some C libraries, from a spinning disk. Further, if the machine is overloaded, those pages will likely be discarded again quickly, forcing you to load them again on your next keystroke.
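
One way to watch this effect, as a minimal sketch: a major page fault is counted whenever a process touches a page that has to be read back from disk, so a shell whose code pages keep getting discarded should show a steadily rising major-fault count. The helper name `major_faults` is made up for this example; it reads the `majflt` field from `/proc/PID/stat` on Linux.

```shell
#!/bin/sh
# Print the major page fault count of a process (default: this shell).
# A major fault means the page had to be read back from disk.
major_faults() {
    pid="${1:-$$}"
    # /proc/PID/stat looks like "pid (comm) state ppid ..."; comm may contain
    # spaces, so drop everything up to the last ") " before splitting.
    stat_line=$(cat "/proc/$pid/stat") || return 1
    rest="${stat_line##*") "}"
    # In the remainder, majflt (field 12 of the full line) is word 10.
    set -- $rest
    echo "${10}"
}

major_faults "$$"
```

Running it in the sluggish shell before and after a few keystrokes gives two numbers to compare; if the count climbs each time, the shell's own pages are probably being evicted and re-read.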

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Mon Nov 28, 2022 7:28 pm

It is to be expected with bad hardware. And yes, it is bad. Mechanical disks are not suitable to replace RAM. Most disk controllers get priority over everything else. RAM works at thousands of megs per second. A mechanical drive in the most optimistic situations works at around 100 megs per second. When the system is forced to swap, it will start to move data around between RAM and disk. When it does that, you will see IOwait go up in top, not to 100% but even 10% is game over.

Reason, again, is bad hardware. It's the way most PC motherboards are designed. North bridge/south bridge. The simple explanation is that CPU, memory and GPU sit in the north bridge which is designed to be very fast and allow for fast movement of data. But the problem is the south bridge which includes everything else. SATA controller, keyboard, mouse, network... all that other stuff. When the north bridge floods the south bridge with data, then everything is going to the SATA controller. There's nothing in the software that can be done to prevent this. Except not use swap.

There are other considerations as well, like number of threads. But I bet you have one of those systems that chokes when you write data to disk. Like just copy a file from a very fast medium with maximum speed. And everything else seems to be slowing down. Keep in mind that it happens with ONE thread. Imagine what happens when you use 2/3/4. Kernel is multithreaded, and you are running a multi-threaded application when you compile nodejs. You are not just writing with ONE thread, but writing and reading on disk with multiple threads. Swap is completely useless.

Given that you were talking about nodejs and 6 GB of RAM, that means you can ONLY reliably use one core to compile that. Rule of thumb is AT LEAST 2 GB of RAM per core, but more reliably 4 GB per core for C++ apps. Unfortunately that's how some hardware works. When SATA is laggy, everything else is laggy. It's a motherboard issue and it's very common. That's why they invented SSDs, and why they stick NVMe drives directly on PCI Express.
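
The rule of thumb can be turned into a quick calculation. The sketch below is not part of portage; `suggest_jobs` is a made-up helper that divides total RAM by the assumed RAM per compile job and caps the result at the core count.

```shell
#!/bin/bash
# suggest_jobs MEM_KIB GIB_PER_JOB CORES
# Prints min(CORES, MEM_KIB / GIB_PER_JOB), but never less than 1.
suggest_jobs() {
    local mem_kib=$1 gib_per_job=$2 cores=$3
    local jobs=$(( mem_kib / (gib_per_job * 1024 * 1024) ))
    if [ "$jobs" -gt "$cores" ]; then jobs=$cores; fi
    if [ "$jobs" -lt 1 ]; then jobs=1; fi
    echo "$jobs"
}

# MemTotal in /proc/meminfo is reported in kB.
total_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
cores=$(nproc)
echo "2 GB per job: -j$(suggest_jobs "$total_kib" 2 "$cores")"
echo "4 GB per job: -j$(suggest_jobs "$total_kib" 4 "$cores")"
```

MemTotal is always a little below the installed RAM, so the division rounds down and a nominal 6 GB machine will usually come out one job lower than the naive 6 / 2 = 3.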

BTW, when swap was invented all computers were single core. They are not anymore. Swap only works within reason. When you try to compile something that takes, let's say, 10 GB of RAM with 4 threads and you only have 6 GB, what do you expect the machine to do? I am not even sure anyone looks at the swap code anymore. It's simply out of date and unsuitable for this purpose. As are mechanical disks for swap. Intel worked a lot to create Optane, which can be used as swap. Think they did it because they were bored?

sublogic
Guru
Joined: 21 Mar 2022
Posts: 312
Location: Pennsylvania, USA

Posted: Tue Nov 29, 2022 2:23 am

I second szatox's suggestion to run the one offending process at lower priority, especially lower I/O priority.
Code:
$ ionice -c idle ./memory_hog

For a big emerge, you might also force single process:
Code:
# MAKEOPTS=-j1 ionice -c idle emerge --ask virtual/rust

In my experience, when a paging storm starts the system becomes very sluggish, but not dead. Pressing ENTER at a command prompt might get you a response in, oh, 30 seconds? Not 30 hours.

Also, you can run vmstat in a second VT (since you built console-only).
Code:
$ vmstat 1  # repeats at one-second intervals
Keep an eye on the "swap" and "io" columns. If no swapping of anonymous memory is going on, the "si" and "so" columns will stay near zero. Under "io", if "bi" is high (blocks in) and "bo" stays low (blocks out), you have the problem that Hu described: a process with a working set that doesn't fit in memory. Its text and library pages keep getting reclaimed, only to be faulted back in soon after.
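
The same reading of the columns can be automated. This is a rough sketch: `classify_vmstat` is a made-up helper, the thresholds are arbitrary assumptions rather than tuned values, and it relies on the usual vmstat column order (si = 7, so = 8, bi = 9, bo = 10).

```shell
#!/bin/sh
# Read vmstat output on stdin and label each sample line.
# Column positions follow the vmstat layout: si=7 so=8 bi=9 bo=10.
# The thresholds are arbitrary assumptions, not tuned values.
classify_vmstat() {
    awk -v swap_min=100 -v bi_min=1000 '
        $1 !~ /^[0-9]+$/ { next }    # skip header lines
        ($7 + $8) >= swap_min {
            print "swapping anonymous memory: si=" $7 " so=" $8; next
        }
        $9 >= bi_min && $10 < $9 / 10 {
            print "re-reading file-backed pages: bi=" $9 " bo=" $10; next
        }
        { print "quiet: si=" $7 " so=" $8 " bi=" $9 " bo=" $10 }
    '
}

# Usage: vmstat 1 | classify_vmstat
```

A high "bi" with low "bo" also shows up during ordinary large reads, so the second label is only a hint that needs to be checked against what the machine is actually doing.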

(Expect a delay when switching virtual terminals: many seconds, not hours or days.)

shimitar
Guru
Joined: 23 Nov 2003
Posts: 334
Location: Italy, Torino

Posted: Tue Nov 29, 2022 9:58 am

Thank you guys, this is the kind of input I was hoping for.

This is very informative and interesting; I am running tests and monitoring the situation.

Most probably will replace the OS SATA HDD with a SATA SSD (unfortunately, no NVMe on this system, it's an HP Z600) to improve the situation.

I have used Gentoo since the AMD K6 400 MHz days, and it's sad to see that while hardware catches up with compile times, some (badly designed?) packages can still overwhelm the build process... I highly appreciate stuff like libreoffice-bin ebuilds for this reason :)

mike155
Advocate
Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Tue Nov 29, 2022 10:08 am

shimitar wrote:
Most probably will replace the OS SATA HDD with a SATA SSD (unfortunately, no NVMe on this system, it's an HP Z600) to improve the situation.

That won't help. You cannot fight excessive swapping with a faster disk.

szatox
Advocate
Joined: 27 Aug 2013
Posts: 3504

Posted: Tue Nov 29, 2022 1:22 pm

That's an oversimplification.
SSD will help, to an extent. Depending on swap size, swappiness, and the workload. And it may or may not be worth the money.


Anyway, there is another thing you might try: cgroups.
I haven't really used them and know very little about their capabilities, but I have reason to believe you could use them to keep interactive processes from being swapped out, e.g. by limiting the RAM available to portage. This way portage will be unable to push the interactive programs out to swap.
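
A minimal sketch of that idea, with several assumptions: the cgroup v2 (unified) hierarchy is mounted at /sys/fs/cgroup, the commands run as root, and nothing else (such as systemd) is managing the tree. The helper `limit_self_memory`, the group name `portage` and the 4 GiB figure are all made up for illustration; `memory.high`, `cgroup.procs` and `cgroup.subtree_control` are the standard cgroup v2 interface files.

```shell
#!/bin/sh
# limit_self_memory NAME BYTES
# Create cgroup NAME, cap its memory at BYTES, and move this shell into it.
# CGROOT is where the cgroup v2 hierarchy is mounted (assumed default below).
limit_self_memory() {
    cg_root="${CGROOT:-/sys/fs/cgroup}"
    cg="$cg_root/$1"
    # Let child groups of the root use the memory controller.
    echo "+memory" > "$cg_root/cgroup.subtree_control" || return 1
    mkdir -p "$cg" || return 1
    # memory.high: above this the group is throttled and reclaimed first.
    echo "$2" > "$cg/memory.high" || return 1
    # Move the current shell; processes started from it inherit the group.
    echo "$$" > "$cg/cgroup.procs"
}

# Usage (as root), e.g. keep emerge and its children under 4 GiB:
#   limit_self_memory portage $((4 * 1024 * 1024 * 1024)) && emerge --ask net-libs/nodejs
```

With `memory.high` the build is slowed down and its own pages are reclaimed once it crosses the limit, which should leave the pages of the shell and sshd alone; whether that is enough on this particular machine would have to be tested.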

axl
Veteran
Joined: 11 Oct 2002
Posts: 1146
Location: Romania

Posted: Tue Nov 29, 2022 3:27 pm

Again, in order of utility: add more RAM. If that is not possible, then reduce the number of threads while compiling. Redirect compiles to a machine that has more RAM with distcc. Look for packages that are already compiled (-bin) if they are available.

Even if you get an SSD, you do not want to scratch it. Meaning you don't want to overuse it by writing swap on it. It will wear out. Some SSDs wear out pretty quickly if you are not careful. In the long run, the cheapest, most convenient solution is to get more RAM, even if you have to throw the old one in the trash, assuming you are already maxed out. Simply replace it with bigger banks of memory. RAM is not that expensive these days. Even if it's ECC server RAM, you should be able to find some second hand.

Keep in mind that if a process uses a huge amount of RAM, it doesn't mean that it's a one-time write process, meaning it takes this amount of data, writes it to RAM one time, and then that's it. It's not. It will be an intense exchange of data between the CPU and RAM. Read/write, read/write etc. Several times over. You want that exchange to happen only between CPU and RAM. If the SATA controller is involved, it will be slow regardless of the disk speed. Even the best SATA SSDs can only do a few hundred megs per second, nowhere near the several thousand megs per second that RAM provides.

Swapping during compilation is really a bad idea, at least in my experience. It works fine together with zswap and zram for things like a browser that has too many tabs open, but not for compilation. It works OK for putting your system into hibernation. But that's about it. It's not really practical for things that use both CPU and RAM with high intensity. But for compiling C++, again, you should do your best to get AT LEAST 2 GB/core, although to be safer it should be 4 GB/core. Or reduce the number of parallel threads until you can fit them all into RAM. With your system having only 6 GB, I would close X, do compiles in the console, and based on experience try 1 or 2 threads. MAX.

I am sorry there aren't better practical solutions for this problem.

wjb
l33t
Joined: 10 Jul 2005
Posts: 645
Location: Fife, Scotland

Posted: Tue Nov 29, 2022 8:40 pm

As axl and others have said, portage needs basically 2G RAM per core when compiling. It's worth trying "-j3" in MAKEOPTS, but you may find you need "-j2". There are some huge builds that need more like 4G/core - see Gentoo Wiki for adjusting MAKEOPTS per package for these.
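
For reference, a per-package override usually takes two small files. This is a sketch of the general mechanism; the file name `makeopts-j1.conf` is an arbitrary choice, and nodejs is used only because it is the package that came up in this thread.

```shell
# /etc/portage/env/makeopts-j1.conf  (file name is an arbitrary choice)
# Overrides the global MAKEOPTS for any package mapped to this file.
MAKEOPTS="-j1"

# /etc/portage/package.env then maps packages to that file, one per line:
#   net-libs/nodejs makeopts-j1.conf
```

Everything else keeps building with the global MAKEOPTS from make.conf; only the listed packages drop to a single job.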

Something else you can do if there is another (better) Gentoo PC around is use distcc to do the build over there.

mike155
Advocate
Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Tue Nov 29, 2022 8:43 pm

Do you use a tmpfs? /tmp or /var/tmp/portage? Don't do that if you're short of memory.
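
A quick way to check, as a small sketch assuming GNU stat: print the filesystem type behind each directory. `fs_type` is a made-up helper name; an answer of "tmpfs" means the directory lives in RAM (and swap).

```shell
#!/bin/sh
# Print the filesystem type backing a path (GNU stat; "tmpfs" means RAM).
fs_type() {
    stat -f -c %T "$1"
}

for dir in /tmp /var/tmp/portage; do
    if [ -d "$dir" ]; then
        echo "$dir: $(fs_type "$dir")"
    else
        echo "$dir: not present"
    fi
done
```

If either line reports tmpfs, the build directories are competing with the compiler for the same 6 GB.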

All times are GMT