Gentoo Forums
BIG BUG: 32 bit kernel after 6.6.23 has serious memory leak

piggy
n00b

Joined: 28 Sep 2015
Posts: 26

Posted: Sun Jul 14, 2024 11:36 pm    Post subject: BIG BUG: 32 bit kernel after 6.6.23 has serious memory leak

I know, who needs 32-bit anymore? Well, it's me, with some old virtual machines that have worked so well for ages that I don't want to abandon them.

And 32-bit Gentoo was very good until kernels newer than 6.6.21. The configuration doesn't matter: it happens with my config and with the stock kernel-bin.

And it happens every time. The machine starts OK, then, even though it has plenty of resources (4 GB RAM, 8 GB swap partition), after some use and work it dies.

And it starts to die with this log message:

Code:
Jul 15 01:14:01 [dbus-daemon] [session uid=1000 pid=968 pidfd=5] Failed to activate service 'org.freedesktop.portal.Desktop': timed out (service_start_timeout=120000ms)
Jul 15 01:14:01 [dbus-daemon] [session uid=1000 pid=968 pidfd=5] Failed to activate service 'org.freedesktop.impl.portal.desktop.gnome': timed out (service_start_timeout=120000ms)
Jul 15 01:14:28 [kernel] vmap allocation for size 24576 failed: use vmalloc=<size> to increase size
Jul 15 01:14:28 [kernel] [drm:vmw_bo_map_and_cache [vmwgfx]] *ERROR* Buffer object map failed: -12.
Jul 15 01:14:28 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 8 times -
Jul 15 01:14:33 [kernel] alloc_vmap_area: 11 callbacks suppressed
Jul 15 01:14:33 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 15 01:14:40 [kernel] alloc_vmap_area: 162 callbacks suppressed
Jul 15 01:14:40 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 15 01:14:45 [kernel] alloc_vmap_area: 192 callbacks suppressed
Jul 15 01:14:45 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 15 01:14:50 [kernel] alloc_vmap_area: 267 callbacks suppressed
Jul 15 01:14:50 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size



From this moment it is just a question of time: the machine will freeze, or it will simply show "not enough memory" for every command you run. Example: a terminal will hang and the log still shows this message:

Code:
 Jul 15 01:23:50 [kernel] alloc_vmap_area: 269 callbacks suppressed
Jul 15 01:23:50 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 15 01:23:55 [kernel] alloc_vmap_area: 204 callbacks suppressed
Jul 15 01:23:55 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 15 01:24:00 [kernel] alloc_vmap_area: 104 callbacks suppressed
Jul 15 01:24:00 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 15 01:24:07 [kernel] alloc_vmap_area: 50 callbacks suppressed
Jul 15 01:24:07 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 15 01:24:13 [kernel] alloc_vmap_area: 196 callbacks suppressed
Jul 15 01:24:13 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size



/proc/meminfo shows this BEFORE the problem happens:

Code:
cat /proc/meminfo
MemTotal:        4047364 kB
MemFree:         2712832 kB
MemAvailable:    3228752 kB
Buffers:          112584 kB
Cached:           582984 kB
SwapCached:            0 kB
Active:           268128 kB
Inactive:         939284 kB
Active(anon):       2188 kB
Inactive(anon):   570056 kB
Active(file):     265940 kB
Inactive(file):   369228 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       3287048 kB
HighFree:        2126392 kB
LowTotal:         760316 kB
LowFree:          586440 kB
SwapTotal:       8505340 kB
SwapFree:        8505340 kB
Dirty:               484 kB
Writeback:             0 kB
AnonPages:        511816 kB
Mapped:           375128 kB
Shmem:             60300 kB
KReclaimable:      19428 kB
Slab:              39656 kB
SReclaimable:      19428 kB
SUnreclaim:        20228 kB
KernelStack:        4176 kB
PageTables:        13384 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    10529020 kB
Committed_AS:    2599960 kB
VmallocTotal:     122880 kB
VmallocUsed:       10504 kB
VmallocChunk:          0 kB
Percpu:             1280 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:       15776 kB
DirectMap2M:      890880 kB


And this AFTER the machine is compromised:

Code:
 MemTotal:        4047364 kB
MemFree:         2531348 kB
MemAvailable:    3144480 kB
Buffers:          125396 kB
Cached:           648452 kB
SwapCached:            0 kB
Active:           391180 kB
Inactive:         974468 kB
Active(anon):       3636 kB
Inactive(anon):   652300 kB
Active(file):     387544 kB
Inactive(file):   322168 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       3287048 kB
HighFree:        1979884 kB
LowTotal:         760316 kB
LowFree:          551464 kB
SwapTotal:       8505340 kB
SwapFree:        8505340 kB
Dirty:               792 kB
Writeback:             0 kB
AnonPages:        591836 kB
Mapped:           392428 kB
Shmem:             64128 kB
KReclaimable:      42096 kB
Slab:              63048 kB
SReclaimable:      42096 kB
SUnreclaim:        20952 kB
KernelStack:        3856 kB
PageTables:        14688 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    10529020 kB
Committed_AS:    2852284 kB
VmallocTotal:     122880 kB
VmallocUsed:       10520 kB
VmallocChunk:          0 kB
Percpu:             1280 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:       15776 kB
DirectMap2M:      890880 kB


This is the free command BEFORE it happens:

Code:
 free
               total        used        free      shared  buff/cache   available
Mem:         4047364      482144     3170256       35664      567936     3565220
Swap:        8505340           0     8505340


and AFTER it happens:
Code:
               total        used        free      shared  buff/cache   available
Mem:         4047364      901324     2532780       64076      816012     3146040
Swap:        8505340           0     8505340


I repeat: this also happens with the standard kernel config file of the kernel-bin package. It ALWAYS happens for any kernel >6.6.21.

6.6.21 and earlier have no problem.

I can gain some more juice if I tweak the GRUB command line according to the log message:

Code:
vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -


Code:
vmalloc=768

OK, and now as I post this message I can't even reboot; if I try, it complains that there are no more resources.

This is a bug. I don't know how or where to report it.

Maybe now someone can test it themselves and see that this is a real problem. It happens in a VMware guest, so I'm sure it will happen on a real machine too.

Not a big deal for the vast majority, since 32-bit is dead, but considering that Gentoo is my distro and it supports 32-bit, I opened this post.

For any info, I'm here.

Current kernel generating this problem:

Code:
Linux gentoox86vm 6.6.38-gentoo #1 SMP PREEMPT_DYNAMIC Sun Jul 14 18:39:08 CEST 2024 i686 Intel(R) Core(TM) i7-14700K GenuineIntel GNU/Linux


Last edited by piggy on Thu Jul 18, 2024 11:44 pm; edited 4 times in total

grknight
Retired Dev

Joined: 20 Feb 2015
Posts: 1919

Posted: Sun Jul 14, 2024 11:47 pm

What does grep -i vmalloc /proc/meminfo report when there is an issue?

Depending on what exactly you placed into GRUB, it may be too low or ineffective.
The default for vmalloc on x86-32 is 128M (vmalloc=128M literally). Perhaps you need to set the value to vmalloc=256M or vmalloc=512M (the M is important) instead.
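
The point about the suffix can be shown with a small sketch. It assumes the kernel reads size parameters such as vmalloc= the way its size-parsing helper does: a bare number is taken as bytes, and K, M or G act as binary multipliers. The function name `parse_size` is made up for this illustration and is not kernel code.

```python
def parse_size(text):
    """Return the size in bytes for strings like '768', '256M' or '1G'.

    Assumption: a bare number means bytes; K/M/G are binary multipliers.
    """
    multipliers = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    text = text.strip()
    suffix = text[-1:].upper()
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)  # no suffix: plain bytes


if __name__ == "__main__":
    for value in ("768", "768M", "256M", "512M"):
        print(f"vmalloc={value:>4} -> {parse_size(value)} bytes")
```

Under that assumption, vmalloc=768 and vmalloc=768M are very different requests, which is why the exact string placed on the GRUB command line matters.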

Watch the meminfo proc interface to know what is the max and what is in use.

If it continually increases after these changes and does not stabilize, there may be a memory leak: a vmalloc() call without a matching vfree().
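
Watching the Vmalloc lines can be automated. This is a minimal sketch, not a finished tool: it parses two /proc/meminfo snapshots and reports how the Vmalloc fields changed between them. The function names `parse_meminfo` and `vmalloc_delta` and the five-second interval are assumptions made for illustration.

```python
import time


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def vmalloc_delta(before, after):
    """Return after-minus-before for every field whose name starts with 'Vmalloc'."""
    old, new = parse_meminfo(before), parse_meminfo(after)
    return {k: new[k] - old[k] for k in new if k.startswith("Vmalloc") and k in old}


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        first = f.read()
    time.sleep(5)  # arbitrary sampling interval
    with open("/proc/meminfo") as f:
        second = f.read()
    print(vmalloc_delta(first, second))
```

Fed the BEFORE and AFTER snapshots posted in this topic, it reports VmallocUsed growing by only 16 kB (10504 kB to 10520 kB), so a steadily climbing value is not visible in those two samples.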

Hu
Administrator

Joined: 06 Mar 2007
Posts: 22649

Posted: Mon Jul 15, 2024 1:00 am

I read through the commits that are new in v6.6.22 and see nothing that is obviously related to the report that anything newer than v6.6.21 is broken. OP: can you bisect to determine which commit post-v6.6.21 introduces the problem?

I run a v6.6.x on a 32-bit system and have not observed this problem.

piggy
n00b

Joined: 28 Sep 2015
Posts: 26

Posted: Mon Jul 15, 2024 1:41 am

Hu wrote:
I read through the commits that are new in v6.6.22 and see nothing that is obviously related to the report that anything newer than v6.6.21 is broken. OP: can you bisect to determine which commit post-v6.6.21 introduces the problem?

I run a v6.6.x on a 32-bit system and have not observed this problem.

Good question: the first kernel I built that had the problem was 6.6.30, and it has the problem both with my config and with the kernel-bin config. Every kernel after that HAS the problem, including the 6.9.x branch I tested.

Regarding the vmalloc=600M string in the GRUB config (this is the greatest value supported on my system; sometimes it also tolerates 768M, sometimes it doesn’t and it won’t boot): like I said in my original post, it doesn’t change the outcome. You can open more apps, more tabs and more terminals compared to the default 128M value, but sooner or later the log will print those messages, the apps will start to crash, and finally you can’t even type reboot or shutdown and the only alternative is a reset. Sometimes, more rarely, it freezes.

I repeat: with 6.6.21 and before, everything was perfect; I used it for years with great stability and satisfaction.

We can say for sure that the problem, in my virtual environment, arose somewhere between 6.6.22 and 6.6.30: the latter is buggy for me. I should find the time to build 6.6.22, 6.6.23 and so on, one after the other.

Anyway, guys, this problem is simple to reproduce. Just boot anything from 6.6.30 to 6.9 in a VMware guest virtual machine and you can’t avoid the problem.

Hu
Administrator

Joined: 06 Mar 2007
Posts: 22649

Posted: Mon Jul 15, 2024 2:49 am

Your topic title says "Every 32 bit kernel over 6.6.21 is broken", so I thought you had already determined that every kernel past 6.6.21, starting with 6.6.22, was broken.

piggy
n00b

Joined: 28 Sep 2015
Posts: 26

Posted: Mon Jul 15, 2024 10:55 am

Hu wrote:
Your topic title says "Every 32 bit kernel over 6.6.21 is broken", so I thought you had already determined that every kernel past 6.6.21, starting with 6.6.22, was broken.

You are correct. But since I follow the stable kernel, if memory serves me well, the next one I had after 6.6.21 was 6.6.30, and now 6.6.38. Both of these have the problem. Now I’m curious, and if I can find a few moments I will build 6.6.22/23/24/25/26/27/28/29.

6.9.x certainly has the bug, and 6.6.7/8 too; I tried them.

In the meantime, I also have a virtual guest with Void Linux, and the problem is there too! 6.6.21 is OK; in that case 6.6.37 has the bug.

This should also be escalated to the bug tracker at kernel.org, since it is so easy to reproduce, but I have no idea how to do that. Or maybe a maintainer, like in the case of Gentoo (Mr. Pagano?), could apply something to mitigate or resolve this problem?

It sounds strange to me after all these years that a kernel could have such a big, impactful bug, considering these are not the 2.x.x times (yes, I’m such a long-time Linux/Gentoo user).

pjp
Administrator

Joined: 16 Apr 2002
Posts: 20485

Posted: Mon Jul 15, 2024 7:34 pm

piggy wrote:
It sounds strange to me after all these years that a kernel could have such a big, impactful bug
Almost as if it seems more likely that it doesn't, right?

I've never had to do it, but regarding bisecting the kernel to find the problem...

https://wiki.gentoo.org/wiki/Kernel_git-bisect
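
A rough sketch of what the bisect session involves, assuming a clone of the stable kernel tree in which the tags v6.6.21 (reported good) and v6.6.30 (first version reported bad in this topic) exist. The helper names are made up for illustration, and the commit count used below is a placeholder, not a measured number; the real one can be obtained with `git rev-list --count v6.6.21..v6.6.30`.

```python
import math


def bisect_commands(good, bad):
    """Return the git commands that open a bisect session between two tags."""
    return ["git bisect start", f"git bisect bad {bad}", f"git bisect good {good}"]


def estimated_steps(commit_count):
    """Rough number of build-and-test rounds: about log2 of the candidate commits."""
    return math.ceil(math.log2(commit_count)) if commit_count > 1 else 0


if __name__ == "__main__":
    for command in bisect_commands("v6.6.21", "v6.6.30"):
        print(command)
    # Placeholder commit count, only to show the order of magnitude.
    print("rounds for 1500 commits:", estimated_steps(1500))
```

After each round, the kernel that git checks out is built and booted in the affected guest, then marked with `git bisect good` or `git bisect bad` until git names the first bad commit; `git bisect reset` ends the session.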
_________________
Quis separabit? Quo animo?

piggy
n00b

Joined: 28 Sep 2015
Posts: 26

Posted: Mon Jul 15, 2024 10:17 pm

pjp wrote:
piggy wrote:
It sounds strange to me after all these years that a kernel could have such a big, impactful bug
Almost as if it seems more likely that it doesn't, right?

I've never had to do it, but regarding bisecting the kernel to find the problem...

Yes, but it IS real, at least under VMware as a guest. I can make it happen every time. I don't have old hardware to try it on a real machine.

Example, right now:

Code:
Jul 16 00:00:47 [kernel] alloc_vmap_area: 259 callbacks suppressed
Jul 16 00:00:47 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:00:52 [kernel] alloc_vmap_area: 204 callbacks suppressed
Jul 16 00:00:52 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:01:10 [kernel] alloc_vmap_area: 133 callbacks suppressed
Jul 16 00:01:10 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:01:15 [kernel] alloc_vmap_area: 240 callbacks suppressed
Jul 16 00:01:15 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:01:17 [kernel] warn_alloc: 3 callbacks suppressed
Jul 16 00:01:17 [kernel] Web Content: vmalloc error: size 8192, vm_struct allocation failed, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
Jul 16 00:01:17 [kernel] CPU: 1 PID: 2761 Comm: Web Content Not tainted 6.6.38-gentoo #1
Jul 16 00:01:17 [kernel] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B32.2305221830 05/22/2023
Jul 16 00:01:17 [kernel] Call Trace:
Jul 16 00:01:17 [kernel]  dump_stack_lvl+0x32/0x41
Jul 16 00:01:17 [kernel]  dump_stack+0xd/0x10
Jul 16 00:01:17 [kernel]  warn_alloc+0xab/0x111
Jul 16 00:01:17 [kernel]  __vmalloc_node_range+0x73/0x345
Jul 16 00:01:17 [kernel]  __vmalloc_node+0x55/0x5d
Jul 16 00:01:17 [kernel]  ? bpf_prog_alloc_no_stats+0x1f/0xcd
Jul 16 00:01:17 [kernel]  __vmalloc+0x14/0x16
Jul 16 00:01:17 [kernel]  ? bpf_prog_alloc_no_stats+0x1f/0xcd
Jul 16 00:01:17 [kernel]  bpf_prog_alloc_no_stats+0x1f/0xcd
Jul 16 00:01:17 [kernel]  bpf_prog_alloc+0x13/0x9f
Jul 16 00:01:17 [kernel]  bpf_prog_create_from_user+0x47/0xbd
Jul 16 00:01:17 [kernel]  ? kprobe_free_init_mem+0x4c/0x4c
Jul 16 00:01:17 [kernel]  do_seccomp+0x176/0x7ac
Jul 16 00:01:17 [kernel]  ? __ia32_sys_prctl+0x47/0x5bf
Jul 16 00:01:17 [kernel]  __ia32_sys_seccomp+0x10/0x12
Jul 16 00:01:17 [kernel]  ia32_sys_call+0xd09/0x1063
Jul 16 00:01:17 [kernel]  __do_fast_syscall_32+0x7a/0x99
Jul 16 00:01:17 [kernel]  do_fast_syscall_32+0x29/0x5b
Jul 16 00:01:17 [kernel]  do_SYSENTER_32+0x15/0x17
Jul 16 00:01:17 [kernel]  entry_SYSENTER_32+0x98/0xf8
Jul 16 00:01:17 [kernel] EIP: 0xb7fc856d
Jul 16 00:01:17 [kernel] Code: c4 01 10 03 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
Jul 16 00:01:17 [kernel] EAX: ffffffda EBX: 00000001 ECX: 00000001 EDX: bf89c854
Jul 16 00:01:17 [kernel] ESI: 00000000 EDI: b7fabce0 EBP: 40000004 ESP: bf89c68c
Jul 16 00:01:17 [kernel] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000202
Jul 16 00:01:17 [kernel] Mem-Info:
Jul 16 00:01:17 [kernel] active_anon:1302 inactive_anon:188525 isolated_anon:0
Jul 16 00:01:17 [kernel]  active_file:102275 inactive_file:85388 isolated_file:0
Jul 16 00:01:17 [kernel]  unevictable:0 dirty:1859 writeback:0
Jul 16 00:01:17 [kernel]  slab_reclaimable:10769 slab_unreclaimable:5518
Jul 16 00:01:17 [kernel]  mapped:98665 shmem:16080 pagetables:4562
Jul 16 00:01:17 [kernel]  sec_pagetables:0 bounce:0
Jul 16 00:01:17 [kernel]  kernel_misc_reclaimable:0
Jul 16 00:01:17 [kernel]  free:594841 free_pcp:445 free_cma:0
Jul 16 00:01:17 [kernel] Node 0 active_anon:5208kB inactive_anon:754100kB active_file:409100kB inactive_file:341552kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:394660kB dirty:7436kB writeback:0kB shmem:64320kB writeback_tmp:0kB kernel_stack:4232kB pagetables:18248kB sec_pagetables:0kB all_unreclaimable? no
Jul 16 00:01:17 [kernel] DMA free:5848kB boost:0kB min:24kB low:28kB high:32kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:5864kB mlocked:0kB bounce:0kB free_pcp:16kB local_pcp:8kB free_cma:0kB
Jul 16 00:01:17 [kernel] lowmem_reserve[]: 0 710 3920 3920
Jul 16 00:01:17 [kernel] Normal free:536672kB boost:0kB min:3396kB low:4244kB high:5092kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:131576kB inactive_file:1812kB unevictable:0kB writepending:64kB present:890272kB managed:754452kB mlocked:0kB bounce:0kB free_pcp:1764kB local_pcp:1376kB free_cma:0kB
Jul 16 00:01:17 [kernel] lowmem_reserve[]: 0 0 25680 25680
Jul 16 00:01:17 [kernel] DMA: 4*4kB (M) 3*8kB (M) 1*16kB (M) 3*32kB (M) 1*64kB (M) 2*128kB (M) 1*256kB (M) 0*512kB 1*1024kB (M) 0*2048kB 1*4096kB (M) = 5848kB
Jul 16 00:01:17 [kernel] Normal: 16*4kB (UME) 2*8kB (E) 15*16kB (UE) 21*32kB (UME) 8*64kB (UE) 1*128kB (E) 2*256kB (ME) 2*512kB (UM) 1*1024kB (E) 2*2048kB (UM) 129*4096kB (M) = 536672kB
Jul 16 00:01:17 [kernel] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jul 16 00:01:17 [kernel] 203743 total pagecache pages
Jul 16 00:01:17 [kernel] 0 pages in swap cache
Jul 16 00:01:17 [kernel] Free swap  = 8505340kB
Jul 16 00:01:17 [kernel] Total swap = 8505340kB
Jul 16 00:01:17 [kernel] 1048328 pages RAM
Jul 16 00:01:17 [kernel] 821762 pages HighMem/MovableOnly
Jul 16 00:01:17 [kernel] 36487 pages reserved
Jul 16 00:01:21 [kernel] alloc_vmap_area: 84 callbacks suppressed
Jul 16 00:01:21 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:01:30 [kernel] alloc_vmap_area: 132 callbacks suppressed
Jul 16 00:01:30 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:01:35 [kernel] alloc_vmap_area: 272 callbacks suppressed
Jul 16 00:01:35 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:01:40 [kernel] alloc_vmap_area: 294 callbacks suppressed
Jul 16 00:01:40 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:01:50 [kernel] alloc_vmap_area: 83 callbacks suppressed
Jul 16 00:01:50 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:01:56 [kernel] alloc_vmap_area: 104 callbacks suppressed
Jul 16 00:01:56 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:02:01 [kernel] alloc_vmap_area: 108 callbacks suppressed
Jul 16 00:02:01 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:02:06 [kernel] alloc_vmap_area: 123 callbacks suppressed
Jul 16 00:02:06 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:02:12 [kernel] alloc_vmap_area: 197 callbacks suppressed
Jul 16 00:02:12 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:02:17 [kernel] alloc_vmap_area: 185 callbacks suppressed
Jul 16 00:02:17 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:02:23 [kernel] alloc_vmap_area: 173 callbacks suppressed
Jul 16 00:02:23 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:02:28 [kernel] alloc_vmap_area: 193 callbacks suppressed
Jul 16 00:02:28 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:02:33 [kernel] alloc_vmap_area: 229 callbacks suppressed
Jul 16 00:02:33 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 14 times -
Jul 16 00:02:56 [kernel] alloc_vmap_area: 35 callbacks suppressed
Jul 16 00:02:56 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:03:01 [kernel] alloc_vmap_area: 139 callbacks suppressed
Jul 16 00:03:01 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:03:06 [kernel] alloc_vmap_area: 119 callbacks suppressed
Jul 16 00:03:06 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:03:12 [kernel] alloc_vmap_area: 70 callbacks suppressed
Jul 16 00:03:12 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:03:18 [kernel] alloc_vmap_area: 221 callbacks suppressed
Jul 16 00:03:18 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:03:25 [kernel] alloc_vmap_area: 137 callbacks suppressed
Jul 16 00:03:25 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:03:38 [kernel] alloc_vmap_area: 64 callbacks suppressed
Jul 16 00:03:38 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:03:43 [kernel] alloc_vmap_area: 268 callbacks suppressed
Jul 16 00:03:43 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:03:48 [kernel] alloc_vmap_area: 214 callbacks suppressed
Jul 16 00:03:48 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:03:56 [kernel] alloc_vmap_area: 277 callbacks suppressed
Jul 16 00:03:56 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:04:02 [kernel] alloc_vmap_area: 236 callbacks suppressed
Jul 16 00:04:02 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:04:07 [kernel] alloc_vmap_area: 233 callbacks suppressed
Jul 16 00:04:07 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:04:15 [kernel] alloc_vmap_area: 94 callbacks suppressed
Jul 16 00:04:15 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:04:21 [kernel] alloc_vmap_area: 127 callbacks suppressed
Jul 16 00:04:21 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:04:26 [kernel] alloc_vmap_area: 235 callbacks suppressed
Jul 16 00:04:26 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 16 00:04:31 [kernel] alloc_vmap_area: 273 callbacks suppressed
Jul 16 00:04:31 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size


For someone who can read code better than me, I think there are many hints in this last crash.
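
One way to pull hints out of a log like this is to count the failures by requested size. The sketch below assumes the message formats seen in the excerpts in this topic (the "vmap allocation for size N failed" lines, the logger's "Last output repeated N times" folding, and the kernel's "callbacks suppressed" rate-limit notices). The function name `summarize` is made up for illustration.

```python
import re
from collections import Counter

FAIL = re.compile(r"vmap allocation for size (\d+) failed")
REPEAT = re.compile(r"Last output repeated (\d+) times")
SUPPRESSED = re.compile(r"alloc_vmap_area: (\d+) callbacks suppressed")


def summarize(log):
    """Return (failures per requested size, suppressed message count)."""
    per_size = Counter()
    suppressed = 0
    last_size = None  # size on the previous line, if that line was a vmap failure
    for line in log.splitlines():
        fail = FAIL.search(line)
        repeat = REPEAT.search(line)
        if fail:
            last_size = int(fail.group(1))
            per_size[last_size] += 1
        elif repeat:
            # The logger folded identical lines; credit them to the preceding failure.
            if last_size is not None:
                per_size[last_size] += int(repeat.group(1))
        else:
            last_size = None
            hit = SUPPRESSED.search(line)
            if hit:
                suppressed += int(hit.group(1))
    return per_size, suppressed


if __name__ == "__main__":
    import sys
    sizes, hidden = summarize(sys.stdin.read())
    for size, count in sizes.most_common():
        print(f"{count:>8} failures of {size} bytes")
    print(f"{hidden:>8} further messages suppressed by rate limiting")
```

In the excerpts posted here, almost every failure is for 20480 bytes, and the first one in the original post is followed by a vmwgfx "Buffer object map failed: -12" error (-12 is ENOMEM), so the graphics driver's mappings look like a reasonable place to start, though the log alone does not prove that.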

Then I will try to report all this to kernel.org. Considering that, as I said in a previous post, exactly the same happens in a virtual machine with 32-bit Void Linux, this is a general problem. I'm not sure whether virtualization could be involved.

Not that it is important, but if we say "we support 32-bit", it should be supported seriously. BTW, I can't even try on real hardware: everything I have is brand new and the old stuff has been recycled.

Code:
Linux gentoox86vm 6.6.38-gentoo #1 SMP PREEMPT_DYNAMIC Sun Jul 14 18:39:08 CEST 2024 i686 Intel(R) Core(TM) i7-14700K GenuineIntel GNU/Linux


PS: I forgot: this is free, just a moment before the crash:

Code:
Mem:         4047364     1047944     2273640       81608      946064     2999420
Swap:        8505340           0     8505340


and /proc/meminfo:

Code:
cat /proc/meminfo
MemTotal:        4047364 kB
MemFree:         2278232 kB
MemAvailable:    3004828 kB
Buffers:          135188 kB
Cached:           767012 kB
SwapCached:            0 kB
Active:           423332 kB
Inactive:        1189996 kB
Active(anon):       3860 kB
Inactive(anon):   788568 kB
Active(file):     419472 kB
Inactive(file):   401428 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       3287048 kB
HighFree:        1740112 kB
LowTotal:         760316 kB
LowFree:          538120 kB
SwapTotal:       8505340 kB
SwapFree:        8505340 kB
Dirty:               296 kB
Writeback:             0 kB
AnonPages:        711136 kB
Mapped:           400548 kB
Shmem:             81300 kB
KReclaimable:      44372 kB
Slab:              66728 kB
SReclaimable:      44372 kB
SUnreclaim:        22356 kB
KernelStack:        4152 kB
PageTables:        15988 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    10529020 kB
Committed_AS:    3223724 kB
VmallocTotal:     122880 kB
VmallocUsed:       10528 kB
VmallocChunk:          0 kB
Percpu:             1472 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:       15776 kB
DirectMap2M:      890880 kB


PS2: I deliberately didn't raise vmalloc because it is not the solution: it only gives a little more juice before the death; a kernel without the bug doesn't need that.

Hu
Administrator

Joined: 06 Mar 2007
Posts: 22649

Posted: Mon Jul 15, 2024 10:26 pm

I think pjp's point was that characterizing it as a "big" bug would require some evidence that it affects a widely used configuration. I have a >6.6.21 32-bit kernel running on real hardware and am not seeing this. So either it is workload dependent, and I am not running the bad workload, or it is hardware dependent, and my real hardware is not one of the "bad" configurations.

piggy
n00b

Joined: 28 Sep 2015
Posts: 26

Posted: Mon Jul 15, 2024 11:31 pm

Hu wrote:
I think pjp's point was that characterizing it as a "big" bug would require some evidence that it affects a widely used configuration. I have a >6.6.21 32-bit kernel running on real hardware and am not seeing this. So either it is workload dependent, and I am not running the bad workload, or it is hardware dependent, and my real hardware is not one of the "bad" configurations.

Which version is yours exactly, so I can try it? Do you have highmem enabled? How much RAM do you have? I also need to try under a VirtualBox guest. In the meantime I am downloading and installing the long-term support kernel 6.6.40 from kernel.org. We'll see. And it makes no sense that virtual hardware and real hardware could be different.

I just rebooted and increased vmalloc, and as usual it works for longer, but then I see some bad interactions in the log as resources start to run out. One of them is this, which happened without anything apparent happening on the desktop:
Code:
Jul 16 01:16:39 [kernel] CPU: 0 PID: 16152 Comm: Web Content Not tainted 6.6.38-gentoo #1
Jul 16 01:16:39 [kernel] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B32.2305221830 05/22/2023
Jul 16 01:16:39 [kernel] Call Trace:
Jul 16 01:16:39 [kernel]  dump_stack_lvl+0x32/0x41
Jul 16 01:16:39 [kernel]  dump_stack+0xd/0x10
Jul 16 01:16:39 [kernel]  warn_alloc+0xab/0x111
Jul 16 01:16:39 [kernel]  __vmalloc_node_range+0x73/0x345
Jul 16 01:16:39 [kernel]  __vmalloc_node+0x55/0x5d
Jul 16 01:16:39 [kernel]  ? bpf_prog_alloc_no_stats+0x1f/0xcd
Jul 16 01:16:39 [kernel]  __vmalloc+0x14/0x16
Jul 16 01:16:39 [kernel]  ? bpf_prog_alloc_no_stats+0x1f/0xcd
Jul 16 01:16:39 [kernel]  bpf_prog_alloc_no_stats+0x1f/0xcd
Jul 16 01:16:39 [kernel]  bpf_prog_alloc+0x13/0x9f
Jul 16 01:16:39 [kernel]  bpf_prog_create_from_user+0x47/0xbd
Jul 16 01:16:39 [kernel]  ? kprobe_free_init_mem+0x4c/0x4c
Jul 16 01:16:39 [kernel]  do_seccomp+0x176/0x7ac
Jul 16 01:16:39 [kernel]  ? __ia32_sys_prctl+0x47/0x5bf
Jul 16 01:16:39 [kernel]  __ia32_sys_seccomp+0x10/0x12
Jul 16 01:16:39 [kernel]  ia32_sys_call+0xd09/0x1063
Jul 16 01:16:39 [kernel]  __do_fast_syscall_32+0x7a/0x99
Jul 16 01:16:39 [kernel]  do_fast_syscall_32+0x29/0x5b
Jul 16 01:16:39 [kernel]  do_SYSENTER_32+0x15/0x17
Jul 16 01:16:39 [kernel]  entry_SYSENTER_32+0x98/0xf8
Jul 16 01:16:39 [kernel] EIP: 0xb7f3e56d
Jul 16 01:16:39 [kernel] Code: c4 01 10 03 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
Jul 16 01:16:39 [kernel] EAX: ffffffda EBX: 00000001 ECX: 00000001 EDX: bfa0f5e4
Jul 16 01:16:39 [kernel] ESI: 00000000 EDI: b7f21ce0 EBP: 40000004 ESP: bfa0f41c
Jul 16 01:16:39 [kernel] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000202
Jul 16 01:16:39 [kernel] Mem-Info:
Jul 16 01:16:39 [kernel] active_anon:1931 inactive_anon:359616 isolated_anon:0
Jul 16 01:16:39 [kernel]  active_file:90346 inactive_file:250888 isolated_file:0
Jul 16 01:16:39 [kernel]  unevictable:0 dirty:20 writeback:0
Jul 16 01:16:39 [kernel]  slab_reclaimable:18945 slab_unreclaimable:7759
Jul 16 01:16:39 [kernel]  mapped:121050 shmem:31558 pagetables:6430
Jul 16 01:16:39 [kernel]  sec_pagetables:0 bounce:0
Jul 16 01:16:39 [kernel]  kernel_misc_reclaimable:0
Jul 16 01:16:39 [kernel]  free:257767 free_pcp:208 free_cma:0
Jul 16 01:16:39 [kernel] Node 0 active_anon:7724kB inactive_anon:1438464kB active_file:361384kB inactive_file:1003552kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:484200kB dirty:80kB writeback:0kB shmem:126232kB writeback_tmp:0kB kernel_stack:5688kB pagetables:25720kB sec_pagetables:0kB all_unreclaimable? no
Jul 16 01:16:39 [kernel] DMA free:1664kB boost:0kB min:44kB low:52kB high:60kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:1616kB inactive_file:280kB unevictable:0kB writepending:0kB present:15992kB managed:5864kB mlocked:0kB bounce:0kB free_pcp:8kB local_pcp:4kB free_cma:0kB
Jul 16 01:16:39 [kernel] lowmem_reserve[]: 0 191 3921 3921
Jul 16 01:16:39 [kernel] Normal free:10420kB boost:0kB min:1744kB low:2180kB high:2616kB reserved_highatomic:2048KB active_anon:0kB inactive_anon:0kB active_file:45204kB inactive_file:41192kB unevictable:0kB writepending:36kB present:357792kB managed:222356kB mlocked:0kB bounce:0kB free_pcp:824kB local_pcp:260kB free_cma:0kB


vmalloc now:
Code:
cat /proc/meminfo | grep alloc
VmallocTotal:     655360 kB
VmallocUsed:       10636 kB
VmallocChunk:          0 kB
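For anyone who wants to watch these counters over time instead of grepping by hand, here is a minimal Python sketch. It parses meminfo-style text and reports how much of the vmalloc area is reported as used. The names `parse_vmalloc` and `vmalloc_used_percent` are made up for illustration, and the sample text is simply copied from the output above. As far as I know, recent kernels always print 0 for VmallocChunk, so that field alone says nothing about fragmentation.

```python
import re

# Sample copied from the post above; on a live system you would read /proc/meminfo.
SAMPLE = """\
VmallocTotal:     655360 kB
VmallocUsed:       10636 kB
VmallocChunk:          0 kB
"""


def parse_vmalloc(text):
    """Return {field: kB} for every Vmalloc* line in meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"(Vmalloc\w+):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def vmalloc_used_percent(fields):
    """Percentage of the vmalloc area that the kernel reports as used."""
    return 100.0 * fields["VmallocUsed"] / fields["VmallocTotal"]


if __name__ == "__main__":
    info = parse_vmalloc(SAMPLE)
    print(info)
    print(f"{vmalloc_used_percent(info):.1f}% of the vmalloc area reported as used")
```

On the affected guest you could pass `open("/proc/meminfo").read()` instead of `SAMPLE`. A low percentage combined with failing vmap allocations would suggest looking at fragmentation or at what the counter does not include, but that is an interpretation, not something this script can prove.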


Hey, I want to say again that if I run 6.6.21 or older, EVERYTHING is perfect: unlimited uptime, very solid, no errors of any type. So something must have changed.
piggy
n00b
n00b


Joined: 28 Sep 2015
Posts: 26

PostPosted: Wed Jul 17, 2024 12:25 am    Post subject: Reply with quote

piggy wrote:
Hu wrote:
I think pjp's point was that characterizing it as a "big" bug would require some evidence that it affects a widely used configuration. I have a >6.6.21 32-bit kernel running on real hardware and am not seeing this. So either it is workload dependent, and I am not running the bad workload, or it is hardware dependent, and my real hardware is not one of the "bad" configurations.

Well, well, well, I know you guys say; "who cares". We haven't used a real 32-bit x86 system in ages (except Hu!), so who cares about this bug. But by now it no longer shows up in just a few configurations.

I didn't try Hu's kernel (he kept secret which one he runs on his real hardware), but I tested a Gentoo virtual guest under VirtualBox on another machine and exactly the same thing happens.

Code:
Linux gentoox86vb 6.6.38-gentoo #2 SMP PREEMPT_DYNAMIC Tue Jul 18:18:25 CEST 2024 i686 Intel(R) Core(TM) i7-6700HQ CPU @ 2.60Hz GenuineIntel GNU/Linux


The other specs of this VirtualBox guest are pretty much the same as those of the guest under VMware. Tested with my config and with the kernel-bin one, just to be sure. I have a .config tailored by me for every type of virtual guest I run (and for real machines too, BTW, although on real hardware I prefer Windows 11 Pro, considering I only run very recent hardware).

Kernel-6.6.21-gentoo is the last perfectly working kernel. 6.6.38 is the current stable in Gentoo, so I tested that one, and it behaves exactly as it does in the VMware guest on the other machine.

In both environments I also tested the long-term kernel.org 6.6.40, just to be sure it wasn't a maintainer-induced problem. And it isn't: the kernel.org kernel behaves exactly the same as the others, and now I can even count the threads I need to open before they start to die.

Right now I don't have a real machine, but I would dare say it happens on real hardware too.

For now I can only continue testing old kernels. I am missing 6.6.22 to 6.6.30, and this is where things go bad, so I'm curious to know. With my slightly overclocked i7-14700K machine it takes 10 minutes to build the 32-bit kernel and 3 minutes to set up and install it, even with just 2 dedicated CPU cores.

Just... I don't know where to get 6.6.22 to 6.6.29 because they are no longer on the Gentoo packages web page. Can someone help this volunteer with something of no interest to the majority of users?

I could also use the ones on kernel.org, but... last time I had no time, so I just quickly downloaded 6.6.40 from the website's home page. Browsing the files in the repository, I wasn't able to find the sources, just the changelogs.

It has probably been ten years since I last browsed kernel.org.

If someone can help me find those old sources, I can install them and see which one is the first to fail.
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 20485

PostPosted: Wed Jul 17, 2024 3:37 am    Post subject: Reply with quote

piggy wrote:
piggy wrote:
Hu wrote:
I think pjp's point was that characterizing it as a "big" bug would require some evidence that it affects a widely used configuration. I have a >6.6.21 32-bit kernel running on real hardware and am not seeing this. So either it is workload dependent, and I am not running the bad workload, or it is hardware dependent, and my real hardware is not one of the "bad" configurations.

Well, well, well, I know you guys say; "who cares".
I don't think anyone has said that. I've only suggested that it hasn't yet been demonstrated to be a bug and could be a configuration issue. Bisecting can help identify if it is a bug. Anyway,..

piggy wrote:
I don't know where to get 6.6.22 to 6.6.29 because they are no longer on the Gentoo packages web page. Can someone help this volunteer with something of no interest to the majority of users?
There may be a better way, but until someone else offers it, this seems to work:

From packages.gentoo.org, you can search for gentoo-sources. That page has a link to the Git Log (short):
https://gitweb.gentoo.org/repo/gentoo.git/log/sys-kernel/gentoo-sources

Note that both "Git Log" and "short" are links. The above link is to the "short" form, which I find easier to use.

When you find a kernel version you want, it will lead you to a page like this, which is specifically for the commit that "drop"ed the ebuild from the tree:
https://gitweb.gentoo.org/repo/gentoo.git/commit/sys-kernel/gentoo-sources?id=321e98b7b0763ecec82b4ec6afd43c54fe70dcda

That page includes a link next to "--- a" (that is, the path and ebuild are a link):
Code:
--- a/sys-kernel/gentoo-sources/gentoo-sources-6.6.22.ebuild
https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-kernel/gentoo-sources/gentoo-sources-6.6.22.ebuild?id=92fbf69083c99b639e02bd4f0bcac0c9dd8593fd

From there, you can download the file from the "plain" link:
https://gitweb.gentoo.org/repo/gentoo.git/plain/sys-kernel/gentoo-sources/gentoo-sources-6.6.22.ebuild?id=92fbf69083c99b639e02bd4f0bcac0c9dd8593fd

If you don't already know, you'll need to put that in a local repository and create a manifest for it.
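The last step of the walk-through above can be sketched in a few lines of Python. This is only an illustration: the URL layout is inferred from the single example link in this post, the helper name `plain_ebuild_url` is made up, and the commit id still has to be looked up by hand in the Git log as described.

```python
# Base of the Gentoo gitweb instance, taken from the links in this post.
BASE = "https://gitweb.gentoo.org/repo/gentoo.git"


def plain_ebuild_url(category, package, version, commit_id):
    """Build the "plain" download URL for an ebuild as it existed at commit_id.

    The pattern is an assumption generalized from one example; it is not a
    documented API of the gitweb site.
    """
    path = f"{category}/{package}/{package}-{version}.ebuild"
    return f"{BASE}/plain/{path}?id={commit_id}"


if __name__ == "__main__":
    # Values copied from the gentoo-sources-6.6.22 example above.
    print(plain_ebuild_url(
        "sys-kernel",
        "gentoo-sources",
        "6.6.22",
        "92fbf69083c99b639e02bd4f0bcac0c9dd8593fd",
    ))
```

The commit id has to be one in which the file still exists (here, the one from the tree link), not the commit that dropped it.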
_________________
Quis separabit? Quo animo?
piggy
n00b
n00b


Joined: 28 Sep 2015
Posts: 26

PostPosted: Wed Jul 17, 2024 9:46 am    Post subject: Reply with quote

pjp wrote:
piggy wrote:
piggy wrote:
Hu wrote:
I think pjp's point was that characterizing it as a "big" bug would require some evidence that it affects a widely used configuration. I have a >6.6.21 32-bit kernel running on real hardware and am not seeing this. So either it is workload dependent, and I am not running the bad workload, or it is hardware dependent, and my real hardware is not one of the "bad" configurations.

Well, well, well, I know you guys say; "who cares".
I don't think anyone has said that. I've only suggested that it hasn't yet been demonstrated to be a bug and could be a configuration issue. Bisecting can help identify if it is a bug. Anyway,..

piggy wrote:
I don't know where to get 6.6.22 to 6.6.29 because they are no longer on the Gentoo packages web page. Can someone help this volunteer with something of no interest to the majority of users?
There may be a better way, but until someone else offers it, this seems to work:

From packages.gentoo.org, you can search for gentoo-sources. That page has a link to the Git Log (short):
https://gitweb.gentoo.org/repo/gentoo.git/log/sys-kernel/gentoo-sources

Thanks for explaining! Something simpler and less time consuming, no? Why do they take them off the official packages page? I don't think mass storage space is a problem these days. Keeping the whole still-recent 6.6.xx branch available would be the smart solution.

Regarding bisecting: I read the link, and I admit it is so complicated and time consuming that I instantly said: No thanks! :D

In the meantime, another machine and another 32-bit virtual guest with another very recent CPU: the bug is present! Now we have two platforms (VMware and VirtualBox) and three different hosts, and they all have the problem. Too bad I recycle old hardware and am horrified by it. I dare say Hu is probably the only one on earth with a >6.6.29 kernel and no problem on real hardware. Or maybe me (on virtual) and him (on real hardware) are the last two 32-bit Linux users on earth.
piggy
n00b
n00b


Joined: 28 Sep 2015
Posts: 26

PostPosted: Wed Jul 17, 2024 9:54 am    Post subject: Reply with quote

PS: I forgot: if it is a misconfiguration and not a bug, then it is a bug in the automatic configuration migration, because I simply migrated my WORKING <6.6.22 kernel config to the non-working >6.6.29 kernel. Not only that, I also experimented with some IOMMU and generic memory options and nothing changed. Also, the pristine .config of the kernel-bin series gives the same result: it stops working in the end. Then I stopped experimenting, even though it takes only a few minutes on a 14th-generation i7 virtual machine, because it makes no sense: if <6.6.22 works out of the box, >6.6.29 should do the same.
logrusx
Advocate
Advocate


Joined: 22 Feb 2018
Posts: 2420

PostPosted: Wed Jul 17, 2024 10:38 am    Post subject: Reply with quote

piggy wrote:
Something simpler and less time consuming, no?


Go to kernel.org and download what you want.

Best Regards,
Georgi
piggy
n00b
n00b


Joined: 28 Sep 2015
Posts: 26

PostPosted: Wed Jul 17, 2024 2:26 pm    Post subject: Reply with quote

logrusx wrote:
piggy wrote:
Something simpler and less time consuming, no?


Go to kernel.org and download what you want.


Yes, that is what I did. And it brought back old memories (I was so young and was trying to learn some kernel hacking). I didn't remember all those patch-xxx files to apply! But it is like learning to play the guitar: you never forget.

BTW, even if I wanted to follow the Gentoo way, it is not possible because Gentoo git is bugged somewhere too :oops:

Code:
Error 503 Response object too large

Response object too large
Error 54113

Details: cache-lin1730065-LIN 1721226157 1375299816

Varnish cache server


This is what I get while trying to download the old kernel source as explained by pjp.

In the meantime, while the patched 6.6.24 finishes building, I tried 6.6.10 from kernel.org (it wasn't on my Gentoo mirror, so I built the kernel.org one) and it has the same problem as every other kernel >6.6.29.

PS: I hope 6.6.24 is NOT bugged: if it is, I will need that patch -R thing I haven't done in a million years! 8)
pjp
Administrator
Administrator


Joined: 16 Apr 2002
Posts: 20485

PostPosted: Wed Jul 17, 2024 5:14 pm    Post subject: Reply with quote

logrusx wrote:
Go to kernel.org and download what you want.
Does that include and apply patches? Describing how I find an old ebuild is more complicated than actually finding one. I already have a local repository, so adding a category/package isn't difficult. The only difficult part I've experienced is if I need to find anything that was also removed from files/. For me, going upstream is only going to make the process more difficult, not less.
_________________
Quis separabit? Quo animo?
sublogic
Apprentice
Apprentice


Joined: 21 Mar 2022
Posts: 269
Location: Pennsylvania, USA

PostPosted: Wed Jul 17, 2024 10:44 pm    Post subject: Reply with quote

piggy wrote:
I dare say Hu is probably the only one on earth with a >6.6.29 kernel and no problem on real hardware. Or maybe me (on virtual) and him (on real hardware) are the last two 32-bit Linux users on earth.

Nope. I run 6.6.38 on a 2007-vintage laptop.
Code:
$ emerge --info | head -n 5
Portage 3.0.65 (python 3.12.3-final-0, default/linux/x86/23.0/i686/split-usr/desktop, gcc-13, glibc-2.39-r9, 6.6.38-gentoo-x86 i686)
=================================================================
System uname: Linux-6.6.38-gentoo-x86-i686-Intel-R-_Core-TM-_Duo_CPU_T2250_@_1.73GHz-with-glibc2.39
KiB Mem:      951584 total,    623064 free
KiB Swap:    2097148 total,   1983740 free
I do use an x86_64 distcc helper/chroot builder for faster updates. Gentoo support is outstanding. Some upstream support is beginning to fade --but they take patches.

The <1GB RAM is where it hurts when running contemporary bloatware. Firefox with more than 3 tabs, say.
piggy
n00b
n00b


Joined: 28 Sep 2015
Posts: 26

PostPosted: Thu Jul 18, 2024 10:29 am    Post subject: Reply with quote

sublogic wrote:
piggy wrote:
I dare say Hu is probably the only one on earth with a >6.6.29 kernel and no problem on real hardware. Or maybe me (on virtual) and him (on real hardware) are the last two 32-bit Linux users on earth.

Nope. I run 6.6.38 on a 2007-vintage laptop.
Code:
$ emerge --info | head -n 5
Portage 3.0.65 (python 3.12.3-final-0, default/linux/x86/23.0/i686/split-usr/desktop, gcc-13, glibc-2.39-r9, 6.6.38-gentoo-x86 i686)
=================================================================
System uname: Linux-6.6.38-gentoo-x86-i686-Intel-R-_Core-TM-_Duo_CPU_T2250_@_1.73GHz-with-glibc2.39
KiB Mem:      951584 total,    623064 free
KiB Swap:    2097148 total,   1983740 free
I do use an x86_64 distcc helper/chroot builder for faster updates. Gentoo support is outstanding. Some upstream support is beginning to fade --but they take patches.

The <1GB RAM is where it hurts when running contemporary bloatware. Firefox with more than 3 tabs, say.

Can I call you a hero? I have to admit I would not have the patience to work on such vintage hardware, especially with just 1 GB of RAM.

I have to say, you are probably not affected by the bug, because the problem almost certainly happens in the highmem part of the RAM. I haven't tested that yet; I didn't have time yesterday and today I'm very busy too.

For now I can confirm that 6.6.22 is NOT affected by the bug, so that is one step beyond the original title of this thread, where I named 6.6.21 as the last working kernel. (I UPDATED the title of the thread; I test every kernel on three different CPU platforms and two different virtual machine products before calling it working or buggy.) Now let's try 6.6.23 from kernel.org, because Gentoo git, as shown before, doesn't let me download big files from the server.
logrusx
Advocate
Advocate


Joined: 22 Feb 2018
Posts: 2420

PostPosted: Thu Jul 18, 2024 10:48 am    Post subject: Reply with quote

pjp wrote:
logrusx wrote:
Go to kernel.org and download what you want.
Does that include and apply patches?
...
For me, going upstream is only going to make the process more difficult, not less.


For this particular case it should be completely fine and the easiest way possible with almost no explanations.

Best Regards,
Georgi
piggy
n00b
n00b


Joined: 28 Sep 2015
Posts: 26

PostPosted: Thu Jul 18, 2024 11:42 pm    Post subject: the end of the story: it's a BIG BIG BUG Reply with quote

logrusx wrote:
pjp wrote:
logrusx wrote:
Go to kernel.org and download what you want.
Does that include and apply patches?
...
For me, going upstream is only going to make the process more difficult, not less.


For this particular case it should be completely fine and the easiest way possible with almost no explanations.

Best Regards,
Georgi

Thank you Georgi. Taking everything from kernel.org made it easy enough to test quickly, kernel after kernel, and understand what the problem is.


THE END OF THE STORY: IT'S AN INCREDIBLY BIG BUG (even if it only affects VMware and VirtualBox guests)

First of all, the first broken kernel is: linux-6.6.24 (well, maybe, ladies and gentlemen, the first broken kernel is... :oops: )

Code:
Linux gentoox86vm 6.6.24 #1 SMP PREEMPT_DYNAMIC Fri Jul 19 00:57:10 CEST 2024 i686 Intel(R) Core(TM) i7-14700K GenuineIntel GNU/Linux


It starts to die as soon as you load something after boot, complaining about vmalloc, which is exactly what they claim to have fixed on x86 HP machines (the reason for the broken commit I isolated).

Code:
Jul 19 01:09:34 [kernel] lowmem_reserve[]: 0 0 25680 25680
Jul 19 01:09:34 [kernel] DMA: 4*4kB (M) 3*8kB (M) 2*16kB (M) 1*32kB (M) 2*64kB (M) 2*128kB (M) 2*256kB (M) 0*512kB 0*1024kB 0*2048kB 1*4096kB (M) = 5096kB
Jul 19 01:09:34 [kernel] Normal: 44*4kB (E) 63*8kB (UME) 51*16kB (UE) 18*32kB (UME) 14*64kB (UME) 5*128kB (U) 1*256kB (E) 3*512kB (UME) 0*1024kB 1*2048kB (U) 130*4096kB (M) = 539928kB
Jul 19 01:09:34 [kernel] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jul 19 01:09:34 [kernel] 207032 total pagecache pages
Jul 19 01:09:34 [kernel] 0 pages in swap cache
Jul 19 01:09:34 [kernel] Free swap  = 8505340kB
Jul 19 01:09:34 [kernel] Total swap = 8505340kB
Jul 19 01:09:34 [kernel] 1048328 pages RAM
Jul 19 01:09:34 [kernel] 821762 pages HighMem/MovableOnly
Jul 19 01:09:34 [kernel] 36685 pages reserved
Jul 19 01:09:36 [kernel] alloc_vmap_area: 96 callbacks suppressed
Jul 19 01:09:36 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 19 01:09:41 [kernel] alloc_vmap_area: 191 callbacks suppressed
Jul 19 01:09:41 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 19 01:09:46 [kernel] alloc_vmap_area: 261 callbacks suppressed
Jul 19 01:09:46 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
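To put a number on an excerpt like this, a short Python sketch can count the failures. It is only an illustration under stated assumptions: the function name `count_vmap_failures` is invented, the line formats are taken from the log above, and the metalog line "Last output repeated N times" is assumed to mean N additional copies of the previous line.

```python
import re

# Patterns taken from the log lines quoted above.
FAIL = re.compile(r"vmap allocation for size (\d+) failed")
SUPPRESSED = re.compile(r"alloc_vmap_area: (\d+) callbacks suppressed")
REPEATED = re.compile(r"- Last output repeated (\d+) times -")

SAMPLE = """\
Jul 19 01:09:36 [kernel] alloc_vmap_area: 96 callbacks suppressed
Jul 19 01:09:36 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 19 01:09:41 [kernel] alloc_vmap_area: 191 callbacks suppressed
Jul 19 01:09:41 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
                - Last output repeated 9 times -
Jul 19 01:09:46 [kernel] alloc_vmap_area: 261 callbacks suppressed
Jul 19 01:09:46 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
"""


def count_vmap_failures(log_text):
    """Return (logged, suppressed): failures visible in the log and
    messages the kernel rate limiter reports as suppressed."""
    logged = 0
    suppressed = 0
    last_was_failure = False
    for line in log_text.splitlines():
        is_failure = bool(FAIL.search(line))
        repeat = REPEATED.search(line)
        if repeat and last_was_failure:
            # Assumption: N repeats means N more copies of the failure line.
            logged += int(repeat.group(1))
        elif is_failure:
            logged += 1
        limited = SUPPRESSED.search(line)
        if limited:
            suppressed += int(limited.group(1))
        last_was_failure = is_failure
    return logged, suppressed


if __name__ == "__main__":
    print(count_vmap_failures(SAMPLE))
```

A nonzero first value a few minutes after boot could serve as a crude good/bad signal when comparing kernels, provided the same workload is run each time.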


IMHO, looking at the changelog of linux-6.6.24, this is the commit that broke everything (I say only on virtual platforms just because I don't have a real machine to test on, but I'm sure it is broken everywhere):

Code:
commit 9a98ab01e3acba830cb0917296a13192fd23f305
Author: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Date:   Mon Nov 13 12:07:39 2023 -0800

    platform/x86: hp-bioscfg: Fix error handling in hp_add_other_attributes()
   
    commit f40f939917b2b4cbf18450096c0ce1c58ed59fae upstream.
   
    'attr_name_kobj' is allocated using kzalloc, but on all the error paths
    it is not freed, hence we have a memory leak.
   
    Fix the error path before kobject_init_and_add() by adding kfree().
   
    kobject_put() must be always called after passing the object to
    kobject_init_and_add(). Only the error path which is immediately next
    to kobject_init_and_add() calls kobject_put() and not any other error
    path after it.
   
    Fix the error handling after kobject_init_and_add() by moving the
    kobject_put() into the goto label err_other_attr_init that is already
    used by all the error paths after kobject_init_and_add().
   
    Fixes: a34fc329b189 ("platform/x86: hp-bioscfg: bioscfg")
    Cc: stable@vger.kernel.org # 6.6.x: c5dbf0416000: platform/x86: hp-bioscfg: Simplify return check in hp_add_other_attributes()
    Cc: stable@vger.kernel.org # 6.6.x: 5736aa9537c9: platform/x86: hp-bioscfg: move mutex_lock() down in hp_add_other_attributes()
    Reported-by: kernel test robot <lkp@intel.com>
    Reported-by: Dan Carpenter <error27@gmail.com>
    Closes: https://lore.kernel.org/r/202309201412.on0VXJGo-lkp@intel.com/
    Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
    [ij: Added the stable dep tags]
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Link: https://lore.kernel.org/r/20231113200742.3593548-3-harshit.m.mogalapalli@oracle.com
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


This commit was never reverted, so everything from linux-6.6.24 through linux-6.10 is broken on virtual guests under VMware and VirtualBox (for sure). I think it is broken on real hardware too (except HP machines?). A virtual machine is like a real machine.

How to replicate the bug: you need a machine with an x86 processor, a 32-bit x86 Gentoo install (or any other distro; I also tried Void with the same kernel I built for Gentoo and it behaves the same), and 4 GB of RAM (or more; maybe even less, though I didn't try). Probably anything over 1024 MB of RAM will do, that is, anything that uses highmem (machines with 1 GB of RAM are not affected).

At this point the system seems to work as expected as long as you do not use more than 1024 MB of RAM. After that you start getting vmalloc alerts in the log (metalog is especially accurate), and slowly the machine stops opening apps, crashes browser tabs, stops playing audio and so on; if you keep going, the system freezes (no explicit kernel exception here).

This BIG BUG is reproducible every time on virtual hardware, and I bet it is on real hardware too (except HP?).

I opened this bug on kernel.org:

https://bugzilla.kernel.org/show_bug.cgi?id=219061

and I also sent a notification to the people involved in that commit.
sublogic
Apprentice
Apprentice


Joined: 21 Mar 2022
Posts: 269
Location: Pennsylvania, USA

PostPosted: Sat Jul 20, 2024 1:47 am    Post subject: Reply with quote

@piggy, a couple of points.
  • my i686 kernel has CONFIG_HIGHMEM=y and CONFIG_HIGHMEM4G=y and yet I don't have the problem.
  • Since you run VM's, can you test programmatically if the kernel is good or bad ? If so, you can run an automated bisection with git. Slow but hands-free.
Your tests of released 6.6.x kernels are really coarse-grained; there are tons of commits between releases. If you use git you can pinpoint the precise change that introduced the problem. Start with https://wiki.gentoo.org/wiki/Kernel_git-bisect . It's a gentoo learning experience *and* a kernel learning experience. The kernel devs appreciate a fine-grained diagnostic.
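The idea behind bisection is plain binary search, which a small Python sketch can show. This is not how git implements it internally; it is a simplified model over a linear, ordered list, and `first_bad` and `pretend_test` are hypothetical names. It assumes the first entry is known good, the last is known bad, and every entry after the first bad one is also bad.

```python
def first_bad(candidates, is_bad):
    """Return the first bad entry of an ordered list.

    Assumes candidates[0] is known good, candidates[-1] is known bad,
    and all good entries come before all bad ones.
    """
    lo, hi = 0, len(candidates) - 1   # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return candidates[hi]


def pretend_test(version):
    """Stand-in for "boot the VM with this kernel and check the log".

    The threshold only mirrors the result reported in this thread and is
    here for demonstration.
    """
    return int(version.split(".")[2]) >= 24


if __name__ == "__main__":
    releases = [f"6.6.{n}" for n in range(21, 41)]   # 6.6.21 .. 6.6.40
    print(first_bad(releases, pretend_test))
```

With 20 releases this needs only four builds instead of testing every one. The same halving applied to individual commits is what makes a fine-grained bisection tolerable, although each step still costs a kernel build and a test run.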

(As for my HIGHMEM despite having less than 1 GB, I vaguely remember dmesg warnings that I could only use ~784 MB or some such without it. Bah.)
logrusx
Advocate
Advocate


Joined: 22 Feb 2018
Posts: 2420

PostPosted: Sat Jul 20, 2024 5:19 pm    Post subject: Reply with quote

We actually didn't see your working config, also you have a response in LKML which essentially says you've got the wrong commit. They also ask you to test 6.6.40 and 6.10 if I remember the version correctly.

Best Regards,
Georgi
piggy
n00b
n00b


Joined: 28 Sep 2015
Posts: 26

PostPosted: Sun Jul 21, 2024 1:55 pm    Post subject: Reply with quote

logrusx wrote:
We actually didn't see your working config, also you have a response in LKML which essentially says you've got the wrong commit. They also ask you to test 6.6.40 and 6.10 if I remember the version correctly.

Best Regards,
Georgi

Hi Georgi, the working .config is the same as the "not working" .config. :P

As I explained in Bugzilla, the problem is NOT the .config. The problem is that everything is OK with <=6.6.23 and everything is broken with >=6.6.24, using the same posted .config in both cases.

So take the .config I posted on Bugzilla and you have my .config.

Regarding the LKML: if they don't put me in CC when they ask me something, or don't ask me through Bugzilla, I can't read it.

I'm not a kernel developer and I'm not at all interested in getting thousands of messages of no interest to me every day by subscribing to the LKML.

Besides, since I wrote clearly in Bugzilla what I repeated here (everything is broken with >=6.6.24), there is nothing more to try. If you read a few posts back in this thread, you will also see that I explicitly tried 6.6.40 from kernel.org.


Last edited by piggy on Sun Jul 21, 2024 2:43 pm; edited 2 times in total
piggy
n00b
n00b


Joined: 28 Sep 2015
Posts: 26

PostPosted: Sun Jul 21, 2024 2:18 pm    Post subject: Reply with quote

sublogic wrote:
@piggy, a couple of points.
  • my i686 kernel has CONFIG_HIGHMEM=y and CONFIG_HIGHMEM4G=y and yet I don't have the problem.


Someone always says this here, but explicit requests for more details are never answered. Just out of curiosity, which version of the kernel? Virtual or real machine? How much physical memory is installed in the machine?

Quote:
  • Since you run VM's, can you test programmatically if the kernel is good or bad ?


I did it, read below.

Quote:
If so, you can run an automated bisection


I'm not a kernel developer, I'm not interested in becoming one, and that procedure is long and I really don't have time for it. I spent some time isolating where the problem is; now I have passed the ball to the developers :arrow: , if they can and want to take it. The bug is there, so big you can't miss it. If it gets fixed, good for all of us users, me first as a user of i386 virtual machine guests; if not, the kernel will stay bugged, but it will affect very few people, because who really runs real i386 machines these days? I won't do that for sure, even under torture :P . 8)

Quote:
(As for my HIGHMEM despite having less than 1 GB, I vaguely remember dmesg warnings that I could only use ~784 MB or some such without it. Bah.)

If I disable highmem I can use the first 1024 MB of my virtual machine and, if memory serves me well, the remaining 3 GB is configured as buffers/cache (I'm sure about the first 1024 MB, not about the last 3 GB, and I have neither the time nor the interest to test it). Not only that: even with highmem enabled, the bug starts when the system crosses the 1024 MB line; it starts spitting out vmalloc errors and the machine slowly dies. Of course, if you boot the machine, open a terminal and, say, some very lightweight browser with one tab, it takes time to hit the bug. You need to actually use the machine to get it (three or four terminals, Firefox with 10 tabs, an audio player and so on). Nothing special, but not very light use either. I also think that whoever tested the 32-bit versions of >=6.6.24 limited their testing to booting, running a couple of terminals and a web browser, and shutting down. You can't reach the bug like that.

I did what I could about this real and very nasty problem; the rest is not in my interest and I have no time for it (though I could eventually test some temporary patches).
Back to top
View user's profile Send private message
Display posts from previous:   
Reply to topic    Gentoo Forums Forum Index Kernel & Hardware All times are GMT
Goto page 1, 2  Next
Page 1 of 2

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum