P.Kosunen Guru

Joined: 21 Nov 2005 Posts: 309 Location: Finland
|
Posted: Mon Sep 04, 2017 8:09 am Post subject: CMCI storm detected: switching to poll mode |
Code:
Sep 2 18:03:54 shuttle kernel: [708926.214851] CMCI storm detected: switching to poll mode
Sep 2 18:03:54 shuttle kernel: [710011.679941] INFO: rcu_sched self-detected stall on CPU
Sep 2 18:03:54 shuttle kernel: [710011.679948] ^I0-...: (2 GPs behind) idle=cc6/140000000000001/0 softirq=10536764/10536766 fqs=0
Sep 2 18:03:54 shuttle kernel: [710011.679949] ^I (t=1294978 jiffies g=5107773 c=5107772 q=62979)
Sep 2 18:03:54 shuttle kernel: [710011.679952] rcu_sched kthread starved for 1294978 jiffies! g5107773 c5107772 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
Sep 2 18:03:54 shuttle kernel: [710011.679954] rcu_sched S15088 8 2 0x00000000
Sep 2 18:03:54 shuttle kernel: [710011.679959] Call Trace:
Sep 2 18:03:54 shuttle kernel: [710011.679968] ? __schedule+0x1ef/0x430
Sep 2 18:03:54 shuttle kernel: [710011.679970] ? schedule+0x2d/0x80
Sep 2 18:03:54 shuttle kernel: [710011.679971] ? schedule_timeout+0xf3/0x170
Sep 2 18:03:54 shuttle kernel: [710011.679975] ? mod_timer+0x180/0x180
Sep 2 18:03:54 shuttle kernel: [710011.679977] ? rcu_accelerate_cbs+0x36/0x190
Sep 2 18:03:54 shuttle kernel: [710011.679978] ? rcu_gp_kthread+0x489/0x7b0
Sep 2 18:03:54 shuttle kernel: [710011.679981] ? prepare_to_swait_event+0x1a/0x40
Sep 2 18:03:54 shuttle kernel: [710011.679982] ? rcu_gp_kthread+0x489/0x7b0
Sep 2 18:03:54 shuttle kernel: [710011.679984] ? kthread+0xf2/0x130
Sep 2 18:03:54 shuttle kernel: [710011.679986] ? synchronize_rcu_expedited+0x10/0x10
Sep 2 18:03:54 shuttle kernel: [710011.679987] ? kthread_create_on_node+0x40/0x40
Sep 2 18:03:54 shuttle kernel: [710011.679989] ? ret_from_fork+0x22/0x30
Sep 2 18:03:54 shuttle kernel: [710011.679993] NMI backtrace for cpu 0
Sep 2 18:03:54 shuttle kernel: [710011.679996] CPU: 0 PID: 2491 Comm: cw_process Not tainted 4.12.5-gentoo #2
Sep 2 18:03:54 shuttle kernel: [710011.679997] Hardware name: Shuttle Inc. DX30D/FDX30, BIOS 1.02 02/15/2017
Sep 2 18:03:54 shuttle kernel: [710011.679997] Call Trace:
Sep 2 18:03:54 shuttle kernel: [710011.679998] <IRQ>
Sep 2 18:03:54 shuttle kernel: [710011.680002] ? dump_stack+0x46/0x61
Sep 2 18:03:54 shuttle kernel: [710011.680004] ? nmi_cpu_backtrace+0x8a/0x90
Sep 2 18:03:54 shuttle kernel: [710011.680006] ? irq_force_complete_move+0xe0/0xe0
Sep 2 18:03:54 shuttle kernel: [710011.680008] ? nmi_trigger_cpumask_backtrace+0x86/0xc0
Sep 2 18:03:54 shuttle kernel: [710011.680009] ? rcu_dump_cpu_stacks+0x88/0xc1
Sep 2 18:03:54 shuttle kernel: [710011.680011] ? rcu_check_callbacks+0x642/0x780
Sep 2 18:03:54 shuttle kernel: [710011.680013] ? update_wall_time+0x474/0x720
Sep 2 18:03:54 shuttle kernel: [710011.680015] ? update_process_times+0x23/0x50
Sep 2 18:03:54 shuttle kernel: [710011.680016] ? tick_sched_timer+0x3d/0x130
Sep 2 18:03:54 shuttle kernel: [710011.680018] ? __hrtimer_run_queues+0xb5/0x120
Sep 2 18:03:54 shuttle kernel: [710011.680019] ? hrtimer_interrupt+0x9d/0x1e0
Sep 2 18:03:54 shuttle kernel: [710011.680022] ? smp_trace_apic_timer_interrupt+0x59/0x90
Sep 2 18:03:54 shuttle kernel: [710011.680024] ? apic_timer_interrupt+0x7f/0x90
Sep 2 18:03:54 shuttle kernel: [710011.680024] </IRQ>
Sep 2 18:03:54 shuttle kernel: klogd 1.5.1, ---------- state change ----------
Sep 2 18:03:54 shuttle kernel: Loaded 57659 symbols from 13 modules.
Sep 2 18:03:54 shuttle kernel: [710011.682343] Hangcheck: hangcheck value past margin!
Sep 2 18:09:23 shuttle kernel: [710340.172989] CMCI storm subsided: switching to interrupt mode
I got this error on a new Shuttle XPC Slim DX30 with an Intel Celeron J3355 CPU and a Corsair 8GB memory kit (CMSO8GX3M2C1600C11). Is this an incompatible or faulty memory problem, or something else? The clock was several hours off, and the machine could not reboot cleanly the next morning.
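To put a number on how long the storm lasted, the two CMCI lines from the log above can be compared. This is only a rough sketch: it uses the kernel timestamps in brackets (seconds since boot) rather than the syslog time on the left, since the syslog time appears to reflect when klogd wrote the lines out, not when the events happened. The function name `storm_intervals` and the regular expression are illustrative choices, not part of any existing tool.

```python
import re

# Two lines copied verbatim from the log in this post.
LOG = """\
Sep 2 18:03:54 shuttle kernel: [708926.214851] CMCI storm detected: switching to poll mode
Sep 2 18:09:23 shuttle kernel: [710340.172989] CMCI storm subsided: switching to interrupt mode
"""

# Matches the bracketed kernel timestamp followed by a CMCI storm message.
STORM_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+CMCI storm (detected|subsided)")


def storm_intervals(text):
    """Return (start, end) kernel timestamps for each detected/subsided pair."""
    intervals = []
    start = None
    for line in text.splitlines():
        match = STORM_RE.search(line)
        if not match:
            continue
        timestamp, event = float(match.group(1)), match.group(2)
        if event == "detected":
            start = timestamp
        elif start is not None:
            # "subsided" closes the most recent open storm.
            intervals.append((start, timestamp))
            start = None
    return intervals


if __name__ == "__main__":
    for start, end in storm_intervals(LOG):
        print(f"storm lasted {end - start:.1f} s of kernel time")
```

Feeding a full syslog file through `storm_intervals` would list every storm, which could help correlate them with the workload running at the time.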
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 9932 Location: almost Mile High in the USA
|
Posted: Mon Sep 04, 2017 4:55 pm Post subject: |
It's possibly bad memory, or possibly a bad CPU. A CMCI (corrected machine check interrupt) storm usually points to a hardware problem, and you may well have to RMA the machine... You may want to try other memory configurations, or adjust the overclocking options to see if it goes away.
There's also a possibility of bad firmware that needs to be addressed. See if there's a firmware update.
A kernel bug is still a possibility, but a rare one if the same kernel works on other machines.
P.Kosunen Guru

Joined: 21 Nov 2005 Posts: 309 Location: Finland
|
Posted: Thu Sep 07, 2017 4:44 pm Post subject: |
The BIOS is the latest version. An OS selection option in UEFI/BIOS is set to Windows because it also controls UEFI vs. legacy BIOS switching.
I updated the system and the kernel to 4.13.0 and switched the clocksource to hpet; no issues since. It might be too early to tell, but let's hope it was a problem with the 4.12.5 kernel or other software.
Edit: Disabling intel_idle in the kernel seems to be a workaround for this problem. I need to test different intel_idle.max_cstate levels...
Edit2: On a different machine with a Celeron J3455 and Void Linux, the CMCI storm does not happen with the "processor.max_cstate=1 intel_idle.max_cstate=0" kernel boot options.
CMCI storms usually happen when copying data from a local SSD to a NAS at >100MB/s (full gigabit network load). It might not be faulty hardware, because the same issue occurs on 2 different boxes.
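To check whether a running system actually booted with these options, the kernel command line in /proc/cmdline can be inspected. Here is a minimal sketch, assuming the clocksource was chosen with the `clocksource=` boot parameter (it may equally have been set at runtime through sysfs, in which case it will not show up here). The helper names `parse_cmdline` and `workaround_status` are made up for illustration, and quoted values containing spaces are not handled.

```python
from pathlib import Path

# Boot parameters mentioned in this thread; clocksource= is an assumption
# about how the hpet clocksource was selected.
WATCHED = ("processor.max_cstate", "intel_idle.max_cstate", "clocksource")


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def workaround_status(cmdline):
    """Return the value of each watched parameter, or None if it is absent."""
    params = parse_cmdline(cmdline)
    return {name: params.get(name) for name in WATCHED}


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    # Fall back to an empty command line on systems without /proc/cmdline.
    cmdline = cmdline_file.read_text() if cmdline_file.exists() else ""
    for name, value in workaround_status(cmdline).items():
        print(f"{name}: {value if value is not None else 'not set'}")
```

Running it before and after changing the bootloader configuration confirms that the new options were really passed to the kernel.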