An Unrecoverable System Error (NMI) Has Occurred


We have a Ceph cluster with 3 hosts and 3 monitors up and running in this lab, and everything seems to be working well. But what about non-corosync configurations?

The system runs SLES 11 SP2.

Can you test whether the watchdog works with no watchdog-mux running?

If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

HP is trying to figure out what is generating the NMIs with intel_idle, but it might be worth recommending that all HP servers deactivate the intel_idle module (in a near …). See https://access.redhat.com/solutions/1309033

If the problem is solved, change the tag 'verification-needed-precise' to 'verification-done-precise'.

Tags added: verification-needed-precise, verification-needed-trusty.

Brad Figg (brad-figg) wrote on 2015-03-26 (#9): This bug is awaiting verification that the kernel in -proposed solves the problem.

Unfortunately, these have not been kept in sync with the kernel, which led to the module being loaded.

"""This is actually not a resolution for this particular case, but a bug (from …

iLO: "76 Critical System Error 03/12/2015 12:42 03/12/2015 12:07 2 An Unrecoverable System Error (NMI) has occurred (System error code 0x0000002B, 0x00000000)"

Example:
PID: 0 TASK: ffffffff81c1a480 CPU: 0 COMMAND: "swapper/0"
 #0 [ffff88085fc05c88] machine_kexec at …

As described in /etc/modprobe.d/blacklist-watchdog.conf:
"""
# Watchdog drivers should not be loaded automatically, but only if a
# watchdog daemon is installed.
"""
We should blacklist the module "hpwdt" by default for …

An Unrecoverable System Error (NMI) Has Occurred (Service Information: 0x7fbce8f6, 0x00000000)

Just to provide feedback on the cmdline and its explanations.
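The hpwdt blacklisting proposed above can be done with a modprobe.d drop-in. The sketch below is an illustration, not the Ubuntu fix. The helper name `blacklist_module` and the file name pattern `blacklist-<module>.conf` are hypothetical. Only the `blacklist <module>` directive is standard modprobe.d syntax. A `blacklist` line stops automatic (alias-based) loading, but an explicit `modprobe hpwdt` still loads the module.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): add a "blacklist <module>"
# line to a modprobe.d drop-in, creating the file if needed.
# Usage: blacklist_module <module> [<conf-dir>]   (default: /etc/modprobe.d)
# Prints the path of the drop-in file it wrote to.
blacklist_module() {
    module="$1"
    confdir="${2:-/etc/modprobe.d}"
    conf="$confdir/blacklist-$module.conf"

    # Only append the directive if it is not already there, so running the
    # helper twice does not duplicate the line.
    if ! grep -qx "blacklist $module" "$conf" 2>/dev/null; then
        printf 'blacklist %s\n' "$module" >> "$conf"
    fi
    printf '%s\n' "$conf"
}
```

Run as root, `blacklist_module hpwdt` writes /etc/modprobe.d/blacklist-hpwdt.conf. If hpwdt is included in the initramfs, regenerate the initramfs afterwards (for example with `update-initramfs -u` on Debian/Ubuntu) so the blacklist also applies there, then reboot.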

Brad Figg (brad-figg) on 2015-03-18:
Changed in linux (Ubuntu Utopic): status: In Progress → Fix Committed
Changed in linux (Ubuntu Trusty): status: In Progress → Fix Committed
Changed in linux (Ubuntu …

I thought it was only related to 3.x kernels. https://community.hpe.com/t5/ProLiant-Servers-ML-DL-SL/DL380p-Gen8-with-uncorrectabl-PCI-express-error/td-p/5995669

Code:
# edit /etc/default/grub and set:
GRUB_CMDLINE_LINUX_DEFAULT="nmi_watchdog=0"
# then, as root:
update-grub
reboot
(aderumier, #20, Nov 20, 2015)

… but it's a bit different, you are right. (pipomambo, #14, Nov 11, 2015)

adamb, quoting pipomambo: We have backported the fix to Ubuntu-3.13.0-35.61. In addition, I think there is a second problem here.
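The GRUB edit above replaces the whole GRUB_CMDLINE_LINUX_DEFAULT value, so options such as `quiet` are lost. The sketch below instead appends one parameter to the existing value. It rests on three assumptions: GNU sed (for `sed -i`), a double-quoted value as in the post, and a parameter that contains no `/`, `&`, or `\` characters. `add_grub_param` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: add one kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# without discarding the options already there.
# Usage: add_grub_param <param> [<grub-defaults-file>]  (default: /etc/default/grub)
# Assumes GNU sed, a double-quoted value, and a parameter free of '/', '&', '\'.
add_grub_param() {
    param="$1"
    file="${2:-/etc/default/grub}"

    # No such line yet: append a new one containing only this parameter.
    if ! grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file"; then
        printf 'GRUB_CMDLINE_LINUX_DEFAULT="%s"\n' "$param" >> "$file"
        return 0
    fi

    # Extract the current quoted value, split it into one option per line,
    # and stop if the parameter is already present as a whole option.
    value=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\([^"]*\)".*/\1/p' "$file")
    if printf '%s\n' "$value" | tr ' ' '\n' | grep -qxF -- "$param"; then
        return 0
    fi

    # Empty value: set it to the parameter ("t" then skips the next command).
    # Non-empty value: append " <param>" before the closing quote.
    sed -i \
        -e "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\"/GRUB_CMDLINE_LINUX_DEFAULT=\"$param\"/" \
        -e 't' \
        -e "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" \
        "$file"
}
```

For example, run `add_grub_param nmi_watchdog=0`, then `update-grub` and a reboot as in the post. Afterwards, `cat /proc/cmdline` shows whether the option took effect.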

A few HP Gen8 and Gen9 systems are crashing due to NMIs. The issue occurs most often when we use live migration. We have updated the drivers and the system board firmware, and replaced the system board and riser board, yet we still get the same failures after a few days.

Canonical has provided a kernel patch to work around the issue on unpatched firmware (a firmware fix has probably yet to be released by HP):
- - X2APIC support for HP Proliant Servers
+ - X2APIC …

But you can solve it this way: the module that produces this is hpwdt. For this reason, we already blacklist "all" of these modules in the kmod/module-init-tools blacklists.

This issue occurs when your server runs out of memory and is under heavy I/O load at the same time.

Depending on your system, the reason for the NMI is logged in any one of the following resources: 1. …

They can check the PCI bus and which device or system component is using Bus 0, Device 2. (I am an HP employee.)

We can start VMs and also migrate them, but as soon as we activate HA for any VM, we get a kernel panic in the hpwdt.ko module. I agree; I will dig into that too. (adamb, #9, Oct 21, 2015)

adamb: I wanted to …

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.

However, I found that the cause is my VM and the large amount of RAM I have assigned to it.

I have not installed the new version because, when I installed it on one of them, the system did not recognize the Emulex cards.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1432837

But it could be a problem with the iLO configuration when the watchdog is enabled by hpwdt. Thank you!

proxmox-ve: 4.0-16 (running kernel: 4.2.2-1-pve)
pve-manager: 4.0-50 (running version: 4.0-50/d3a6b7e5)
pve-kernel-4.2.2-1-pve: 4.2.2-16
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-23
qemu-server: 4.0-31
pve-firmware: 1.1-7
libpve-common-perl: 4.0-32
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-27
pve-libspice-server1: …
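The usual way to confirm whether hpwdt is actually loaded on an affected host is `lsmod | grep hpwdt`; lsmod reads /proc/modules. Below is a small sketch of the same check as a reusable function (the name `module_loaded` is hypothetical). It compares the module name exactly, so a search for one name does not match another module whose name merely contains it.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if a kernel module is loaded.
# Usage: module_loaded <module> [<modules-file>]   (default: /proc/modules)
module_loaded() {
    name="$1"
    modules="${2:-/proc/modules}"
    # The first field of each /proc/modules line is the module name;
    # compare it exactly rather than as a substring.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules"
}
```

`module_loaded hpwdt && echo "hpwdt is loaded"` prints the message only when the driver is present. This is useful for checking a host before and after blacklisting hpwdt.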

This tells the OS to deactivate intel_idle and use the acpi_idle driver instead, which takes its C-state values from the ACPI tables provided by the firmware. It is recommended that all HP ProLiant Gen8 or newer servers use the following kernel command-line option: intremap=no_x2apic_optout

There is no additional network card installed. This probably falls to HP first.
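After rebooting with such options, it is worth verifying that they took effect. The post does not name the option that deactivates intel_idle. The kernel's documented parameter for this is `intel_idle.max_cstate=0`, which disables intel_idle and falls back to acpi_idle; it is used below as an assumption. The sketch checks /proc/cmdline for whole options and reads the active idle driver from /sys/devices/system/cpu/cpuidle/current_driver. Both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the running kernel was booted with <param>
# as a whole option (not just a substring of another option).
# Usage: cmdline_has <param> [<cmdline-file>]   (default: /proc/cmdline)
cmdline_has() {
    param="$1"
    cmdline="${2:-/proc/cmdline}"
    # One option per line, then a fixed-string, whole-line match.
    tr ' ' '\n' < "$cmdline" | grep -qxF -- "$param"
}

# Hypothetical helper: print the active cpuidle driver (for example
# intel_idle or acpi_idle).
# Usage: cpuidle_driver [<sysfs-file>]
cpuidle_driver() {
    cat "${1:-/sys/devices/system/cpu/cpuidle/current_driver}"
}
```

Consider a host booted with `intremap=no_x2apic_optout intel_idle.max_cstate=0`. On such a host, `cmdline_has intremap=no_x2apic_optout && echo ok` should print ok, and `cpuidle_driver` would be expected to report acpi_idle rather than intel_idle.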