
[Xen-devel] Xen PANIC in MCE interrupt context: can global variable dom0 be NULL?



Hi,


I am using Xen 3.4.1. Sometimes, when an MCE error occurs, Xen panics due to a page fault with the following stack trace:

http://pastebin.com/f30f67342

After some digging, the probable culprit seems to be smp_cmci_interrupt():

    if (bs.errcnt && mctc != NULL) {
        if (guest_enabled_event(dom0->vcpu[0], VIRQ_MCA)) {   /* <--- here */
            mctelem_commit(mctc);
            printk(KERN_DEBUG "CMCI: send CMCI to DOM0 through virq\n");
            send_guest_global_virq(dom0, VIRQ_MCA);
        } else {
            x86_mcinfo_dump(mctelem_dataptr(mctc));
            mctelem_dismiss(mctc);
        }
    }


It looks like dom0 is NULL here (the offset of vcpu[0] is 0x468). Is this possible?
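
To spell out the reasoning: if dom0 is NULL, evaluating dom0->vcpu[0] reads from an address equal to the offset of the vcpu member, so a fault at a small address like 0x468 points at a NULL base pointer. Here is a minimal sketch of that arithmetic. The struct domain_sketch layout, its other_fields padding, the array size, and the helper vcpu0_load_address() are all made up for illustration; only the 0x468 offset comes from my build, and the real struct domain looks nothing like this.

```c
#include <stddef.h>
#include <stdint.h>

struct vcpu;  /* opaque here; only pointers to it are stored */

/*
 * Hypothetical stand-in for Xen's struct domain. The padding array exists
 * only to place vcpu[] at offset 0x468; the real layout is different.
 */
struct domain_sketch {
    unsigned char other_fields[0x468];
    struct vcpu *vcpu[4];   /* array size is an assumption */
};

/*
 * Address the CPU would load from when evaluating d->vcpu[0], given the
 * numeric value of the base pointer d. With a NULL base this is just the
 * member offset.
 */
static uintptr_t vcpu0_load_address(uintptr_t domain_base)
{
    return domain_base + offsetof(struct domain_sketch, vcpu);
}
```

With a base of 0, vcpu0_load_address() returns 0x468, which is the kind of address I would expect to see reported for the page fault if dom0 were NULL.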

Other functions, like mce_softirq(), perform a NULL check on dom0 before accessing its members:
    /* Step2: Send Log to DOM0 through vIRQ */
    if (dom0 && guest_enabled_event(dom0->vcpu[0], VIRQ_MCA)) {
        printk(KERN_DEBUG "MCE: send MCE# to DOM0 through virq\n");
        send_guest_global_virq(dom0, VIRQ_MCA);
    }
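
If the same guard were applied to the CMCI path, I would expect the decision logic to look roughly like the sketch below. This is not a patch against Xen: the struct definitions are stripped-down stand-ins, and cmci_deliver(), guest_enabled_event_stub(), and enum cmci_action are names I made up so the snippet compiles on its own. It only shows that a NULL dom0 would fall through to the local dump branch instead of faulting.

```c
#include <stddef.h>

/* Simplified stand-ins, not Xen's real definitions. */
struct vcpu   { int virq_mca_enabled; };
struct domain { struct vcpu *vcpu[1]; };

/* Hypothetical result type describing which branch was taken. */
enum cmci_action { CMCI_SENT_TO_DOM0, CMCI_DUMPED_LOCALLY };

/* Stub for guest_enabled_event(v, VIRQ_MCA): true if the vCPU listens. */
static int guest_enabled_event_stub(const struct vcpu *v)
{
    return v != NULL && v->virq_mca_enabled;
}

/*
 * Guarded version of the branch in smp_cmci_interrupt(): dom0 is checked
 * before dom0->vcpu[0] is read, mirroring the check in mce_softirq().
 */
static enum cmci_action cmci_deliver(const struct domain *d0)
{
    if (d0 != NULL && guest_enabled_event_stub(d0->vcpu[0]))
        return CMCI_SENT_TO_DOM0;    /* commit telemetry, send VIRQ_MCA */
    return CMCI_DUMPED_LOCALLY;      /* dump via x86_mcinfo_dump(), dismiss */
}
```

Calling cmci_deliver(NULL) takes the dump branch, which is what the unguarded code cannot do safely.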

Also note that this system printed the MCE warning message ("(XEN) MCE: The hardware reports a non fatal, correctable incident occured on CPU 0") twice before panicking.

So this code worked properly and entered x86_mcinfo_dump() at least twice before the panic.


- Regards,
Ashwin


