
Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen



>>> Gavin Maltby <Gavin.Maltby@xxxxxxx> 17.09.08 06:17 >>>
>I don't see this as a problem for machine check correctness.
>
>If dom0 asks to offline a cpu (because it believes the cpu is busted and
>a threat to uptime), that decision is fundamentally asynchronous
>to the actual error handling that occurred at machine check exception
>time:
>
>  - running in whatever context
>  - MCE occurs
>  - trap to hypervisor MCE handler
>       . this decides on hypervisor panic, or other appropriate
>         immediate (in handler) response
>       . telemetry forwarded to dom0 for logging and analysis
>  - assume no hypervisor panic
>  - eons pass during which any unconstrained bad data remaining
>    after initial handling may go anywhere
>  - dom0 gets telemetry and let's say diagnoses a fault and
>    decides to call back into the hypervisor to offline the
>    offending cpu
>
>Note the "eons pass" bit;  tonnes of instructions may run on the
>bad cpu in this time, and a few more for some offline delay won't
>hurt.

Shouldn't this be handled the other way around? If a recoverable MCE
occurs, immediately stop scheduling anything on the affected CPU(s)
until Dom0 tells you otherwise (provided, of course, that at least one
CPU remains to run on).
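
To make the suggestion concrete, here is a rough, untested sketch of the
policy I have in mind. Everything in it is made up for illustration: the
masks, the 64-CPU limit, and the function names are assumptions, not
existing Xen interfaces. The MCE handler parks the CPU unless it is the
last schedulable one, and Dom0 later either releases it or takes it
offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 64  /* illustrative limit: one bit per CPU in a uint64_t */

/* Hypothetical state; this is not Xen's real cpumask API. */
static uint64_t online_mask;      /* CPUs that are up */
static uint64_t quarantine_mask;  /* CPUs parked after a recoverable MCE */

static uint64_t cpu_bit(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return UINT64_C(1) << cpu;
}

/* CPUs the scheduler may still place work on. */
static uint64_t schedulable_mask(void)
{
    return online_mask & ~quarantine_mask;
}

/*
 * Called from the (hypothetical) MCE handler after a recoverable error.
 * Returns true if the CPU was quarantined, false if it was left alone
 * because no other schedulable CPU would remain.
 */
static bool mce_quarantine_cpu(unsigned int cpu)
{
    uint64_t remaining = schedulable_mask() & ~cpu_bit(cpu);

    if (remaining == 0)
        return false;

    quarantine_mask |= cpu_bit(cpu);
    return true;
}

/* Dom0 analysed the telemetry and decided the CPU is fine. */
static void dom0_release_cpu(unsigned int cpu)
{
    quarantine_mask &= ~cpu_bit(cpu);
}

/* Dom0 analysed the telemetry and decided the CPU must go. */
static void dom0_offline_cpu(unsigned int cpu)
{
    online_mask &= ~cpu_bit(cpu);
    quarantine_mask &= ~cpu_bit(cpu);
}
```

The point of the sketch is only the ordering: the quarantine happens in
the handler, before the "eons pass", and Dom0's later decision merely
confirms or undoes it.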

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

