
Re: [Xen-devel] [PATCH] Xen vMCE bugfix: inject vMCE# to all vcpus



>>> On 13.06.12 at 12:49, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
> Jan Beulich wrote:
>>>>> On 13.06.12 at 10:05, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>>> Xen vMCE bugfix: inject vMCE# to all vcpus
>>> 
>>> In our testing of Win8 guest MCE, we found a bug: no matter what
>>> SRAO/SRAR error Xen injects into the Win8 guest, it always reboots.
>>> 
>>> The root cause is that the current Xen vMCE logic injects vMCE# only
>>> to vcpu0, which is not correct for Intel MCE (under the Intel
>>> architecture, hardware generates MCE# to all CPUs).
>>> 
>>> This patch fixes the vMCE injection bug by injecting vMCE# to all vcpus.
>> 
>> I see no correlation between the fix (and its description) and the
>> problem at hand: Why would Win8 reboot if it doesn't receive a
>> particular MCE on all CPUs? Isn't that model specific behavior?
> 
> It's not model-specific. For Intel processors, MCE# is broadcast to all
> logical processors on systems on which UCR errors are supported (refer to
> Intel SDM 15.9.3.1 & 15.9.3.2). So for vMCE injection under the Intel
> architecture, it's a bug to inject vMCE# only to vcpu0.

This is for a certain group of errors, but I can't find any statement
that this is general architectural behavior.

>> Furthermore I doubt that an MCE on one socket indeed causes
>> MCE-s on all other sockets, not to speak of distinct NUMA nodes
>> (it would already surprise me if MCE-s got broadcast across cores
>> within a socket, unless they are caused by a resource shared
>> across cores).
> 
> I somewhat agree. But at least per the current Intel SDM, architecturally
> it would broadcast.

I can't even see how this would work reliably across packages or
nodes: Suppose two entities each encounter a UCR - they would
each signal the other one, and the handling of this signal would
need to be synchronized with the handling of the local UCR
(irrespective of the order in which the events arrive).

>>> +        if ( !test_and_set_bool(v->mce_pending) &&
>>> +            ((d->is_hvm) ? 1 :
>>> +            guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
>> 
>> Quite strange a way to say
>> 
>>             (d->is_hvm || guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check))
> 
> For HVM it's OK to just set v->mce_pending, while for PV it also needs to
> check whether the guest callback has been registered.

Sure. I was just asking why you use a conditional operator when
the simpler || would do.
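
I.e. (an untested sketch, only to illustrate the point, reusing the
identifiers from your patch):

    /* Minimal sketch, assuming the surrounding for_each_vcpu() loop from
     * the patch; d is the domain, v the vcpu currently being handled. */
    if ( !test_and_set_bool(v->mce_pending) &&
         (d->is_hvm ||
          guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
    {
        mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
                   d->domain_id, v->vcpu_id);
        vcpu_kick(v);
    }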

>>> +        {
>>> +            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
>>> +                       d->domain_id, v->vcpu_id);
>>> +            vcpu_kick(v);
>>> +        }
>>> +        else
>>> +        {
>>> +            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
>>> +                       d->domain_id, v->vcpu_id);
>>> +            return -1;
>> 
>> Why do you bail here? This is particularly bad if v->mce_pending
>> was already set on some vCPU (as that could simply mean the guest
>> just didn't get around to handle the vMCE yet).
> 
> The case where v->mce_pending is already set when a new vMCE arrives will
> result in a domain crash.
> I don't think the guest can still handle this case; e.g. on bare metal, if
> a 2nd MCE occurs while the 1st has not yet been handled, the OS will reboot
> directly.

Sorry, no. This goes back to the questionable nature of the
broadcasting above - if a CPU encounters a UCR itself and gets
signaled of a remote CPU having encountered one, it should be
able to cope. Otherwise the risk of (pointless!) system death
would increase with the number of CPUs.
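
Something along these lines (an untested sketch only, reusing the
identifiers from your patch; the exact handling of PV guests without a
registered callback is an assumption) would at least not fail the whole
injection just because a vCPU still has an earlier vMCE pending:

    /* Hypothetical sketch: tolerate an already-pending vMCE on a vCPU
     * instead of bailing out of the broadcast. */
    for_each_vcpu ( d, v )
    {
        if ( !d->is_hvm &&
             !guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check) )
            return -1;          /* PV guest has no #MC handler registered */

        if ( test_and_set_bool(v->mce_pending) )
            continue;           /* already pending - guest didn't get to it yet */

        mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
                   d->domain_id, v->vcpu_id);
        vcpu_kick(v);
    }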

>> Also, how does this whole change interact with vmce_{rd,wr}msr()?
>> The struct bank_entry instances live on a per-domain list, so the
>> vMCE being delivered to all vCPU-s means they will all race for the
>> single entry (and might erroneously access others, particularly in
>> vmce_wrmsr()'s MCG_STATUS handling).
> 
> Yes, I agree.
> However, we have recently done vMCE re-design work (some internal review is
> still ongoing) and will present it soon. In the new approach, MSRs are
> per-vcpu instead of per-domain, so the MSR race issue (and the effort to
> clean it up) would be moot then. In this bugfix patch I therefore only touch
> vMCE# injection itself.

I understand that you want the simpler version as bug fix, but
you didn't answer whether it was at all verified that the single
per-domain entry logic would actually cope with the broadcast
delivery logic you're adding here. I'm afraid the per-vCPU entry
logic is a prerequisite to this change (in whatever form it may end
up going in).
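
Roughly speaking, the prerequisite I have in mind is per-vCPU state along
these lines (purely illustrative - the structure and field names below are
made up, and your actual re-design may of course look entirely different):

    /* Hypothetical illustration, not the current Xen data structures:
     * each vCPU keeps its own pending bank records and MCG_STATUS, so the
     * broadcast delivery no longer races on a single per-domain entry in
     * vmce_{rd,wr}msr(). */
    struct vmce_bank_entry {
        struct list_head list;
        uint16_t bank;
        uint64_t mci_status, mci_addr, mci_misc;
    };

    struct vmce_vcpu_state {            /* would hang off struct vcpu */
        struct list_head pending_banks; /* instead of the per-domain list */
        uint64_t mcg_status;            /* per-vCPU MCG_STATUS */
    };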

Jan
