[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 7/8] xen/x86: Rework CR4 handling for PV guests



On 26/06/2015 07:22, Jan Beulich wrote:
>>>> On 25.06.15 at 18:22, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 25/06/15 15:41, Jan Beulich wrote:
>>>>>> On 24.06.15 at 18:31, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> The major change here is that v->arch.pv_vcpu.ctrlreg[4] now contains only 
>>>> CR4
>>>> bits which Xen wishes to shadow, rather than containing a mix of host and
>>>> guest bits.  This is likely to elicit a warning when migrating a PV guest 
>>>> from
>>>> an older version of Xen, although no function problems as a result.
>>> For one, I'm having a hard time guessing what the part of the sentence
>>> after the comma is meant to tell me.
>> Before this patch, v->arch.pv_vcpu.ctrlreg[4] contained an unqualified
>> combination of host and guest controlled bits.
>>
>> After this patch, it strictly contains the bits Xen chooses to shadow.
> With this you're re-stating what the first sentence said after its comma.
> The question was on the second one though.

Ah.  That was referring to the fact that even if the warning fires, Xen
ignores the results and chooses its own set of bits in CR4.

>
>>> And then, from a support
>>> perspective, such warnings aren't helpful, as they incur questions.
>> It was useful for debugging, but can see your point.  I could reword it
>> to "ignoring $FOO's attempt to update $BAR's CR4".  I would hesitate
>> from removing it completely however.
> Yes, the warning in general should stay of course (there is one
> after all right now). I realize I slightly mis-read the sentence
> though - looks like the implication is that the possible warning is
> from the mixture of bits previously in .ctrlreg[4]?

That was the purpose of the comment.

> Could we flag
> the write originating from a migration, issuing the warning only
> for guest initiated updates?

I can put a current check in (although it will have to be carefully to
avoid issues when creating the idle vcpus).

I wasn't sure how much to care about bad values appearing as a result of
toolstack actions, but I suppose this is usually only restore, so
probably not that important.

>
>>> Plus
>>> finally, storing neither the current guest view nor the current hypervisor
>>> view in that field seems prone to cause headaches in the future. Otoh
>>> this gets it in line with CR0 handling as it looks.
>> My logic was that the set of shadowed bits will never decrease on newer
>> versions of Xen, but Xen's choice of CR4 bits may change.  As a result,
>> neither "guests view" nor "hypervisors view" is appropriate in an
>> upgrade case.
> Okay, namely together with getting it in line with CR0 handling this
> makes sense.
>
>>>> A second change is that visible set of CR4 bits is different.  In 
>>>> particular,
>>>> VMXE and MCE are not leaked into a guest, while SMAP and SMEP are exposed.
>>> Not leaking VMXE into the guest is fine. Not exposing MCE even to
>>> Dom0 is questionable.
>> I would prefer not to special case the hardware domain if possible, and
>> I suppose MCE is not too much of an issue to leak.  We already are,
>> after all.
>>
>>> And exposing SMEP and SMAP without
>>> exposing their CPUID flags seems bogus.
>> I was hoping not to get bogged down in CPUID given my upcoming work to
>> fix feature levelling.
>>
>> After recent consideration, I am still not sure if we want to support
>> SMAP in 32bit PV guests or not.  The trapping of stac/clac would be a
>> high overhead, although the guest could modify EFLAGS.AC itself using popf.
> If it technically works (which it looks like it does) I see no reason
> leaving the decision to guests.

True.  For now, I will leave SMAP/SMEP unexposed and leave the CPUID
side of things to my levelling work.

>
>>>> In addition, add PGE to the set of bits we don't care about a guest 
>>>> attempting to modify.
>>> I agree from a functionality pov this is okay, but do we really want
>>> to allow the guest to turn this off from a performance pov?
>> This change doesn't shadow PGE.  It just prevents the warning from
>> firing if the guest attempts to clear PGE.
> I which case I think the wording you chose is slightly misleading.
>
>>>> @@ -699,6 +719,8 @@ static int __init init_pv_cr4_masks(void)
>>>>          common_mask &= ~X86_CR4_DE;
>>>>      if ( cpu_has_xsave )
>>>>          common_mask &= ~X86_CR4_OSXSAVE;
>>>> +    if ( cpu_has_pge )
>>>> +        common_mask &= ~X86_CR4_PGE;
>>> Following the earlier comment, shouldn't this bit then also be added
>>> to PV_CR4_SHADOW?
>> I don't think so.  A guest has to defer to Xen to perform TLB flushes. 
>> I just wished to avoid the warning.
> Maybe add a comment to that effect then, avoiding future readers
> to feel tempted to correct this apparent inconsistency, just like it
> happened to me?
>
>>> With FSGSBASE not in PV_CR4_SHADOW, how would (as the comment
>>> in init_pv_cr4_masks() says) the guest manage to observe the flag in
>>> its desired state after a migration?
>> The guest does not get any control of FSGSBASE.  Xen unilaterally has it
>> enabled, and uses them on vcpu context switch so it is not safe to ever
>> disable.
> Again - is the comment the misleading? I took it to say that the
> guest can control its _view_ of the flag without controlling the
> hardware setting. I.e. when it wrote it as zero, it would read
> back zero, despite the flag being set in hardware at all times.

I shall see about improving the comments to explain that the these masks
are unrelated to whether Xen chooses to shadow the individual bits or not.

~Andrew


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.