Re: [Xen-devel] [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode

On 12/08/15 14:29, Andrew Cooper wrote:
> On 11/08/15 19:29, Boris Ostrovsky wrote:
>> On 08/11/2015 01:19 PM, Andrew Cooper wrote:
>>> On 11/08/15 18:05, Tim Deegan wrote:
>>>>>>> * Under this model, PV exception handlers should copy themselves onto
>>>>>>> the privileged execution stack.
>>>>>>> * Currently, the IST handlers copy themselves onto the primary stack if
>>>>>>> they interrupt guest context.
>>>>>>> * AMD Task Register on vmexit.  (this old gem)
>>>>>> Gah, this thing.
>>>>> Curious (and I can't seem to find this in the manuals): What is this
>>>>> thing?
>>>> IIRC: AMD processors don't context switch TR on vmexit,
>>> Correct
>>>> which makes using IST handlers tricky there.
>>> (That is one way of putting it)
>>> IST handlers cannot be used by Xen if Xen does not switch the task
>>> register before stgi, or IST exceptions (NMI, MCE and double fault) will
>>> be taken with guest-supplied stack pointers.
>>>> We'd have to do the TR context switch ourselves, and that would be
>>>> expensive.
>>> It is suspected to be expensive, but I have never actually seen any
>>> numbers one way or another.
>>>> Andrew, am I remembering that right?
>>> Looks about right.
>>> I have been meaning to investigate this for a while, but never had
>>> the time.
>>> Xen opts for disabling interrupt stack tables in the context of AMD HVM
>>> vcpus, which interacts catastrophically with debug builds using
>>> MEMORY_GUARD.  MEMORY_GUARD shoots a page out of the primary stack to
>>> detect stack overflows, but without an IST double fault handler, an overflow
>>> ends in a triple fault rather than a host crash detailing the stack overflow.
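For reference, this is a rough sketch of the 64-bit TSS as laid out in the Intel SDM / AMD APM. It shows how little of the TSS long mode uses: RSP0-2 for privilege-level stack switches, seven IST pointers, and the I/O bitmap base. The struct and helper names (`tss64`, `tss_set_ist`) are hypothetical and are not Xen's definitions. An IDT gate picks an IST slot with a 3-bit index, where 0 means "no IST". That is why a #DF gate with IST disabled lands on the (overflowed) primary stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; layout follows the architectural 64-bit TSS. */
struct __attribute__((packed)) tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];      /* RSP0..RSP2: stacks for CPL changes */
    uint64_t reserved1;
    uint64_t ist[7];      /* IST1..IST7: used for NMI, #MC, #DF if configured */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;  /* offset of the I/O permission bitmap */
};

static_assert(sizeof(struct tss64) == 0x68, "64-bit TSS is 104 bytes");
static_assert(offsetof(struct tss64, rsp) == 0x04, "RSP0 lives at 0x04");
static_assert(offsetof(struct tss64, ist) == 0x24, "IST1 lives at 0x24");

/* IDT gates carry a 3-bit IST index: 0 = no IST (normal stack switch),
 * 1..7 select ist[0..6].  Returns -1 for an index that has no slot. */
static int tss_set_ist(struct tss64 *t, unsigned int n, uint64_t stack_top)
{
    if (n < 1 || n > 7)
        return -1;
    t->ist[n - 1] = stack_top;
    return 0;
}
```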
>>> KVM unilaterally reloads the host task register on vmexit, and I suspect
>>> this is probably the way to go, but have not had time to investigate
>>> whether there is any performance impact from doing so.  Given how little
>>> of a TSS is actually used in long mode, I wouldn't expect an `ltr` to be
>>> as expensive as it might have been in legacy modes.
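A rough sketch of what reloading the host TR after #VMEXIT involves. `ltr` raises #GP if the TSS descriptor in the GDT is already marked busy, and it will be busy from the boot-time `ltr`. The descriptor type therefore has to be flipped from 0xB (busy 64-bit TSS) back to 0x9 (available) first. As far as I recall, KVM's SVM code has done this. The helper names and the `gdt`/`tss_index` parameters are hypothetical. The `ltr` itself only works at CPL0, so the reload function is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* In long mode the TSS descriptor is a 16-byte system descriptor; the
 * type field sits in bits 40..43 of its low quadword. */
#define DESC_TYPE_SHIFT 40
#define DESC_TYPE_MASK  (0xfULL << DESC_TYPE_SHIFT)
#define TSS_TYPE_AVAIL  0x9ULL  /* 64-bit TSS, available */
#define TSS_TYPE_BUSY   0xbULL  /* 64-bit TSS, busy */

static unsigned int tss_desc_type(uint64_t desc_lo)
{
    return (unsigned int)((desc_lo & DESC_TYPE_MASK) >> DESC_TYPE_SHIFT);
}

/* Clear the busy bit so that a subsequent ltr does not #GP. */
static uint64_t tss_desc_mark_available(uint64_t desc_lo)
{
    return (desc_lo & ~DESC_TYPE_MASK) | (TSS_TYPE_AVAIL << DESC_TYPE_SHIFT);
}

#if defined(__x86_64__)
/* Hypothetical helper: must run at CPL0 with a writable GDT.  tss_index
 * is the first 8-byte GDT slot of the descriptor (selector = index * 8). */
static inline void reload_host_tr(uint64_t *gdt, unsigned int tss_index)
{
    gdt[tss_index] = tss_desc_mark_available(gdt[tss_index]);
    __asm__ volatile ("ltr %0" :: "r" ((uint16_t)(tss_index * 8)) : "memory");
}
#endif
```

The extra cost over a bare `ltr` is one GDT write, which is small next to the descriptor load itself. Whether the whole sequence is cheap enough to do on every vmexit still needs measuring.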
>>> (CC'ing the AMD SVM maintainers to see if they have any information on
>>> this subject)
>> I actually didn't even realize that TR is not saved on vmexit ;-/.
>> Would switching TR only when we know that we need to enter this
>> deprivileged mode help?
> This is an absolute must.  It is not safe to use syscall/sysret without
> IST in place for NMIs and MCEs.
>> Assuming that it is less expensive than copying the stack.
> I was referring to the stack overflow issue, and whether it might be
> sensible to pro-actively which TR.

Ahem! s/which/switch/


