Re: [Xen-devel] [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode

On 11/08/15 18:05, Tim Deegan wrote:
Hi,

At 17:51 +0100 on 11 Aug (1439315508), Ben Catterall wrote:
On 11/08/15 10:55, Tim Deegan wrote:
At 11:14 +0100 on 10 Aug (1439205273), Andrew Cooper wrote:
On 10/08/15 10:49, Tim Deegan wrote:
Hi,

At 17:45 +0100 on 06 Aug (1438883118), Ben Catterall wrote:
The process to switch into and out of deprivileged mode can be likened to
setjmp/longjmp.

To enter deprivileged mode, we take a copy of the stack from the guest's
registers up to the current stack pointer.
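
To make the analogy concrete, a minimal sketch of the copy-based
approach (illustrative names and sizes; not the actual patch code,
which does this in assembly):

    /* Save the live priv stack and register context on entry;
     * restore both to return.  switch_to_depriv_mode() and
     * current_sp are hypothetical placeholders. */
    #include <setjmp.h>
    #include <stdint.h>
    #include <string.h>

    #define SAVED_STACK_MAX 8192

    struct depriv_ctx {
        jmp_buf priv_env;                /* saved privileged context    */
        uint8_t saved[SAVED_STACK_MAX];  /* copy of the live priv stack */
        size_t  len;
    };

    /* Copy the stack between sp (lower address) and stack_top. */
    static void save_priv_stack(struct depriv_ctx *c, void *sp,
                                void *stack_top)
    {
        c->len = (size_t)((uint8_t *)stack_top - (uint8_t *)sp);
        memcpy(c->saved, sp, c->len);
    }

    /* Put the saved frames back and resume after the setjmp(). */
    static void return_to_priv(struct depriv_ctx *c, void *sp)
    {
        memcpy(sp, c->saved, c->len);
        longjmp(c->priv_env, 1);
    }

    /* Usage, setjmp-style:
     *
     *   if ( setjmp(ctx.priv_env) == 0 )
     *   {
     *       save_priv_stack(&ctx, current_sp, stack_top);
     *       switch_to_depriv_mode();   // run the depriv code
     *   }
     *   // execution resumes here after return_to_priv()
     */
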
This copy is pretty unfortunate, but I can see that avoiding it will
be a bit complex.  Could we do something with more stacks?  AFAICS
there have to be three stacks anyway:

   - one to hold the depriv execution context;
   - one to hold the privileged execution context; and
   - one to take interrupts on.

So maybe we could do some fiddling to make Xen take interrupts on a
different stack while we're depriv'd?
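
For reference, the stack the CPU switches to on an interrupt from
ring 3 (or via an IST vector) comes from fields in the hardware TSS;
a sketch of the relevant layout, with an illustrative helper:

    /* x86-64 hardware TSS, as the CPU reads it (104 bytes, packed). */
    #include <stdint.h>

    struct __attribute__((packed)) hw_tss {
        uint32_t reserved0;
        uint64_t rsp0, rsp1, rsp2;  /* stacks on privilege-level change */
        uint64_t reserved1;
        uint64_t ist[7];            /* Interrupt Stack Table entries    */
        uint64_t reserved2;
        uint16_t reserved3;
        uint16_t iomap_base;
    };

    /* While depriv'd, aim interrupts at a dedicated stack so they
     * can't land on top of the privileged Xen stack. */
    static inline void set_interrupt_stack(struct hw_tss *tss,
                                           uint64_t stack_top)
    {
        tss->rsp0 = stack_top;   /* used for interrupts from ring 3 */
    }
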

That should happen naturally by virtue of the privilege level change
involved in taking the interrupt.

Right, and this is why we need a third stack - so interrupts don't
trash the existing priv state on the 'normal' Xen stack.  And so we
either need to copy the priv stack out (and maybe copy it back), or
tell the CPU to use a different stack.

The copy is relatively small and is paid only on the first and last
entries into the mode. I don't know if this is cheaper than the
bookkeeping that would be needed on entering and returning from the
mode to switch to these stacks. I'm assuming the stack pointers in the
TSS and the IST entries would need changing on the first and last
entry/exit if we have the extra stack; is that correct?

Yep.
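
A sketch of that bookkeeping, reusing the hw_tss layout (and
<stdint.h>) from the sketch above; whether it beats the memcpy
depends on how much stack is live:

    /* Swap the CPU's interrupt stacks on depriv entry and restore
     * them on exit, instead of copying the priv stack out. */
    struct stack_switch {
        uint64_t old_rsp0;
        uint64_t old_ist[7];
    };

    static void depriv_stacks_enter(struct hw_tss *tss,
                                    struct stack_switch *s,
                                    uint64_t irq_stack_top)
    {
        unsigned int i;

        s->old_rsp0 = tss->rsp0;
        tss->rsp0 = irq_stack_top;

        for ( i = 0; i < 7; i++ )
        {
            s->old_ist[i] = tss->ist[i];
            if ( tss->ist[i] )
                tss->ist[i] = irq_stack_top;  /* per-vector stacks in
                                                 a real implementation */
        }
    }

    static void depriv_stacks_exit(struct hw_tss *tss,
                                   const struct stack_switch *s)
    {
        unsigned int i;

        tss->rsp0 = s->old_rsp0;
        for ( i = 0; i < 7; i++ )
            tss->ist[i] = s->old_ist[i];
    }
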

Or is this a more dramatic change, in that everything would use this
three-stack model rather than just this feature?

Well, some other parts would have to change to accommodate this new
behaviour - that was what Andrew was talking about.

BTW, I think there need to be three stacks anyway, since the depriv
code shouldn't be allowed to write to the priv code's stack frames.
Or maybe I've misunderstood how much access the depriv code will have.
So, just to clarify:

We have a separate deprivileged stack allocated which the
deprivileged code uses. This is mapped in user mode.

We have the privileged stack which Xen runs on. To prevent this being
clobbered when we are in deprivileged mode and take an interrupt, we
copy it out to a buffer. This buffer is the saved privileged stack
state.

So we sort of have three stacks already; it's just that the
privileged stack is copied out to a buffer rather than pointers being
switched to another interrupt stack.

Hopefully that clarifies?
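
So, in outline (names illustrative):

    /* The three regions described above; the third is a buffer
     * rather than a live stack. */
    #include <stddef.h>
    #include <stdint.h>

    struct depriv_stacks {
        uint8_t *depriv_stack;  /* user-mapped; ring 3 runs on this    */
        uint8_t *priv_stack;    /* the normal Xen stack                */
        uint8_t *saved_priv;    /* buffer holding the copied-out priv
                                   stack while we are deprivileged     */
        size_t   saved_len;
    };
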


I'm not sure how much in Xen would need changing to switch across to
using three stacks. Would this also need to be done for PV guests,
and would that need to be a separate patch series?

What's the overall consensus? Thanks!

I'm not sure there is one yet -- needs some more discussion of
whether the non-copying approach is feasible.

If we had enough headroom, we could try to be clever and tell the CPU
to take interrupts on the priv stack _below_ the existing state.  That
would avoid the first of your problems below.
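
A sketch of that headroom trick, again using the hw_tss layout from
earlier (only safe if enough free space remains on the stack):

    /* Take interrupts on the priv stack _below_ the existing state,
     * by lowering rsp0 to just under the current stack pointer. */
    static inline void interrupts_below_current_frame(struct hw_tss *tss)
    {
        uint64_t rsp;

        asm volatile ( "mov %%rsp, %0" : "=r" (rsp) );
        tss->rsp0 = rsp & ~0xfUL;   /* preserve 16-byte alignment */
    }
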

* Under this model, PV exception handlers should copy themselves onto
the privileged execution stack.
* Currently, the IST handlers copy themselves onto the primary stack
if they interrupt guest context.
* AMD Task Register on vmexit.  (this old gem)

Gah, this thing. :(
Curious (and I can't seem to find this in the manuals): what is this
thing?

IIRC: AMD processors don't context switch TR on vmexit, which makes
using IST handlers tricky there.  We'd have to do the TR context
switch ourselves, and that would be expensive.  Andrew, am I
remembering that right?

Thanks!
Tim.
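
For reference, a sketch of what reloading TR by hand would involve;
the busy-bit dance is part of why doing this on every vmexit would be
costly (illustrative, not Xen's code):

    /* A 64-bit TSS descriptor takes 16 bytes in the GDT.  ltr faults
     * on a descriptor already marked busy (type 0xB rather than 0x9),
     * so clear the busy bit -- bit 41 of the low qword -- first. */
    #include <stdint.h>

    struct __attribute__((packed)) tss_desc {
        uint64_t lo, hi;
    };

    static void reload_tr(struct tss_desc *gdt_entry, uint16_t tss_sel)
    {
        gdt_entry->lo &= ~(1ULL << 41);              /* busy -> available */
        asm volatile ( "ltr %w0" :: "r" (tss_sel) ); /* reload TR         */
    }
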

