
Re: [Xen-devel] [PATCH v9 08/11] x86/entry: Clobber the Return Stack Buffer/Return Address Stack on entry to Xen

On 19/01/18 16:19, Jan Beulich wrote:
>>>> On 19.01.18 at 17:10, <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 19/01/18 15:02, Jan Beulich wrote:
>>>>>> On 19.01.18 at 15:24, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> On 19/01/18 12:47, Jan Beulich wrote:
>>>>>>>> On 18.01.18 at 16:46, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>> @@ -265,6 +265,10 @@ On hardware supporting IBRS, the `ibrs=` option can be used to force or
>>>>>>  prevent Xen using the feature itself.  If Xen is not using IBRS itself,
>>>>>>  functionality is still set up so IBRS can be virtualised for guests.
>>>>>> +The `rsb_vmexit=` and `rsb_native=` options can be used to fine tune when the
>>>>>> +RSB gets overwritten.  There are individual controls for an entry from HVM
>>>>>> +context, and an entry from a native (PV or Xen) context.
>>>>> Would you mind adding a sentence or two to the description making
>>>>> clear what use this fine grained control is? I can't really figure why I
>>>>> might need to be concerned about one of the two cases, but not the
>>>>> other.
>>>> I thought I'd covered that in the commit message, but I'm not sure this
>>>> is a suitable place to discuss the details.  PV and HVM guests have
>>>> different reasoning for why we need to overwrite the RSB.
>>>> In the past, there used to be a default interaction of rsb_native and
>>>> SMEP, but that proved to be insufficient and rsb_native is now
>>>> unconditionally enabled.  In principle however, it should fall within
>>> Thanks for the explanation, but I'm afraid I'm none the wiser as
>>> to why the two separate options are needed (or even just wanted).
>> If you never run 32bit PV guests, and don't use Introspection on HVM VMs
>> larger than 7 vcpus, then you are believed safe to turn rsb_native off.
> Where does that funny 7 come from?

It was 7 last time I looked.

It's complicated, but it is basically sizeof(vm_event ring) /
sizeof(vm_event request), accounting for there being space for at least
one synchronous request remaining if async requests are actually used.
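
As a rough sketch of that arithmetic (not the real Xen code): the helper
name and the sizes in the comment are made up for illustration, and the
actual ring layout may well round differently.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not a Xen function: number of request slots a
 * ring of ring_bytes can hold when each request is req_bytes long,
 * keeping `reserved` slots free for synchronous requests.
 */
static size_t usable_slots(size_t ring_bytes, size_t req_bytes,
                           size_t reserved)
{
    if (req_bytes == 0)
        return 0;

    size_t slots = ring_bytes / req_bytes;

    /* Never report a negative count if the reservation exceeds the ring. */
    return slots > reserved ? slots - reserved : 0;
}

/*
 * Placeholder numbers only: a 4096-byte ring with 512-byte requests and
 * one reserved slot gives usable_slots(4096, 512, 1) == 7.
 */
```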

The plan for resolving this is:
1) Paul's new mapping API
2) Splitting the vm_event ring in two
2a) sync requests become a straight array using vcpu_id as an index
2b) async requests can become an (arbitrarily large, within reason)
multipage ring, and respecified to be lossy if not drained quickly enough.
3) I purge the waitqueue infrastructure and pretend that it never existed.

This means that livepatching is finally safe in combination with
introspection, and the rings can't be fiddled with by a cunning guest.
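
To illustrate the shape of 2a and 2b in the plan above, here is a minimal
single-threaded sketch.  Every name and constant in it (ev_channel,
MAX_VCPUS, ASYNC_SLOTS and so on) is hypothetical, and it deliberately
ignores memory barriers and the shared-page ABI, so treat it as a picture
of the idea and not a proposed interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VCPUS   16  /* hypothetical limit, not a Xen constant */
#define ASYNC_SLOTS 8   /* hypothetical; power of two so index wrap is safe */

struct ev_request {     /* hypothetical, much smaller than a real request */
    uint32_t vcpu_id;
    uint32_t reason;
};

struct ev_channel {
    /* 2a: one sync slot per vcpu, indexed directly by vcpu_id. */
    struct ev_request sync[MAX_VCPUS];
    bool sync_pending[MAX_VCPUS];

    /* 2b: async ring which drops requests when it is full. */
    struct ev_request async[ASYNC_SLOTS];
    uint32_t prod, cons;
    uint64_t dropped;
};

/* A vcpu can have at most one sync request outstanding. */
static bool put_sync(struct ev_channel *ch, const struct ev_request *req)
{
    if (req->vcpu_id >= MAX_VCPUS || ch->sync_pending[req->vcpu_id])
        return false;

    ch->sync[req->vcpu_id] = *req;
    ch->sync_pending[req->vcpu_id] = true;
    return true;
}

/* Lossy producer: count and drop the request if the consumer is behind. */
static bool put_async(struct ev_channel *ch, const struct ev_request *req)
{
    if (ch->prod - ch->cons == ASYNC_SLOTS) {
        ch->dropped++;
        return false;
    }

    ch->async[ch->prod % ASYNC_SLOTS] = *req;
    ch->prod++;
    return true;
}

/* Consumer side: take the oldest async request, if there is one. */
static bool get_async(struct ev_channel *ch, struct ev_request *out)
{
    if (ch->prod == ch->cons)
        return false;

    *out = ch->async[ch->cons % ASYNC_SLOTS];
    ch->cons++;
    return true;
}
```

With this shape a sync request never competes with async traffic for
space, so no vcpu has to wait for a free slot.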

>> Also, based on feedback we're seeing from the field, an "I trust my PV
>> guest" mode looks like it will go a long way.  When dom0 is the only PV
>> guest (which is very common, and increasingly so these days), then we
>> can drop rsb_native and IBRS_CLEAR, giving us zero overhead for
>> exception/interrupt handling.
> Okay, makes sense, thanks. Would be nice if you could add some of
> this to the description.

I'll see what I can do.

