
Re: [Xen-devel] [RFC v2] xSplice design



On 12.06.2015 16:03, Konrad Rzeszutek Wilk wrote:
> On Fri, Jun 12, 2015 at 01:39:05PM +0200, Martin Pohlack wrote:
>> On 15.05.2015 21:44, Konrad Rzeszutek Wilk wrote:
>> [...]
>>> ## Hypercalls
>>>
>>> We will employ the sub operations of the system management hypercall 
>>> (sysctl).
>>> There are to be four sub-operations:
>>>
>>>  * upload the payloads.
>>>  * listing of payloads summary uploaded and their state.
>>>  * getting a particular payload's summary and its state.
>>>  * command to apply, delete, or revert the payload.
>>>
>>> The patching is asynchronous; the caller is therefore responsible
>>> for verifying that it has been applied properly by retrieving its
>>> summary and checking that there are no error codes associated with
>>> the payload.
>>>
>>> We **MUST** make it asynchronous due to the nature of patching: it requires
>>> every physical CPU to be in lock-step with every other. The patching
>>> mechanism, while an implementation detail, is not a short operation, and as
>>> such the design **MUST** assume it will be a long-running operation.
>>
>> I am not convinced yet, that you need an asynchronous approach here.
>>
>> The experience from our prototype suggests that hotpatching itself is
>> not an expensive operation.  It can usually be completed well below 1ms
>> with the most expensive part being getting the hypervisor to a quiet state.
>>
>> If we go for a barrier at hypervisor exit, combined with forcing all
>> other CPUs through the hypervisor with IPIs, the typical case is very quick.
>>
>> The only reason that would take some time is if another CPU is
>> already executing a lengthy operation in the hypervisor.  In that case,
>> you probably don't want to block the whole machine waiting for that
>> single CPU to join anyway; instead, re-try later, for example by
>> using a timeout on the barrier.  That could be signaled to the user-land
>> process (EAGAIN) so that it could re-attempt hotpatching after some seconds.
> 
> Which is also an asynchronous operation.

Right, but in userland.  My main aim is to have as little complicated
code as possible in the hypervisor for obvious reasons.  This approach
would not require any further tracking of state in the hypervisor.

> The experience with previous preemption XSAs has left me quite afraid of
> long-running operations - which is why I was thinking to have this
> baked in from the start.
> 
> Both ways - EAGAIN or doing a _GET_STATUS - would provide a mechanism for
> the VCPU to do other work instead of being tied up.

If I understood your proposal correctly, there is a difference.  With
EAGAIN, all activity is dropped and the machine remains fully available
to whatever guests are running at the time.

With _GET_STATUS, you would continue to try to bring the hypervisor to a
quiet state in the background but return to userland to let this one
thread continue.  Behind the scenes though, you would still need to
capture all CPUs at some point, and all captured CPUs would have to wait
for the last straggler.  That would lead to noticeable dead time for
guests running on top.

I might have misunderstood your proposal though.

> The EAGAIN approach mandates that 'bringing the CPUs together' be done
> under 1ms, and that there be code to enforce a timeout on the barrier.

The 1ms is just a random number.  I would actually suggest allowing a
sysadmin or the hotpatch management tooling to specify how long one is
willing to potentially block the whole machine while waiting for a
stop_machine-like barrier, as part of the relevant hypercall.  You could
imagine userland starting out with 1ms and slowly working its way up
whenever it retries.

> The _GET_STATUS does not enforce this and can take longer, giving us
> more breathing room - but also unbounded time - which means that if
> we were to try to cancel it (say it had run for an hour and still
> could not patch) - we would have to add some hairy code to
> deal with cancelling asynchronous operations.
> 
> Your way is simpler - but I would advocate expanding the -EAGAIN to _all_
> the xSplice hypercalls. Thoughts?

In my experience, you only need the EAGAIN for hypercalls that use the
quiet state.  Depending on the design, that would be the operations that
do hotpatch activation and deactivation (i.e., the actual splicing).

Martin


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

