
Re: [Xen-devel] [RFC v2] xSplice design



On 30.10.2015 15:03, Ross Lagerwall wrote:
> On 10/30/2015 10:39 AM, Martin Pohlack wrote:
>> On 29.10.2015 17:55, Ross Lagerwall wrote:
>>> On 10/27/2015 12:05 PM, Ross Lagerwall wrote:
>>>> On 06/12/2015 12:39 PM, Martin Pohlack wrote:
>>>>> On 15.05.2015 21:44, Konrad Rzeszutek Wilk wrote:
>>>>> [...]
>>>>>> ## Hypercalls
>>>>>>
>>>>>> We will employ sub-operations of the system management hypercall
>>>>>> (sysctl).
>>>>>> There are to be four sub-operations:
>>>>>>
>>>>>>    * upload the payloads.
>>>>>>    * list the summaries of uploaded payloads and their state.
>>>>>>    * get a particular payload's summary and its state.
>>>>>>    * issue a command to apply, delete, or revert a payload.
>>>>>>
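A minimal C sketch of how the four sub-operations might be encoded as sub-op codes of a single sysctl, together with a validation step. All names here (`XSPLICE_UPLOAD`, `struct xsplice_op`, `xsplice_validate()` and so on) are hypothetical and not taken from the Xen tree; only the split into upload, list, get and action comes from the design text.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical sub-operation codes, one per item in the list above. */
enum xsplice_subop {
    XSPLICE_UPLOAD = 0,   /* upload a payload */
    XSPLICE_LIST   = 1,   /* list summaries of all uploaded payloads */
    XSPLICE_GET    = 2,   /* get one payload's summary and state */
    XSPLICE_ACTION = 3,   /* apply, revert, or delete a payload */
};

/* Hypothetical commands carried by XSPLICE_ACTION. */
enum xsplice_action {
    XSPLICE_ACTION_APPLY  = 1,
    XSPLICE_ACTION_REVERT = 2,
    XSPLICE_ACTION_DELETE = 3,
};

/* Hypothetical request header; real payload data would follow. */
struct xsplice_op {
    uint32_t subop;       /* one of enum xsplice_subop */
    uint32_t action;      /* only meaningful for XSPLICE_ACTION */
};

/* Returns 0 if the request is well formed, -EINVAL otherwise. */
static int xsplice_validate(const struct xsplice_op *op)
{
    switch (op->subop) {
    case XSPLICE_UPLOAD:
    case XSPLICE_LIST:
    case XSPLICE_GET:
        return 0;
    case XSPLICE_ACTION:
        return (op->action >= XSPLICE_ACTION_APPLY &&
                op->action <= XSPLICE_ACTION_DELETE) ? 0 : -EINVAL;
    default:
        return -EINVAL;   /* unknown sub-operation */
    }
}
```

Keeping a single sysctl with a sub-op field means new operations can be added later without consuming further top-level sysctl numbers.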
>>>>>> The patching is asynchronous; therefore the caller is responsible
>>>>>> for verifying that it has been applied properly by retrieving its
>>>>>> summary and checking that no error code is associated with the
>>>>>> payload.
>>>>>>
>>>>>> We **MUST** make it asynchronous due to the nature of patching: it
>>>>>> requires every physical CPU to be in lock-step with the others. The
>>>>>> patching mechanism, while an implementation detail, is not a short
>>>>>> operation, and as such the design **MUST** assume it will be a
>>>>>> long-running operation.
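To illustrate the asynchronous contract, here is a self-contained C model in which the apply action only queues the work and the caller then polls the payload summary until its error code settles. The states, the use of `-EAGAIN` as an "in progress" marker, and every function name are assumptions made for this sketch; `hypervisor_tick()` merely stands in for patching progressing in the background.

```c
#include <errno.h>

/* Hypothetical payload states. */
enum payload_state { PAYLOAD_CHECKED, PAYLOAD_APPLYING, PAYLOAD_APPLIED };

/* What a GET would return to the caller. */
struct payload_summary {
    enum payload_state state;
    int rc;               /* 0 on success, -EAGAIN while in progress, else error */
};

struct payload {
    struct payload_summary summary;
    int ticks_left;       /* simulated work remaining before completion */
    int outcome;          /* simulated final result of the patching */
};

/* ACTION(apply): queue the work and return immediately. */
static int payload_apply(struct payload *p)
{
    if (p->summary.state != PAYLOAD_CHECKED)
        return -EBUSY;
    p->summary.state = PAYLOAD_APPLYING;
    p->summary.rc = -EAGAIN;
    return 0;
}

/* Stand-in for the hypervisor making progress in the background. */
static void hypervisor_tick(struct payload *p)
{
    if (p->summary.state != PAYLOAD_APPLYING)
        return;
    if (--p->ticks_left > 0)
        return;
    p->summary.rc = p->outcome;
    p->summary.state = (p->outcome == 0) ? PAYLOAD_APPLIED : PAYLOAD_CHECKED;
}

/* GET: copy out the current summary. */
static struct payload_summary payload_get(const struct payload *p)
{
    return p->summary;
}

/* Caller side: issue the action, then poll GET until the rc settles.
 * Returns the final rc, or -ETIMEDOUT if max_polls is exhausted. */
static int apply_and_wait(struct payload *p, int max_polls)
{
    int rc = payload_apply(p);

    if (rc)
        return rc;
    for (int i = 0; i < max_polls; i++) {
        hypervisor_tick(p);   /* a real caller would sleep, then re-query */
        struct payload_summary s = payload_get(p);
        if (s.rc != -EAGAIN)
            return s.rc;
    }
    return -ETIMEDOUT;
}
```

For example, `apply_and_wait(&p, 10)` returns 0 once the simulated patch completes, the simulated error code if it fails, or `-ETIMEDOUT` if the payload is still being applied after ten polls.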
>>>>>
>>>>> I am not yet convinced that you need an asynchronous approach here.
>>>>>
>>>>> The experience from our prototype suggests that hotpatching itself is
>>>>> not an expensive operation.  It can usually be completed in well under
>>>>> 1 ms, with the most expensive part being getting the hypervisor to a
>>>>> quiet state.
>>>>>
>>>>
>>>> FWIW, my current implementation (which is almost certainly not optimal)
>>>> takes about 3 ms on a 72-CPU machine, whether idle or fully loaded.
>>>>
>>>
>>> Let me correct that: it takes 60 μs to 100 μs to synchronize and apply
>>> the patch (on the same hardware) when synchronous console logging is
>>> turned off.
>>
>> The interesting (and very rare) case is if other CPUs are busy in Xen
>> already, for example, with memory scrubbing or other long-running
>> activities.  Those are hard to interrupt and delay patching activity.
>>
>> Having multiple guests in a reboot loop / being restarted all the time
>> might help trigger this case.
>>
> 
> I have been able to trigger this, which is why I put a (currently
> hard-coded) 10 ms timeout in the synchronization code; if it expires,
> the code gives up and returns an error. The user can then optionally
> retry at a later point.

If you ever want to run this in QEMU or similar, you need to account for
the scheduling timeslice of the host system.  I found it necessary to
work with 20 ms for that specific case.
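The timeout discussed above can be sketched as a bounded rendezvous: the initiating CPU waits for every other CPU to reach a safe point, patches only if all of them arrive before the deadline, and otherwise releases them and returns an error that the caller may retry. This is a user-space model that uses POSIX threads in place of physical CPUs; the structure, the `-EBUSY` return value and all names are hypothetical and not taken from any posted implementation. The timeout is a parameter so that a 10 ms default, or roughly 20 ms when running nested under QEMU, can be passed in.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical shared rendezvous state. */
struct rendezvous {
    atomic_int arrived;   /* other CPUs parked at the safe point */
    atomic_int state;     /* 0 = waiting, 1 = patched, -1 = aborted */
    int ncpus;            /* number of CPUs besides the initiator */
};

static atomic_int patches_applied;   /* counts calls to apply_patch() */

static void apply_patch(void) { atomic_fetch_add(&patches_applied, 1); }

static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Initiator: wait (bounded) for every other CPU, then patch. */
static int rendezvous_and_patch(struct rendezvous *r, long long timeout_ns)
{
    long long deadline = now_ns() + timeout_ns;

    while (atomic_load(&r->arrived) < r->ncpus) {
        if (now_ns() > deadline) {
            atomic_store(&r->state, -1);   /* release CPUs already parked */
            return -EBUSY;                 /* caller may retry later */
        }
    }
    apply_patch();                         /* all other CPUs are quiescent */
    atomic_store(&r->state, 1);
    return 0;
}

/* Other CPUs: announce arrival, then spin until released. */
static void rendezvous_wait(struct rendezvous *r)
{
    atomic_fetch_add(&r->arrived, 1);
    while (atomic_load(&r->state) == 0)
        ;                                  /* a hypervisor would spin with IRQs off */
}

struct cpu_arg { struct rendezvous *r; long long delay_ns; };

/* Simulated CPU: busy elsewhere for delay_ns, then joins the rendezvous. */
static void *cpu_thread(void *p)
{
    struct cpu_arg *a = p;
    struct timespec d = { a->delay_ns / 1000000000LL, a->delay_ns % 1000000000LL };

    nanosleep(&d, NULL);
    rendezvous_wait(a->r);
    return NULL;
}

/* One trial with ncpus simulated CPUs; the last one is delayed by slow_ns,
 * standing in for a CPU stuck in long-running work such as scrubbing. */
static int run_trial(int ncpus, long long slow_ns, long long timeout_ns)
{
    struct rendezvous r = { 0, 0, ncpus };
    pthread_t t[64];
    struct cpu_arg args[64];
    int rc;

    if (ncpus < 1 || ncpus > 64)
        return -EINVAL;
    for (int i = 0; i < ncpus; i++) {
        args[i] = (struct cpu_arg){ &r, i == ncpus - 1 ? slow_ns : 0 };
        pthread_create(&t[i], NULL, cpu_thread, &args[i]);
    }
    rc = rendezvous_and_patch(&r, timeout_ns);
    for (int i = 0; i < ncpus; i++)
        pthread_join(t[i], NULL);
    return rc;
}
```

For example, `run_trial(2, 200000000LL, 10000000LL)` models one CPU stuck for 200 ms against a 10 ms timeout, so the rendezvous aborts with `-EBUSY` and no patch is applied; the late CPU sees the abort flag when it finally arrives and carries on.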

Martin

Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
