
Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time



On 06/10/2015 06:58 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
>> Sent: 10 June 2015 11:55
>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
>>>> Sent: 10 June 2015 10:06
>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx; Paul Durrant
>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
>>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>>>> page only one time
>>>>
>>>> Cc: Paul Durrant
>>>>
>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
>>>>>>
>>>>>>
>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>>>>>> The ioreq page contains an evtchn which will be set when we
>>>>>>>>>>>> resume the secondary vm the first time. The hypervisor will check
>>>>>>>>>>>> if the evtchn is corrupted, so we cannot zero the ioreq page more
>>>>>>>>>>>> than one time.
>>>>>>>>>>>>
>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>>>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
>>>>>>>>>>>> Signed-off-by: Wen congyang <wency@xxxxxxxxxxxxxx>
>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
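
(For illustration only, a minimal sketch of the behaviour the commit message
describes, assuming a libxc-style restore context; the structure, flag and
function names below are made up for this example and are not the actual
patch:)

    #include <stdbool.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct restore_ctx {
        /* hypothetical flag, cleared once at the start of the stream */
        bool ioreq_page_zeroed;
    };

    /* Zero the ioreq special page only on the first pass; on later COLO
     * checkpoints leave it alone, so the event channel values that Xen and
     * qemu have already installed are not wiped out. */
    static void handle_ioreq_page(struct restore_ctx *ctx, void *page)
    {
        if (!ctx->ioreq_page_zeroed) {
            memset(page, 0, PAGE_SIZE);
            ctx->ioreq_page_zeroed = true;
        }
    }
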
>>>>>>>>>>>
>>>>>>>>>>> The issue here is that we are running the restore algorithm over a
>>>>>>>>>>> domain which has already been running in Xen for a while.  This is a
>>>>>>>>>>> brand new usecase, as far as I am aware.
>>>>>>>>>>
>>>>>>>>>> Exactly.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Does the qemu process associated with this domain get frozen
>>>>>>>>>>> while the
>>>>>>>>>>> secondary is being reset, or does the process get destroyed and
>>>>>>>>>>> recreated?
>>>>>>>>>>
>>>>>>>>>> What do you mean by reset? Do you mean the secondary is suspended
>>>>>>>>>> at checkpoint?
>>>>>>>>>
>>>>>>>>> Well - at the point that the buffered records are being processed,
>>>>>>>>> we are in the process of resetting the state of the secondary to
>>>>>>>>> match the primary.
>>>>>>>>
>>>>>>>> Yes, at this point the qemu process associated with this domain is
>>>>>>>> frozen: the suspend callback calls libxl__qmp_stop() (vm_stop() in
>>>>>>>> qemu) to pause qemu. After we have processed all records, qemu is
>>>>>>>> restored with the received state; that is why we added a
>>>>>>>> libxl__qmp_restore() (qemu_load_vmstate() in qemu) API to restore
>>>>>>>> qemu with the received state. Currently in libxl, qemu can only start
>>>>>>>> with the received state; there is no API to load a received state
>>>>>>>> while qemu has already been running for a while.
>>>>>>>
>>>>>>> Now that I consider this more, it is absolutely wrong to not zero the
>>>>>>> page here.  The event channel in the page is not guaranteed to be the
>>>>>>> same between the primary and secondary,
>>>>>>
>>>>>> That's why we don't zero it on secondary.
>>>>>
>>>>> I think you missed my point.  Apologies for the double negative.  It
>>>>> must, under all circumstances, be zeroed at this point, for safety
>>>>> reasons.
>>>>>
>>>>> The page in question is subject to logdirty just like any other guest
>>>>> pages, which means that if the guest writes to it naturally (i.e. not a
>>>>> Xen or Qemu write, both of whom have magic mappings which are not
>>>>> subject to logdirty), it will be transmitted in the stream.  As the
>>>>> event channel could be different, the lack of zeroing it at this point
>>>>> means that the event channel would be wrong as opposed to simply
>>>>> missing.  This is a worse position to be in.
>>>>
>>>> The guest should not access this page. I am not sure if the guest can
>>>> access the ioreq page.
>>>>
>>>> But in the exceptional case the ioreq page is dirtied and is copied to
>>>> the secondary vm. The ioreq page will then contain a wrong event channel;
>>>> the hypervisor checks it, and if the event channel is wrong, the guest
>>>> will be crashed.
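
(For reference, a self-contained sketch of the kind of sanity check being
referred to; the field names follow the public ioreq_t layout, but this is a
simplified illustration written for this example, not the hypervisor's code:)

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define STATE_IOREQ_NONE 0

    /* Simplified model of one slot of the shared ioreq page. */
    struct ioreq_slot {
        uint8_t  state;     /* STATE_IOREQ_* */
        uint32_t vp_eport;  /* event channel the emulator must notify */
    };

    /* If a restored/dirtied ioreq page carries a non-idle state or a stale
     * event channel, the slot cannot be trusted; the caller (conceptually,
     * the hypervisor) would crash the domain rather than use it. */
    static bool ioreq_slot_sane(const struct ioreq_slot *p,
                                uint32_t expected_port)
    {
        if (p->state != STATE_IOREQ_NONE || p->vp_eport != expected_port) {
            fprintf(stderr, "bad ioreq slot: state %u, evtchn %u (expected %u)\n",
                    (unsigned)p->state, (unsigned)p->vp_eport,
                    (unsigned)expected_port);
            return false;
        }
        return true;
    }
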
>>>>
>>>>>
>>>>>>
>>>>>>> and we don't want to unexpectedly
>>>>>>> find a pending/in-flight ioreq.
>>>>>>
>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended, so
>>>>>> there should be no pending/in-flight ioreq at checkpoint.
>>>>>
>>>>> In the common case perhaps, but we must consider the exceptional case.
>>>>> The exceptional case here is some corruption which happens to appear as
>>>>> an in-flight ioreq.
>>>>
>>>> If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug. If the
>>>> hypervisor has a bug, anything can happen. I think we should trust the
>>>> hypervisor.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Either qemu needs to take care of re-initialising the event channels
>>>>>>> back to appropriate values, or Xen should tolerate the channels
>>>>>>> disappearing.
>>>>>
>>>>> I still stand by this statement.  I believe it is the only safe way of
>>>>> solving the issue you have discovered.
>>>>
>>>> Add a new qemu monitor command to update the ioreq page?
>>>>
>>>
>>> If you're attaching to a 'new' VM (i.e. one with an updated image) then I
>>> suspect you're going to have to destroy and re-create the ioreq server so
>>> that the shared page gets re-populated with the correct event channels.
>>> Either that, or you're going to have to ensure that the page is not part
>>> of the restored image and sample the new one that Xen should have set up.
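
(A rough sketch of what destroy-and-re-create could look like from the
emulator side, assuming the libxc ioreq-server calls of this era
(xc_hvm_destroy_ioreq_server / xc_hvm_create_ioreq_server /
xc_hvm_get_ioreq_server_info / xc_hvm_set_ioreq_server_state); the qemu-side
plumbing to remap the pages and rebind the event channels is not shown:)

    #include <xenctrl.h>

    /* Tear down the stale ioreq server and create a fresh one, so that Xen
     * re-populates the shared page with new event channels; the new pfns and
     * buffered-ioreq port are read back for qemu to remap. */
    static int recreate_ioreq_server(xc_interface *xch, domid_t domid,
                                     ioservid_t *id,
                                     xen_pfn_t *ioreq_pfn,
                                     xen_pfn_t *bufioreq_pfn,
                                     evtchn_port_t *bufioreq_port)
    {
        int rc;

        xc_hvm_destroy_ioreq_server(xch, domid, *id);

        rc = xc_hvm_create_ioreq_server(xch, domid, 1 /* handle bufioreq */, id);
        if (rc)
            return rc;

        rc = xc_hvm_get_ioreq_server_info(xch, domid, *id, ioreq_pfn,
                                          bufioreq_pfn, bufioreq_port);
        if (rc)
            return rc;

        return xc_hvm_set_ioreq_server_state(xch, domid, *id, 1 /* enabled */);
    }
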
>>
>>
>> I agree with that. I will try to add a new qemu monitor command (or do it
>> when updating qemu's state) to destroy and re-create it.
> 
> The slightly tricky part of that is that you're going to have to cache and 
> replay all the registrations that were done on the old instance, but you need 
> to do that in any case as it's not state that is transferred in the VM save 
> record.
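
(If the registrations do need to be replayed, a hedged sketch of what that
might look like using the libxc range-mapping call; the cached-range structure
is hypothetical, standing in for whatever list qemu keeps of the port-I/O and
MMIO ranges it registered on the old instance:)

    #include <stddef.h>
    #include <stdint.h>
    #include <xenctrl.h>

    /* Hypothetical cache of the ranges registered on the old ioreq server. */
    struct io_range {
        int is_mmio;            /* 1 = MMIO range, 0 = port-I/O range */
        uint64_t start, end;
    };

    /* Re-register every cached range on the newly created ioreq server.
     * These mappings are not part of the VM save record, so only the
     * emulator itself can replay them. */
    static int replay_io_ranges(xc_interface *xch, domid_t domid, ioservid_t id,
                                const struct io_range *ranges, size_t nr)
    {
        size_t i;

        for (i = 0; i < nr; i++) {
            int rc = xc_hvm_map_io_range_to_ioreq_server(xch, domid, id,
                                                         ranges[i].is_mmio,
                                                         ranges[i].start,
                                                         ranges[i].end);
            if (rc)
                return rc;
        }
        return 0;
    }
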

Why do we have to cache and replay all the registrations that were done on the
old instance?
We will set the guest to a new state; the old state should be dropped.

Thanks
Wen Congyang

> 
>   Paul
> 
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>   Paul
>>>
>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>> ~Andrew
>>>>> .
>>>>>
>>>
>>> .
>>>
> 
> .
> 

