
Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time



On 06/11/2015 04:32 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
>> Sent: 11 June 2015 02:14
>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
>> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
>> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> On 06/10/2015 07:47 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-
>>>> bounces@xxxxxxxxxxxxx] On Behalf Of Wen Congyang
>>>> Sent: 10 June 2015 12:38
>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
>>>> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>> ioreq page only one time
>>>>
>>>> On 06/10/2015 06:58 PM, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
>>>>>> Sent: 10 June 2015 11:55
>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
>>>>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>>>> ioreq page only one time
>>>>>>
>>>>>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
>>>>>>>> Sent: 10 June 2015 10:06
>>>>>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx; Paul Durrant
>>>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
>>>>>>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
>>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>>>>>> ioreq page only one time
>>>>>>>>
>>>>>>>> Cc: Paul Durrant
>>>>>>>>
>>>>>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
>>>>>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>>>>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>>>>>>>>>> The ioreq page contains an evtchn which will be set when we
>>>>>>>>>>>>>>>> resume the secondary vm for the first time. The hypervisor
>>>>>>>>>>>>>>>> will check whether the evtchn is corrupted, so we cannot zero
>>>>>>>>>>>>>>>> the ioreq page more than once.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>>>>>>>>>>>> suspended, so it is OK if we zero it only once.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
>>>>>>>>>>>>>>>> Signed-off-by: Wen Congyang <wency@xxxxxxxxxxxxxx>
>>>>>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
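(For reference, the restore-side change under discussion boils down to
something like the sketch below. The helper and the "first_checkpoint" flag
are illustrative only, not the patch's actual code; the real change lives in
the libxc restore path that clears the magic pages.)

    #include <xenctrl.h>
    #include <stdbool.h>

    /*
     * Illustrative sketch: clear the ioreq page only on the initial restore,
     * and leave it alone on later COLO checkpoints so that the event channel
     * Xen has already written into it is not wiped.
     */
    static int clear_ioreq_page_once(xc_interface *xch, uint32_t domid,
                                     xen_pfn_t ioreq_pfn, bool first_checkpoint)
    {
        if ( !first_checkpoint )
            return 0;   /* Xen has already populated the page: do not touch it */

        return xc_clear_domain_page(xch, domid, ioreq_pfn);
    }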
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The issue here is that we are running the restore algorithm
>>>>>>>>>>>>>>> over a domain which has already been running in Xen for a
>>>>>>>>>>>>>>> while.  This is a brand new usecase, as far as I am aware.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Exactly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does the qemu process associated with this domain get frozen
>>>>>>>>>>>>>>> while the secondary is being reset, or does the process get
>>>>>>>>>>>>>>> destroyed and recreated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you mean by reset? Do you mean the secondary is
>>>>>>>>>>>>>> suspended at checkpoint?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well - at the point that the buffered records are being
>>>>>>>>>>>>> processed, we are in the process of resetting the state of the
>>>>>>>>>>>>> secondary to match the primary.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, at this point the qemu process associated with this domain
>>>>>>>>>>>> is frozen: the suspend callback calls libxl__qmp_stop() (vm_stop()
>>>>>>>>>>>> in qemu) to pause qemu. After we have processed all the records,
>>>>>>>>>>>> qemu is restored with the received state; that is why we add a
>>>>>>>>>>>> libxl__qmp_restore() (qemu_load_vmstate() in qemu) API to restore
>>>>>>>>>>>> qemu with the received state. Currently in libxl, qemu can only
>>>>>>>>>>>> start with the received state; there is no API to load received
>>>>>>>>>>>> state into a qemu that has already been running for a while.
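(A rough sketch of the checkpoint flow described above, using the
libxl__qmp_stop()/libxl__qmp_restore() helpers named in this thread.
libxl__qmp_restore() is added by this series, so the signature used here is an
assumption rather than the final API.)

    /* libxl-internal sketch only; would live next to the other QMP helpers
     * in tools/libxl/ and use libxl_internal.h. */
    static int colo_devmodel_checkpoint_sketch(libxl__gc *gc, int domid,
                                               const char *qemu_state_file)
    {
        int rc;

        /* Suspend callback: pause QEMU ("stop" in QMP, vm_stop() in QEMU). */
        rc = libxl__qmp_stop(gc, domid);
        if (rc)
            return rc;

        /* ... libxc consumes the buffered records for this checkpoint ... */

        /* Resume callback: feed the received device-model state to the
         * already-running QEMU (qemu_load_vmstate() on the QEMU side).
         * Assumed signature: (gc, domid, path to saved QEMU state). */
        return libxl__qmp_restore(gc, domid, qemu_state_file);
    }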
>>>>>>>>>>>
>>>>>>>>>>> Now that I consider this more, it is absolutely wrong to not zero
>>>>>>>>>>> the page here.  The event channel in the page is not guaranteed to
>>>>>>>>>>> be the same between the primary and secondary,
>>>>>>>>>>
>>>>>>>>>> That's why we don't zero it on secondary.
>>>>>>>>>
>>>>>>>>> I think you missed my point.  Apologies for the double negative.  It
>>>>>>>>> must, under all circumstances, be zeroed at this point, for safety
>>>>>>>>> reasons.
>>>>>>>>>
>>>>>>>>> The page in question is subject to logdirty just like any other guest
>>>>>>>>> page, which means that if the guest writes to it naturally (i.e. not a
>>>>>>>>> Xen or Qemu write, both of whom have magic mappings which are not
>>>>>>>>> subject to logdirty), it will be transmitted in the stream.  As the
>>>>>>>>> event channel could be different, the lack of zeroing it at this point
>>>>>>>>> means that the event channel would be wrong as opposed to simply
>>>>>>>>> missing.  This is a worse position to be in.
>>>>>>>>
>>>>>>>> The guest should not access this page. I am not sure whether the
>>>>>>>> guest can access the ioreq page at all.
>>>>>>>>
>>>>>>>> But in the exceptional case, the ioreq page is dirtied and is copied
>>>>>>>> to the secondary vm. The ioreq page will then contain a wrong event
>>>>>>>> channel, and the hypervisor will check it: if the event channel is
>>>>>>>> wrong, the guest will be crashed.
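(The hypervisor-side check referred to here lives in the ioreq dispatch path
in xen/arch/x86/hvm/hvm.c. Paraphrased, not quoted verbatim, it looks roughly
like the following; a stale event channel copied over from the primary would
trip the second test.)

    /* Paraphrase of hvm_send_assist_req_to_ioreq_server(); names may differ
     * slightly from the exact tree in use. */
    evtchn_port_t port = sv->ioreq_evtchn;

    if ( unlikely(p->state != STATE_IOREQ_NONE) )
    {
        /* An ioreq appears to be in flight when none should be. */
        gdprintk(XENLOG_ERR, "device model set bad IO state %d\n", p->state);
        goto crash;
    }

    if ( unlikely(p->vp_eport != port) )
    {
        /* The event channel recorded in the shared page does not match the
         * one Xen allocated for this vcpu, so the domain is crashed. */
        gdprintk(XENLOG_ERR, "device model set bad event channel %d\n",
                 p->vp_eport);
        goto crash;
    }
    ...
 crash:
    domain_crash(d);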
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> and we don't want to unexpectedly
>>>>>>>>>>> find a pending/in-flight ioreq.
>>>>>>>>>>
>>>>>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended,
>>>>>>>>>> so there should be no pending/in-flight ioreq at checkpoint.
>>>>>>>>>
>>>>>>>>> In the common case perhaps, but we must consider the exceptional
>>>>>>>>> case.  The exceptional case here is some corruption which happens to
>>>>>>>>> appear as an in-flight ioreq.
>>>>>>>>
>>>>>>>> If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug. If
>>>>>>>> the hypervisor has a bug, anything can happen. I think we should trust
>>>>>>>> the hypervisor.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Either qemu needs to take care of re-initialising the event
>>>>>>>>>>> channels back to appropriate values, or Xen should tolerate the
>>>>>>>>>>> channels disappearing.
>>>>>>>>>
>>>>>>>>> I still stand by this statement.  I believe it is the only safe way of
>>>>>>>>> solving the issue you have discovered.
>>>>>>>>
>>>>>>>> Add a new qemu monitor command to update ioreq page?
>>>>>>>>
>>>>>>>
>>>>>>> If you're attaching to a 'new' VM (i.e. one with an updated image) then
>>>>>>> I suspect you're going to have to destroy and re-create the ioreq server
>>>>>>> so that the shared page gets re-populated with the correct event
>>>>>>> channels. Either that or you're going to have to ensure that the page is
>>>>>>> not part of the restored image and sample the new one that Xen should
>>>>>>> have set up.
>>>>>>
>>>>>>
>>>>>> I agree with it. I will try to add a new qemu monitor command (or do it
>>>>>> when updating qemu's state) to destroy and re-create it.
>>>>>
>>>>> The slightly tricky part of that is that you're going to have to cache
>>>>> and replay all the registrations that were done on the old instance, but
>>>>> you need to do that in any case as it's not state that is transferred in
>>>>> the VM save record.
>>>>
>>>> Why do we have to cache and replay all the registrations that were done on
>>>> the old instance?
>>>
>>> Do you not have device models that you need to continue to function? When
>>> the ioreq server is torn down, all MMIO, port IO and PCI config ranges that
>>> were mapped to it will disappear.
>>
>> Yes. I don't know exactly what needs to be done until I implement and test
>> it.
>>
>> I have some questions about it:
>> 1. Can the guest access the ioreq page? If the page is modified by the guest
>>    unexpectedly, what will happen?
> 
> No, the guest cannot modify the pages once a non-default ioreq server is 
> active. The pages are removed from the guest P2M when it is activated, which 
> is one of the reasons for modifying QEMU to not behave as a legacy default 
> server.
> 
>> 2. If the ioreq page is dirtied by the guest, it will be transferred from
>>    the primary to the secondary during the checkpoint. The evtchn is then
>>    invalid; I think the best behavior is to crash the guest rather than let
>>    it continue to run.
> 
> As I said, the pages are not in the P2M if the server is active so they will 
> not be transferred as part of the VM state. However, this presents a problem; 
> at the far end, the emulator will not be able to hook into the guest. So, 
> when the source domain is paused, the ioreq server needs to be torn down (so 
> that its pages are re-inserted into the P2M and marked dirty for transfer). 
> This is what happens in a normal migration. One extra problem you have is 
> that the source domain is not then killed, it is resumed along with the 
> emulator. Thus, on resume, the emulator needs to create a new ioreq server 
> and re-register all its device models with that new server.
> I don't know the detail of what you do at the far end, but if you always 
> start a new emulator instance using the QEMU save record then you should be 
> fine (just like with a normal migration).
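(For what it's worth, a minimal sketch of the "tear down, re-create, replay
the registrations" sequence at the libxenctrl level, with signatures as in the
Xen 4.5/4.6 libxenctrl; the cached-range structure is hypothetical, and in
practice QEMU itself has to replay its own MMIO/port IO/PCI config
registrations rather than a helper like this.)

    #include <stdint.h>
    #include <xenctrl.h>

    struct cached_range {
        int is_mmio;              /* 1 = MMIO, 0 = port IO */
        uint64_t start, end;
    };

    static int recreate_ioreq_server(xc_interface *xch, domid_t domid,
                                     ioservid_t old_id,
                                     const struct cached_range *ranges,
                                     unsigned int nr_ranges,
                                     ioservid_t *new_id)
    {
        unsigned int i;
        int rc;

        /* Old server goes away: its pages return to the P2M and all the
         * ranges that were mapped to it disappear. */
        rc = xc_hvm_destroy_ioreq_server(xch, domid, old_id);
        if ( rc )
            return rc;

        /* New server: Xen sets up fresh ioreq pages and event channels. */
        rc = xc_hvm_create_ioreq_server(xch, domid, 1 /* handle_bufioreq */,
                                        new_id);
        if ( rc )
            return rc;

        /* Replay the device models' registrations on the new server. */
        for ( i = 0; i < nr_ranges; i++ )
        {
            rc = xc_hvm_map_io_range_to_ioreq_server(xch, domid, *new_id,
                                                     ranges[i].is_mmio,
                                                     ranges[i].start,
                                                     ranges[i].end);
            if ( rc )
                return rc;
        }

        /* Make the new server live. */
        return xc_hvm_set_ioreq_server_state(xch, domid, *new_id, 1);
    }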

I can't find the code where the ioreq server is torn down when the source
domain is paused. Which function does that?

In our implementation, we don't start a new emulator. The code works, but some
bugs may not be triggered.

Thanks
Wen Congyang

> 
>   Paul
>  
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>   Paul
>>>
>>>> We will set to the guest to a new state, the old state should be dropped.
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>>   Paul
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Wen Congyang
>>>>>>
>>>>>>>
>>>>>>>   Paul
>>>>>>>
>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Wen Congyang
>>>>>>>>
>>>>>>>>>
>>>>>>>>> ~Andrew
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

