
Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time



On 06/11/2015 06:20 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
>> Sent: 11 June 2015 09:48
>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
>> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
>> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> On 06/11/2015 04:32 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
>>>> Sent: 11 June 2015 02:14
>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>> devel@xxxxxxxxxxxxx
>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
>>>> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>> ioreq
>>>> page only one time
>>>>
>>>> On 06/10/2015 07:47 PM, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-
>>>>>> bounces@xxxxxxxxxxxxx] On Behalf Of Wen Congyang
>>>>>> Sent: 10 June 2015 12:38
>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>>>> devel@xxxxxxxxxxxxx
>>>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
>>>>>> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>> ioreq
>>>>>> page only one time
>>>>>>
>>>>>> On 06/10/2015 06:58 PM, Paul Durrant wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
>>>>>>>> Sent: 10 June 2015 11:55
>>>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-
>>>>>> devel@xxxxxxxxxxxxx
>>>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
>>>>>>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
>>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
>> zero
>>>>>> ioreq
>>>>>>>> page only one time
>>>>>>>>
>>>>>>>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
>>>>>>>>>> Sent: 10 June 2015 10:06
>>>>>>>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx;
>> Paul
>>>>>>>> Durrant
>>>>>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
>>>>>>>>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
>>>>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore:
>>>> zero
>>>>>>>> ioreq
>>>>>>>>>> page only one time
>>>>>>>>>>
>>>>>>>>>> Cc: Paul Durrant
>>>>>>>>>>
>>>>>>>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
>>>>>>>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>>>>>>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>>>>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>>>>>>>>>>>> ioreq page contains evtchn which will be set when we resume
>>>>>>>>>>>>>>>>>> the secondary vm the first time. The hypervisor will check if
>>>>>>>>>>>>>>>>>> the evtchn is corrupted, so we cannot zero the ioreq page more
>>>>>>>>>>>>>>>>>> than one time.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>>>>>>>>>>>>>> suspended, so it is OK if we only zero it one time.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
>>>>>>>>>>>>>>>>>> Signed-off-by: Wen congyang <wency@xxxxxxxxxxxxxx>
>>>>>>>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The issue here is that we are running the restore algorithm
>>>>>>>>>>>>>>>>> over a domain which has already been running in Xen for a
>>>>>>>>>>>>>>>>> while. This is a brand new usecase, as far as I am aware.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Exactly.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does the qemu process associated with this domain get frozen
>>>>>>>>>>>>>>>>> while the secondary is being reset, or does the process get
>>>>>>>>>>>>>>>>> destroyed and recreated?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you mean by reset? Do you mean the secondary is
>>>>>>>>>>>>>>>> suspended at checkpoint?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well - at the point that the buffered records are being
>>>>>>>>>>>>>>> processed, we are in the process of resetting the state of the
>>>>>>>>>>>>>>> secondary to match the primary.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, at this point, the qemu process associated with this domain
>>>>>>>>>>>>>> is frozen. The suspend callback will call libxl__qmp_stop()
>>>>>>>>>>>>>> (vm_stop() in qemu) to pause qemu. After we have processed all
>>>>>>>>>>>>>> records, qemu will be restored with the received state; that's why
>>>>>>>>>>>>>> we add a libxl__qmp_restore() (qemu_load_vmstate() in qemu) API to
>>>>>>>>>>>>>> restore qemu with the received state. Currently in libxl, qemu
>>>>>>>>>>>>>> only starts with the received state; there is no API to load a
>>>>>>>>>>>>>> received state while qemu has been running for a while.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now that I consider this more, it is absolutely wrong not to zero
>>>>>>>>>>>>> the page here.  The event channel in the page is not guaranteed to
>>>>>>>>>>>>> be the same between the primary and secondary,
>>>>>>>>>>>>
>>>>>>>>>>>> That's why we don't zero it on the secondary.
>>>>>>>>>>>
>>>>>>>>>>> I think you missed my point.  Apologies for the double negative.  It
>>>>>>>>>>> must, under all circumstances, be zeroed at this point, for safety
>>>>>>>>>>> reasons.
>>>>>>>>>>>
>>>>>>>>>>> The page in question is subject to logdirty just like any other guest
>>>>>>>>>>> page, which means that if the guest writes to it naturally (i.e. not
>>>>>>>>>>> a Xen or Qemu write, both of which have magic mappings that are not
>>>>>>>>>>> subject to logdirty), it will be transmitted in the stream.  As the
>>>>>>>>>>> event channel could be different, the lack of zeroing at this point
>>>>>>>>>>> means that the event channel would be wrong as opposed to simply
>>>>>>>>>>> missing.  This is a worse position to be in.
>>>>>>>>>>
>>>>>>>>>> The guest should not access this page. I am not sure whether the
>>>>>>>>>> guest can access the ioreq page at all.
>>>>>>>>>>
>>>>>>>>>> But in the exceptional case, the ioreq page is dirtied and is copied
>>>>>>>>>> to the secondary vm. The ioreq page will then contain a wrong event
>>>>>>>>>> channel, and the hypervisor will check it: if the event channel is
>>>>>>>>>> wrong, the guest will be crashed.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> and we don't want to unexpectedly
>>>>>>>>>>>>> find a pending/in-flight ioreq.
>>>>>>>>>>>>
>>>>>>>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended,
>>>>>>>>>>>> so there should be no pending/in-flight ioreq at checkpoint.
>>>>>>>>>>>
>>>>>>>>>>> In the common case perhaps, but we must consider the exceptional
>>>>>>>>>>> case.  The exceptional case here is some corruption which happens to
>>>>>>>>>>> appear as an in-flight ioreq.
>>>>>>>>>>
>>>>>>>>>> If the state is not STATE_IOREQ_NONE here, it may be a hypervisor
>>>>>>>>>> bug. If the hypervisor has a bug, anything can happen. I think we
>>>>>>>>>> should trust the hypervisor.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Either qemu needs to take care of re-initialising the event
>>>>>>>>>>>>> channels back to appropriate values, or Xen should tolerate the
>>>>>>>>>>>>> channels disappearing.
>>>>>>>>>>>
>>>>>>>>>>> I still stand by this statement.  I believe it is the only safe way
>>>>>>>>>>> of solving the issue you have discovered.
>>>>>>>>>>
>>>>>>>>>> Add a new qemu monitor command to update the ioreq page?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If you're attaching to a 'new' VM (i.e. one with an updated image)
>>>>>>>>> then I suspect you're going to have to destroy and re-create the ioreq
>>>>>>>>> server so that the shared page gets re-populated with the correct
>>>>>>>>> event channels. Either that, or you're going to have to ensure that
>>>>>>>>> the page is not part of the restored image and sample the new one that
>>>>>>>>> Xen should have set up.
>>>>>>>>
>>>>>>>>
>>>>>>>> I agree with that. I will try to add a new qemu monitor command (or do
>>>>>>>> it when updating qemu's state) to destroy and re-create it.
>>>>>>>
>>>>>>> The slightly tricky part of that is that you're going to have to cache
>>>>>>> and replay all the registrations that were done on the old instance, but
>>>>>>> you need to do that in any case as it's not state that is transferred in
>>>>>>> the VM save record.
>>>>>>
>>>>>> Why do we have to cache and replay all the registrations that were done
>>>>>> on the old instance?
>>>>>
>>>>> Do you not have device models that you need to continue to function?  When
>>>>> the ioreq server is torn down, all MMIO, port IO and PCI config ranges
>>>>> that were mapped to it will disappear.
>>>>
>>>> Yes, I don't know exactly what needs to be done until I implement and test
>>>> it.
>>>>
>>>> I have some questions about it:
>>>> 1. Can the guest access the ioreq page? If the page is modified by the
>>>>    guest unexpectedly, what will happen?
>>>
>>> No, the guest cannot modify the pages once a non-default ioreq server is
>>> active. The pages are removed from the guest P2M when it is activated, which
>>> is one of the reasons for modifying QEMU to not behave as a legacy default
>>> server.
>>>
>>>> 2. If the ioreq page is dirtied by the guest, it will be transferred from
>>>>    primary to secondary during the checkpoint. The evtchn is then invalid,
>>>>    so I think the best behavior is to crash the guest rather than let it
>>>>    continue to run.
>>>
>>> As I said, the pages are not in the P2M if the server is active, so they
>>> will not be transferred as part of the VM state. However, this presents a
>>> problem; at the far end, the emulator will not be able to hook into the
>>> guest. So, when the source domain is paused, the ioreq server needs to be
>>> torn down (so that its pages are re-inserted into the P2M and marked dirty
>>> for transfer). This is what happens in a normal migration. One extra problem
>>> you have is that the source domain is not then killed; it is resumed along
>>> with the emulator. Thus, on resume, the emulator needs to create a new ioreq
>>> server and re-register all its device models with that new server.
>>>
>>> I don't know the details of what you do at the far end, but if you always
>>> start a new emulator instance using the QEMU save record then you should be
>>> fine (just like with a normal migration).
>>
>> I can't find the code where the ioreq server is torn down when the source
>> domain is paused. Which function is it?
> 
> Sorry, I overstated that. By 'torn down' I meant disabled. The function that 
> does it is:
> 
> static void xen_hvm_change_state_handler(void *opaque, int running,
>                                          RunState rstate)
> {
>     XenIOState *state = opaque;
> 
>     if (running) {
>         xen_main_loop_prepare(state);
>     }
> 
>     xen_set_ioreq_server_state(xen_xc, xen_domid,
>                                state->ioservid,
>                                (rstate == RUN_STATE_RUNNING));
> }
> 

Yes, I see it now.

If the ioreq page is cleared, can xen_main_loop_prepare() put the correct
evtchn in the ioreq page? If so, I think COLO can work without this patch.
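
For context, all this patch really does is make sure the magic page is cleared
only on the first pass of the restore, so that later COLO checkpoints don't wipe
out the event channel Xen has already written into the live page. Roughly like
this (a simplified sketch, not the exact libxc change; the flag name is just
illustrative):

    #include <stdbool.h>
    #include <xenctrl.h>

    /*
     * Sketch of the idea only: clear the magic page on the first restore
     * pass, and skip the clear on later COLO checkpoints so the event
     * channel Xen has written into the live page survives.
     * "first_checkpoint" stands in for whatever per-restore state the real
     * code keeps.
     */
    static int clear_magic_page(xc_interface *xch, uint32_t domid,
                                unsigned long pfn, bool first_checkpoint)
    {
        if ( !first_checkpoint )
            return 0;               /* keep the live evtchn on later passes */

        return xc_clear_domain_page(xch, domid, pfn);
    }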

In the hypervisor, the ioreq page is cleared before being re-inserted into the
P2M, so the far end always gets a zeroed ioreq page.
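
As far as I understand it, the disable path in the hypervisor behaves roughly
like this (a simplified sketch from memory; the helper name and exact calls are
approximations, not verbatim Xen code):

    /*
     * Rough sketch of the idea only: when a non-default ioreq server is
     * disabled, its page is wiped before being handed back to the guest
     * physmap and marked dirty, so a zeroed page is what ends up in the
     * migration stream.  Names and signatures are approximate.
     */
    static void ioreq_gfn_add_back(struct domain *d, struct hvm_ioreq_page *iorp)
    {
        clear_page(iorp->va);                           /* wipe state + evtchn */
        guest_physmap_add_page(d, iorp->gmfn, page_to_mfn(iorp->page), 0);
        paging_mark_dirty(d, page_to_mfn(iorp->page));  /* make sure it is sent */
    }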

IIRC, there was only the default ioreq server when I wrote this patch. In that
case we don't have xen_main_loop_prepare(). It is OK if COLO cannot work with an
older version of qemu.
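
If we do end up destroying and re-creating a non-default ioreq server from a
monitor command, as discussed earlier in this thread, the emulator would also
have to replay its range registrations against the new server. At the libxenctrl
level that would look roughly like this (a sketch only; the calls are from my
memory of the Xen 4.5-era API, and the MMIO range is made up for illustration):

    #include <xenctrl.h>

    /* Sketch only: re-create an ioreq server and replay one registration. */
    static int recreate_ioreq_server(xc_interface *xch, domid_t domid)
    {
        ioservid_t id;
        int rc;

        /* Create a fresh server; Xen allocates new magic pages/event channels. */
        rc = xc_hvm_create_ioreq_server(xch, domid, 1 /* handle_bufioreq */, &id);
        if ( rc )
            return rc;

        /* Replay the device model's registrations, e.g. one MMIO range. */
        rc = xc_hvm_map_io_range_to_ioreq_server(xch, domid, id, 1 /* is_mmio */,
                                                 0xfe000000, 0xfe000fff);
        if ( rc )
            return rc;

        /* Activate it: the magic pages are removed from the guest P2M. */
        return xc_hvm_set_ioreq_server_state(xch, domid, id, 1);
    }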

Yang, can you test COLO without this patch? I guess it can work, and we can drop
this patch now.

Thanks
Wen Congyang

>>
>> In our implementation, we don't start a new emulator. The code can work, but
>> some bugs may not be triggered.
>>
> 
> How do you reconcile the incoming QEMU save record with the running emulator 
> state?
> 
>   Paul
> 
>> Thanks
>> Wen Congyang
>>
>>>
>>>   Paul
>>>
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>>   Paul
>>>>>
>>>>>> We will set the guest to a new state; the old state should be dropped.
>>>>>>
>>>>>> Thanks
>>>>>> Wen Congyang
>>>>>>
>>>>>>>
>>>>>>>   Paul
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Wen Congyang
>>>>>>>>
>>>>>>>>>
>>>>>>>>>   Paul
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Wen Congyang
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ~Andrew
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>>
> 
> 
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

