
Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time



On 06/11/2015 04:32 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
>> Sent: 11 June 2015 02:14
>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
>> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
>> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
>> page only one time
>>
>> On 06/10/2015 07:47 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-
>>>> bounces@xxxxxxxxxxxxx] On Behalf Of Wen Congyang
>>>> Sent: 10 June 2015 12:38
>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
>>>> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
>>>> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>> ioreq page only one time
>>>>
>>>> On 06/10/2015 06:58 PM, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
>>>>>> Sent: 10 June 2015 11:55
>>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
>>>>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>>>> ioreq page only one time
>>>>>>
>>>>>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
>>>>>>>> Sent: 10 June 2015 10:06
>>>>>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx; Paul Durrant
>>>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
>>>>>>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
>>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero
>>>>>>>> ioreq page only one time
>>>>>>>>
>>>>>>>> Cc: Paul Durrant
>>>>>>>>
>>>>>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
>>>>>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>>>>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>>>>>>>>>> The ioreq page contains an evtchn which will be set when we
>>>>>>>>>>>>>>>> resume the secondary vm for the first time. The hypervisor
>>>>>>>>>>>>>>>> will check whether the evtchn is corrupted, so we cannot zero
>>>>>>>>>>>>>>>> the ioreq page more than once.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>>>>>>>>>>>> suspended, so it is OK if we zero it only once.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
>>>>>>>>>>>>>>>> Signed-off-by: Wen Congyang <wency@xxxxxxxxxxxxxx>
>>>>>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
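(For reference, the restore-side change under discussion boils down to
something like the sketch below. The helper and the "first_checkpoint" flag
are illustrative only, not the patch's actual code; the real change lives in
the libxc restore path that clears the magic pages.)

    #include <xenctrl.h>
    #include <stdbool.h>

    /*
     * Illustrative sketch: clear the ioreq page only on the initial restore,
     * and leave it alone on later COLO checkpoints so that the event channel
     * Xen has already written into it is not wiped.
     */
    static int clear_ioreq_page_once(xc_interface *xch, uint32_t domid,
                                     xen_pfn_t ioreq_pfn, bool first_checkpoint)
    {
        if ( !first_checkpoint )
            return 0;   /* Xen has already populated the page: do not touch it */

        return xc_clear_domain_page(xch, domid, ioreq_pfn);
    }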
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The issue here is that we are running the restore algorithm
>>>>>>>>>>>>>>> over a domain which has already been running in Xen for a
>>>>>>>>>>>>>>> while.  This is a brand new usecase, as far as I am aware.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Exactly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does the qemu process associated with this domain get frozen
>>>>>>>>>>>>>>> while the secondary is being reset, or does the process get
>>>>>>>>>>>>>>> destroyed and recreated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you mean by reset? Do you mean the secondary is
>>>>>>>>>>>>>> suspended at checkpoint?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well - at the point that the buffered records are being
>>>>>>>>>>>>> processed, we are in the process of resetting the state of the
>>>>>>>>>>>>> secondary to match the primary.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, at this point the qemu process associated with this domain
>>>>>>>>>>>> is frozen: the suspend callback calls libxl__qmp_stop() (vm_stop()
>>>>>>>>>>>> in qemu) to pause qemu. After we have processed all the records,
>>>>>>>>>>>> qemu is restored with the received state; that is why we add a
>>>>>>>>>>>> libxl__qmp_restore() (qemu_load_vmstate() in qemu) API to restore
>>>>>>>>>>>> qemu with the received state. Currently in libxl, qemu can only
>>>>>>>>>>>> start with the received state; there is no API to load received
>>>>>>>>>>>> state into a qemu that has already been running for a while.
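(A rough sketch of the checkpoint flow described above, using the
libxl__qmp_stop()/libxl__qmp_restore() helpers named in this thread.
libxl__qmp_restore() is added by this series, so the signature used here is an
assumption rather than the final API.)

    /* libxl-internal sketch only; would live next to the other QMP helpers
     * in tools/libxl/ and use libxl_internal.h. */
    static int colo_devmodel_checkpoint_sketch(libxl__gc *gc, int domid,
                                               const char *qemu_state_file)
    {
        int rc;

        /* Suspend callback: pause QEMU ("stop" in QMP, vm_stop() in QEMU). */
        rc = libxl__qmp_stop(gc, domid);
        if (rc)
            return rc;

        /* ... libxc consumes the buffered records for this checkpoint ... */

        /* Resume callback: feed the received device-model state to the
         * already-running QEMU (qemu_load_vmstate() on the QEMU side).
         * Assumed signature: (gc, domid, path to saved QEMU state). */
        return libxl__qmp_restore(gc, domid, qemu_state_file);
    }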
>>>>>>>>>>>
>>>>>>>>>>> Now that I consider this more, it is absolutely wrong to not zero
>>>>>>>>>>> the page here.  The event channel in the page is not guaranteed to
>>>>>>>>>>> be the same between the primary and secondary,
>>>>>>>>>>
>>>>>>>>>> That's why we don't zero it on secondary.
>>>>>>>>>
>>>>>>>>> I think you missed my point.  Apologies for the double negative.  It
>>>>>>>>> must, under all circumstances, be zeroed at this point, for safety
>>>>>>>>> reasons.
>>>>>>>>>
>>>>>>>>> The page in question is subject to logdirty just like any other guest
>>>>>>>>> page, which means that if the guest writes to it naturally (i.e. not a
>>>>>>>>> Xen or Qemu write, both of whom have magic mappings which are not
>>>>>>>>> subject to logdirty), it will be transmitted in the stream.  As the
>>>>>>>>> event channel could be different, the lack of zeroing it at this point
>>>>>>>>> means that the event channel would be wrong as opposed to simply
>>>>>>>>> missing.  This is a worse position to be in.
>>>>>>>>
>>>>>>>> The guest should not access this page. I am not sure whether the
>>>>>>>> guest can access the ioreq page at all.
>>>>>>>>
>>>>>>>> But in the exceptional case, the ioreq page is dirtied and is copied
>>>>>>>> to the secondary vm. The ioreq page will then contain a wrong event
>>>>>>>> channel, and the hypervisor will check it: if the event channel is
>>>>>>>> wrong, the guest will be crashed.
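(The hypervisor-side check referred to here lives in the ioreq dispatch path
in xen/arch/x86/hvm/hvm.c. Paraphrased, not quoted verbatim, it looks roughly
like the following; a stale event channel copied over from the primary would
trip the second test.)

    /* Paraphrase of hvm_send_assist_req_to_ioreq_server(); names may differ
     * slightly from the exact tree in use. */
    evtchn_port_t port = sv->ioreq_evtchn;

    if ( unlikely(p->state != STATE_IOREQ_NONE) )
    {
        /* An ioreq appears to be in flight when none should be. */
        gdprintk(XENLOG_ERR, "device model set bad IO state %d\n", p->state);
        goto crash;
    }

    if ( unlikely(p->vp_eport != port) )
    {
        /* The event channel recorded in the shared page does not match the
         * one Xen allocated for this vcpu, so the domain is crashed. */
        gdprintk(XENLOG_ERR, "device model set bad event channel %d\n",
                 p->vp_eport);
        goto crash;
    }
    ...
 crash:
    domain_crash(d);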
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> and we don't want to unexpectedly
>>>>>>>>>>> find a pending/in-flight ioreq.
>>>>>>>>>>
>>>>>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended,
>>>>>>>>>> so there should be no pending/in-flight ioreq at checkpoint.
>>>>>>>>>
>>>>>>>>> In the common case perhaps, but we must consider the exceptional
>>>>>>>>> case.  The exceptional case here is some corruption which happens to
>>>>>>>>> appear as an in-flight ioreq.
>>>>>>>>
>>>>>>>> If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug. If
>>>>>>>> the hypervisor has a bug, anything can happen. I think we should trust
>>>>>>>> the hypervisor.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Either qemu needs to take care of re-initialising the event
>>>>>>>>>>> channels back to appropriate values, or Xen should tolerate the
>>>>>>>>>>> channels disappearing.
>>>>>>>>>
>>>>>>>>> I still stand by this statement.  I believe it is the only safe way of
>>>>>>>>> solving the issue you have discovered.
>>>>>>>>
>>>>>>>> Add a new qemu monitor command to update ioreq page?
>>>>>>>>
>>>>>>>
>>>>>>> If you're attaching to a 'new' VM (i.e. one with an updated image) then
>>>>>>> I suspect you're going to have to destroy and re-create the ioreq server
>>>>>>> so that the shared page gets re-populated with the correct event
>>>>>>> channels. Either that or you're going to have to ensure that the page is
>>>>>>> not part of the restored image and sample the new one that Xen should
>>>>>>> have set up.
>>>>>>
>>>>>>
>>>>>> I agree with it. I will try to add a new qemu monitor command (or do it
>>>>>> when updating qemu's state) to destroy and re-create it.
>>>>>
>>>>> The slightly tricky part of that is that you're going to have to cache
>>>>> and replay all the registrations that were done on the old instance, but
>>>>> you need to do that in any case as it's not state that is transferred in
>>>>> the VM save record.
>>>>
>>>> Why do we have to cache and replay all the registrations that were done on
>>>> the old instance?
>>>
>>> Do you not have device models that you need to continue to function? When
>>> the ioreq server is torn down, all MMIO, port IO and PCI config ranges that
>>> were mapped to it will disappear.
>>
>> Yes. I don't know exactly what needs to be done until I implement and test
>> it.
>>
>> I have some questions about it:
>> 1. Can the guest access the ioreq page? If the page is modified by the guest
>>    unexpectedly, what will happen?
> 
> No, the guest cannot modify the pages once a non-default ioreq server is 
> active. The pages are removed from the guest P2M when it is activated, which 
> is one of the reasons for modifying QEMU to not behave as a legacy default 
> server.
> 
>> 2. If the ioreq page is dirtied by the guest, it will be transferred from
>>    the primary to the secondary during the checkpoint. The evtchn is then
>>    invalid; I think the best behavior is to crash the guest rather than let
>>    it continue to run.
> 
> As I said, the pages are not in the P2M if the server is active so they will 
> not be transferred as part of the VM state. However, this presents a problem; 
> at the far end, the emulator will not be able to hook into the guest. So, 
> when the source domain is paused, the ioreq server needs to be torn down (so 
> that its pages are re-inserted into the P2M and marked dirty for transfer). 
> This is what happens in a normal migration. One extra problem you have is 
> that the source domain is not then killed, it is resumed along with the 
> emulator. Thus, on resume, the emulator needs to create a new ioreq server 
> and re-register all its device models with that new server.
> I don't know the detail of what you do at the far end, but if you always 
> start a new emulator instance using the QEMU save record then you should be 
> fine (just like with a normal migration).
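(For what it's worth, a minimal sketch of the "tear down, re-create, replay
the registrations" sequence at the libxenctrl level, with signatures as in the
Xen 4.5/4.6 libxenctrl; the cached-range structure is hypothetical, and in
practice QEMU itself has to replay its own MMIO/port IO/PCI config
registrations rather than a helper like this.)

    #include <stdint.h>
    #include <xenctrl.h>

    struct cached_range {
        int is_mmio;              /* 1 = MMIO, 0 = port IO */
        uint64_t start, end;
    };

    static int recreate_ioreq_server(xc_interface *xch, domid_t domid,
                                     ioservid_t old_id,
                                     const struct cached_range *ranges,
                                     unsigned int nr_ranges,
                                     ioservid_t *new_id)
    {
        unsigned int i;
        int rc;

        /* Old server goes away: its pages return to the P2M and all the
         * ranges that were mapped to it disappear. */
        rc = xc_hvm_destroy_ioreq_server(xch, domid, old_id);
        if ( rc )
            return rc;

        /* New server: Xen sets up fresh ioreq pages and event channels. */
        rc = xc_hvm_create_ioreq_server(xch, domid, 1 /* handle_bufioreq */,
                                        new_id);
        if ( rc )
            return rc;

        /* Replay the device models' registrations on the new server. */
        for ( i = 0; i < nr_ranges; i++ )
        {
            rc = xc_hvm_map_io_range_to_ioreq_server(xch, domid, *new_id,
                                                     ranges[i].is_mmio,
                                                     ranges[i].start,
                                                     ranges[i].end);
            if ( rc )
                return rc;
        }

        /* Make the new server live. */
        return xc_hvm_set_ioreq_server_state(xch, domid, *new_id, 1);
    }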

I can't find the code where the ioreq server is torn down when the source
domain is paused. Which function does that?

In our implementation, we don't start a new emulator. The code works, but some
bugs may not be triggered.

Thanks
Wen Congyang

> 
>   Paul
>  
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>>   Paul
>>>
>>>> We will set to the guest to a new state, the old state should be dropped.
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>>   Paul
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Wen Congyang
>>>>>>
>>>>>>>
>>>>>>>   Paul
>>>>>>>
>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Wen Congyang
>>>>>>>>
>>>>>>>>>
>>>>>>>>> ~Andrew
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

