
Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time



> -----Original Message-----
> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
> Sent: 11 June 2015 09:48
> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> On 06/11/2015 04:32 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
> >> Sent: 11 June 2015 02:14
> >> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
> >> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
> >> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
> >> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
> >>
> >> On 06/10/2015 07:47 PM, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-
> >>>> bounces@xxxxxxxxxxxxx] On Behalf Of Wen Congyang
> >>>> Sent: 10 June 2015 12:38
> >>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
> >>>> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
> >>>> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
> >>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
> >>>>
> >>>> On 06/10/2015 06:58 PM, Paul Durrant wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
> >>>>>> Sent: 10 June 2015 11:55
> >>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
> >>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
> >>>>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
> >>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
> >>>>>>
> >>>>>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
> >>>>>>>> Sent: 10 June 2015 10:06
> >>>>>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx; Paul Durrant
> >>>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
> >>>>>>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
> >>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
> >>>>>>>>
> >>>>>>>> Cc: Paul Durrant
> >>>>>>>>
> >>>>>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
> >>>>>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> >>>>>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> >>>>>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
> >>>>>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
> >>>>>>>>>>>>>>>> The ioreq page contains the evtchn, which is set when we
> >>>>>>>>>>>>>>>> resume the secondary vm for the first time. The hypervisor
> >>>>>>>>>>>>>>>> checks whether the evtchn is corrupted, so we cannot zero
> >>>>>>>>>>>>>>>> the ioreq page more than once.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm
> >>>>>>>>>>>>>>>> is suspended, so it is OK if we only zero it one time.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
> >>>>>>>>>>>>>>>> Signed-off-by: Wen Congyang <wency@xxxxxxxxxxxxxx>
> >>>>>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
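
[For context, the change being discussed amounts to roughly the following on
the restore side; a hedged sketch rather than the actual patch, with
illustrative names:]

/* Zero the ioreq page only on the first load.  On later COLO checkpoints
 * the page already holds live event channel ports that the hypervisor
 * validates, so it must be left alone. */
static void handle_ioreq_page(void *ioreq_page, bool first_checkpoint)
{
    if ( first_checkpoint )
        memset(ioreq_page, 0, PAGE_SIZE);
    /* Otherwise do nothing: ioreq->state is already STATE_IOREQ_NONE
     * after the vm is suspended, and the evtchn fields must survive. */
}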
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The issue here is that we are running the restore algorithm
> >>>>>>>>>>>>>>> over a domain which has already been running in Xen for a
> >>>>>>>>>>>>>>> while.  This is a brand new usecase, as far as I am aware.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Exactly.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Does the qemu process associated with this domain get frozen
> >>>>>>>>>>>>>>> while the secondary is being reset, or does the process get
> >>>>>>>>>>>>>>> destroyed and recreated?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What do you mean by reset? Do you mean the secondary is
> >>>>>>>>>>>>>> suspended at a checkpoint?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Well - at the point that the buffered records are being
> >>>>>>>>>>>>> processed, we are in the process of resetting the state of
> >>>>>>>>>>>>> the secondary to match the primary.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes, at this point the qemu process associated with this
> >>>>>>>>>>>> domain is frozen: the suspend callback calls libxl__qmp_stop()
> >>>>>>>>>>>> (vm_stop() in qemu) to pause qemu. After we have processed all
> >>>>>>>>>>>> records, qemu is restored with the received state; that's why
> >>>>>>>>>>>> we add a libxl__qmp_restore() (qemu_load_vmstate() in qemu) api
> >>>>>>>>>>>> to restore qemu with the received state. Currently in libxl,
> >>>>>>>>>>>> qemu can only start with the received state; there is no api to
> >>>>>>>>>>>> load received state while qemu has already been running for a
> >>>>>>>>>>>> while.
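
[The per-checkpoint flow described above is roughly the following; a hedged
sketch using the libxl names mentioned in this thread, with signatures
abbreviated:]

/* Secondary side, per checkpoint (sketch): */
libxl__qmp_stop(gc, domid);            /* suspend: vm_stop() in qemu     */
/* ... receive and apply all buffered records from the primary ...       */
libxl__qmp_restore(gc, domid, state);  /* new api: qemu_load_vmstate()   */
/* ... resume the secondary vm ... */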
> >>>>>>>>>>>
> >>>>>>>>>>> Now I consider this more, it is absolutely wrong to not zero
> >>>>>>>>>>> the page here.  The event channel in the page is not guaranteed
> >>>>>>>>>>> to be the same between the primary and secondary,
> >>>>>>>>>>
> >>>>>>>>>> That's why we don't zero it on secondary.
> >>>>>>>>>
> >>>>>>>>> I think you missed my point.  Apologies for the double negative.
> >>>>>>>>> It must, under all circumstances, be zeroed at this point, for
> >>>>>>>>> safety reasons.
> >>>>>>>>>
> >>>>>>>>> The page in question is subject to logdirty just like any other
> >>>>>>>>> guest page, which means that if the guest writes to it naturally
> >>>>>>>>> (i.e. not a Xen or Qemu write, both of which have magic mappings
> >>>>>>>>> that are not subject to logdirty), it will be transmitted in the
> >>>>>>>>> stream.  As the event channel could be different, the lack of
> >>>>>>>>> zeroing at this point means that the event channel would be wrong
> >>>>>>>>> as opposed to simply missing.  This is a worse position to be in.
> >>>>>>>>
> >>>>>>>> The guest should not access this page, though I am not sure
> >>>>>>>> whether it actually can access the ioreq page.
> >>>>>>>>
> >>>>>>>> But in the exceptional case, the ioreq page is dirtied and is
> >>>>>>>> copied to the secondary vm. The ioreq page will then contain a
> >>>>>>>> wrong event channel; the hypervisor checks it, and if the event
> >>>>>>>> channel is wrong, the guest is crashed.
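
[The hypervisor-side check referred to here lives in the vcpu resume path;
roughly, and hedged rather than verbatim Xen code:]

/* On resume, Xen inspects the shared ioreq slot; any state other than
 * the expected ones is treated as corruption. */
switch ( p->state )
{
case STATE_IOREQ_NONE:       /* nothing in flight: fine                */
    break;
case STATE_IORESP_READY:     /* emulator finished: consume the result  */
    hvm_io_assist(p);
    break;
default:                     /* anything else: crash the domain        */
    gdprintk(XENLOG_ERR, "Weird HVM io state %d\n", p->state);
    domain_crash(d);
}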
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> and we don't want to unexpectedly
> >>>>>>>>>>> find a pending/in-flight ioreq.
> >>>>>>>>>>
> >>>>>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is
> >>>>>>>>>> suspended, so there should be no pending/in-flight ioreq at a
> >>>>>>>>>> checkpoint.
> >>>>>>>>>
> >>>>>>>>> In the common case perhaps, but we must consider the exceptional
> >>>>>>>>> case.  The exceptional case here is some corruption which happens
> >>>>>>>>> to appear as an in-flight ioreq.
> >>>>>>>>
> >>>>>>>> If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug.
> >>>>>>>> If the hypervisor has a bug, anything can happen. I think we
> >>>>>>>> should trust the hypervisor.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Either qemu needs to take care of re-initialising the event
> >>>>>>>>>>> channels back to appropriate values, or Xen should tolerate the
> >>>>>>>>>>> channels disappearing.
> >>>>>>>>>
> >>>>>>>>> I still stand by this statement.  I believe it is the only safe
> >>>>>>>>> way of solving the issue you have discovered.
> >>>>>>>>
> >>>>>>>> Add a new qemu monitor command to update the ioreq page?
> >>>>>>>>
> >>>>>>>
> >>>>>>> If you're attaching to a 'new' VM (i.e. one with an updated image)
> >>>>>>> then I suspect you're going to have to destroy and re-create the
> >>>>>>> ioreq server so that the shared page gets re-populated with the
> >>>>>>> correct event channels.  Either that or you're going to have to
> >>>>>>> ensure that the page is not part of the restored image and sample
> >>>>>>> the new one that Xen should have set up.
> >>>>>>
> >>>>>>
> >>>>>> I agree. I will try to add a new qemu monitor command (or do it
> >>>>>> when updating qemu's state) to destroy and re-create it.
> >>>>>
> >>>>> The slightly tricky part of that is that you're going to have to
> >>>>> cache and replay all the registrations that were done on the old
> >>>>> instance, but you need to do that in any case as it's not state that
> >>>>> is transferred in the VM save record.
> >>>>
> >>>> Why do we have to cache and replay all the registrations that were
> >>>> done on the old instance?
> >>>
> >>> Do you not have device models that you need to continue to function?
> >>> When the ioreq server is torn down, all MMIO, port IO and PCI config
> >>> ranges that were mapped to it will disappear.
> >>
> >> Yes, I won't know exactly what needs to be done until I implement and
> >> test it.
> >>
> >> I have some questions about it:
> >> 1. Can the guest access the ioreq page? If the page is modified by the
> >>    guest unexpectedly, what will happen?
> >
> > No, the guest cannot modify the pages once a non-default ioreq server is
> > active. The pages are removed from the guest P2M when the server is
> > activated, which is one of the reasons for modifying QEMU to not behave
> > as a legacy default server.
> >
> >> 2. If the ioreq page is dirtied by the guest, it will be transferred
> >>    from primary to secondary during a checkpoint. The evtchn is then
> >>    invalid; I think the best behavior is to crash the guest rather
> >>    than let it continue to run.
> >
> > As I said, the pages are not in the P2M if the server is active, so they
> > will not be transferred as part of the VM state. However, this presents
> > a problem: at the far end, the emulator will not be able to hook into
> > the guest. So, when the source domain is paused, the ioreq server needs
> > to be torn down (so that its pages are re-inserted into the P2M and
> > marked dirty for transfer). This is what happens in a normal migration.
> > One extra problem you have is that the source domain is not then killed;
> > it is resumed along with the emulator. Thus, on resume, the emulator
> > needs to create a new ioreq server and re-register all its device models
> > with that new server.
> >
> > I don't know the detail of what you do at the far end, but if you always
> > start a new emulator instance using the QEMU save record then you should
> > be fine (just like with a normal migration).
> 
> I can't find the code where the ioreq server is torn down when the source
> domain is paused. Which function does it?

Sorry, I overstated that. By 'torn down' I meant disabled. The function that 
does it is:

static void xen_hvm_change_state_handler(void *opaque, int running,
                                         RunState rstate)
{
    XenIOState *state = opaque;

    if (running) {
        xen_main_loop_prepare(state);
    }

    /* Enable the ioreq server only while the VM is actually running;
     * pausing the VM (e.g. at a checkpoint) disables it. */
    xen_set_ioreq_server_state(xen_xc, xen_domid,
                               state->ioservid,
                               (rstate == RUN_STATE_RUNNING));
}
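
For reference, QEMU installs this handler during its Xen HVM setup via the
generic run-state hook, so the ioreq server is toggled on every run-state
transition:

    qemu_add_vm_change_state_handler(xen_hvm_change_state_handler, state);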

> 
> In our implementation, we don't start a new emulator. The code works, but
> some bugs may simply not be triggered.
> 

How do you reconcile the incoming QEMU save record with the running emulator 
state?

  Paul

> Thanks
> Wen Congyang
> 
> >
> >   Paul
> >
> >>
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>>   Paul
> >>>
> >>>> We will set the guest to a new state; the old state should be
> >>>> dropped.
> >>>>
> >>>> Thanks
> >>>> Wen Congyang
> >>>>
> >>>>>
> >>>>>   Paul
> >>>>>
> >>>>>>
> >>>>>> Thanks
> >>>>>> Wen Congyang
> >>>>>>
> >>>>>>>
> >>>>>>>   Paul
> >>>>>>>
> >>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Wen Congyang
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ~Andrew
> >>>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
