
Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time



> -----Original Message-----
> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
> Sent: 11 June 2015 09:48
> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> On 06/11/2015 04:32 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
> >> Sent: 11 June 2015 02:14
> >> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
> >> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
> >> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
> >> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
> >>
> >> On 06/10/2015 07:47 PM, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-
> >>>> bounces@xxxxxxxxxxxxx] On Behalf Of Wen Congyang
> >>>> Sent: 10 June 2015 12:38
> >>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
> >>>> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
> >>>> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
> >>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
> >>>>
> >>>> On 06/10/2015 06:58 PM, Paul Durrant wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
> >>>>>> Sent: 10 June 2015 11:55
> >>>>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
> >>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
> >>>>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
> >>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
> >>>>>>
> >>>>>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
> >>>>>>>> Sent: 10 June 2015 10:06
> >>>>>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx; Paul Durrant
> >>>>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
> >>>>>>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
> >>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
> >>>>>>>>
> >>>>>>>> Cc: Paul Durrant
> >>>>>>>>
> >>>>>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
> >>>>>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> >>>>>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> >>>>>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
> >>>>>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
> >>>>>>>>>>>>>>>> The ioreq page contains the evtchn, which is set when we
> >>>>>>>>>>>>>>>> resume the secondary vm for the first time. The hypervisor
> >>>>>>>>>>>>>>>> checks whether the evtchn is corrupted, so we cannot zero
> >>>>>>>>>>>>>>>> the ioreq page more than once.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm
> >>>>>>>>>>>>>>>> is suspended, so it is OK if we only zero it one time.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
> >>>>>>>>>>>>>>>> Signed-off-by: Wen Congyang <wency@xxxxxxxxxxxxxx>
> >>>>>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
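
[For context, the change being discussed amounts to roughly the following on
the restore side; a hedged sketch rather than the actual patch, with
illustrative names:]

/* Zero the ioreq page only on the first load.  On later COLO checkpoints
 * the page already holds live event channel ports that the hypervisor
 * validates, so it must be left alone. */
static void handle_ioreq_page(void *ioreq_page, bool first_checkpoint)
{
    if ( first_checkpoint )
        memset(ioreq_page, 0, PAGE_SIZE);
    /* Otherwise do nothing: ioreq->state is already STATE_IOREQ_NONE
     * after the vm is suspended, and the evtchn fields must survive. */
}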
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The issue here is that we are running the restore algorithm
> >>>>>>>>>>>>>>> over a domain which has already been running in Xen for a
> >>>>>>>>>>>>>>> while.  This is a brand new usecase, as far as I am aware.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Exactly.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Does the qemu process associated with this domain get frozen
> >>>>>>>>>>>>>>> while the secondary is being reset, or does the process get
> >>>>>>>>>>>>>>> destroyed and recreated?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What do you mean by reset? Do you mean the secondary is
> >>>>>>>>>>>>>> suspended at a checkpoint?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Well - at the point that the buffered records are being
> >>>>>>>>>>>>> processed, we are in the process of resetting the state of
> >>>>>>>>>>>>> the secondary to match the primary.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes, at this point the qemu process associated with this
> >>>>>>>>>>>> domain is frozen: the suspend callback calls libxl__qmp_stop()
> >>>>>>>>>>>> (vm_stop() in qemu) to pause qemu. After we have processed all
> >>>>>>>>>>>> records, qemu is restored with the received state; that's why
> >>>>>>>>>>>> we add a libxl__qmp_restore() (qemu_load_vmstate() in qemu) api
> >>>>>>>>>>>> to restore qemu with the received state. Currently in libxl,
> >>>>>>>>>>>> qemu can only start with the received state; there is no api to
> >>>>>>>>>>>> load received state while qemu has already been running for a
> >>>>>>>>>>>> while.
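
[The per-checkpoint flow described above is roughly the following; a hedged
sketch using the libxl names mentioned in this thread, with signatures
abbreviated:]

/* Secondary side, per checkpoint (sketch): */
libxl__qmp_stop(gc, domid);            /* suspend: vm_stop() in qemu     */
/* ... receive and apply all buffered records from the primary ...       */
libxl__qmp_restore(gc, domid, state);  /* new api: qemu_load_vmstate()   */
/* ... resume the secondary vm ... */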
> >>>>>>>>>>>
> >>>>>>>>>>> Now I consider this more, it is absolutely wrong to not zero
> >>>>>>>>>>> the page here.  The event channel in the page is not guaranteed
> >>>>>>>>>>> to be the same between the primary and secondary,
> >>>>>>>>>>
> >>>>>>>>>> That's why we don't zero it on secondary.
> >>>>>>>>>
> >>>>>>>>> I think you missed my point.  Apologies for the double negative.
> >>>>>>>>> It must, under all circumstances, be zeroed at this point, for
> >>>>>>>>> safety reasons.
> >>>>>>>>>
> >>>>>>>>> The page in question is subject to logdirty just like any other
> >>>>>>>>> guest page, which means that if the guest writes to it naturally
> >>>>>>>>> (i.e. not a Xen or Qemu write, both of which have magic mappings
> >>>>>>>>> that are not subject to logdirty), it will be transmitted in the
> >>>>>>>>> stream.  As the event channel could be different, the lack of
> >>>>>>>>> zeroing at this point means that the event channel would be wrong
> >>>>>>>>> as opposed to simply missing.  This is a worse position to be in.
> >>>>>>>>
> >>>>>>>> The guest should not access this page, though I am not sure
> >>>>>>>> whether it actually can access the ioreq page.
> >>>>>>>>
> >>>>>>>> But in the exceptional case, the ioreq page is dirtied and is
> >>>>>>>> copied to the secondary vm. The ioreq page will then contain a
> >>>>>>>> wrong event channel; the hypervisor checks it, and if the event
> >>>>>>>> channel is wrong, the guest is crashed.
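
[The hypervisor-side check referred to here lives in the vcpu resume path;
roughly, and hedged rather than verbatim Xen code:]

/* On resume, Xen inspects the shared ioreq slot; any state other than
 * the expected ones is treated as corruption. */
switch ( p->state )
{
case STATE_IOREQ_NONE:       /* nothing in flight: fine                */
    break;
case STATE_IORESP_READY:     /* emulator finished: consume the result  */
    hvm_io_assist(p);
    break;
default:                     /* anything else: crash the domain        */
    gdprintk(XENLOG_ERR, "Weird HVM io state %d\n", p->state);
    domain_crash(d);
}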
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> and we don't want to unexpectedly
> >>>>>>>>>>> find a pending/in-flight ioreq.
> >>>>>>>>>>
> >>>>>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is
> >>>>>>>>>> suspended, so there should be no pending/in-flight ioreq at a
> >>>>>>>>>> checkpoint.
> >>>>>>>>>
> >>>>>>>>> In the common case perhaps, but we must consider the exceptional
> >>>>>>>>> case.  The exceptional case here is some corruption which happens
> >>>>>>>>> to appear as an in-flight ioreq.
> >>>>>>>>
> >>>>>>>> If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug.
> >>>>>>>> If the hypervisor has a bug, anything can happen. I think we
> >>>>>>>> should trust the hypervisor.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Either qemu needs to take care of re-initialising the event
> >>>>>>>>>>> channels back to appropriate values, or Xen should tolerate the
> >>>>>>>>>>> channels disappearing.
> >>>>>>>>>
> >>>>>>>>> I still stand by this statement.  I believe it is the only safe
> >>>>>>>>> way of solving the issue you have discovered.
> >>>>>>>>
> >>>>>>>> Add a new qemu monitor command to update the ioreq page?
> >>>>>>>>
> >>>>>>>
> >>>>>>> If you're attaching to a 'new' VM (i.e. one with an updated image)
> >>>>>>> then I suspect you're going to have to destroy and re-create the
> >>>>>>> ioreq server so that the shared page gets re-populated with the
> >>>>>>> correct event channels.  Either that or you're going to have to
> >>>>>>> ensure that the page is not part of the restored image and sample
> >>>>>>> the new one that Xen should have set up.
> >>>>>>
> >>>>>>
> >>>>>> I agree. I will try to add a new qemu monitor command (or do it
> >>>>>> when updating qemu's state) to destroy and re-create it.
> >>>>>
> >>>>> The slightly tricky part of that is that you're going to have to
> >>>>> cache and replay all the registrations that were done on the old
> >>>>> instance, but you need to do that in any case as it's not state that
> >>>>> is transferred in the VM save record.
> >>>>
> >>>> Why do we have to cache and replay all the registrations that were
> >>>> done on the old instance?
> >>>
> >>> Do you not have device models that you need to continue to function?
> >>> When the ioreq server is torn down, all MMIO, port IO and PCI config
> >>> ranges that were mapped to it will disappear.
> >>
> >> Yes, I won't know exactly what needs to be done until I implement and
> >> test it.
> >>
> >> I have some questions about it:
> >> 1. Can the guest access the ioreq page? If the page is modified by the
> >>    guest unexpectedly, what will happen?
> >
> > No, the guest cannot modify the pages once a non-default ioreq server is
> > active. The pages are removed from the guest P2M when the server is
> > activated, which is one of the reasons for modifying QEMU to not behave
> > as a legacy default server.
> >
> >> 2. If the ioreq page is dirtied by the guest, it will be transferred
> >>    from primary to secondary during a checkpoint. The evtchn is then
> >>    invalid; I think the best behavior is to crash the guest rather
> >>    than let it continue to run.
> >
> > As I said, the pages are not in the P2M if the server is active, so they
> > will not be transferred as part of the VM state. However, this presents
> > a problem: at the far end, the emulator will not be able to hook into
> > the guest. So, when the source domain is paused, the ioreq server needs
> > to be torn down (so that its pages are re-inserted into the P2M and
> > marked dirty for transfer). This is what happens in a normal migration.
> > One extra problem you have is that the source domain is not then killed;
> > it is resumed along with the emulator. Thus, on resume, the emulator
> > needs to create a new ioreq server and re-register all its device models
> > with that new server.
> >
> > I don't know the detail of what you do at the far end, but if you always
> > start a new emulator instance using the QEMU save record then you should
> > be fine (just like with a normal migration).
> 
> I can't find the code where the ioreq server is torn down when the source
> domain is paused. Which function does it?

Sorry, I overstated that. By 'torn down' I meant disabled. The function that 
does it is:

static void xen_hvm_change_state_handler(void *opaque, int running,
                                         RunState rstate)
{
    XenIOState *state = opaque;

    if (running) {
        xen_main_loop_prepare(state);
    }

    /* Enable the ioreq server only while the VM is actually running;
     * pausing the VM (e.g. at a checkpoint) disables it. */
    xen_set_ioreq_server_state(xen_xc, xen_domid,
                               state->ioservid,
                               (rstate == RUN_STATE_RUNNING));
}
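
For reference, QEMU installs this handler during its Xen HVM setup via the
generic run-state hook, so the ioreq server is toggled on every run-state
transition:

    qemu_add_vm_change_state_handler(xen_hvm_change_state_handler, state);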

> 
> In our implementation, we don't start a new emulator. The code works, but
> some bugs may simply not be triggered.
> 

How do you reconcile the incoming QEMU save record with the running emulator 
state?

  Paul

> Thanks
> Wen Congyang
> 
> >
> >   Paul
> >
> >>
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>>   Paul
> >>>
> >>>> We will set the guest to a new state; the old state should be
> >>>> dropped.
> >>>>
> >>>> Thanks
> >>>> Wen Congyang
> >>>>
> >>>>>
> >>>>>   Paul
> >>>>>
> >>>>>>
> >>>>>> Thanks
> >>>>>> Wen Congyang
> >>>>>>
> >>>>>>>
> >>>>>>>   Paul
> >>>>>>>
> >>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Wen Congyang
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ~Andrew
> >>>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
