
Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time



On 10/06/15 10:06, Wen Congyang wrote:
> Cc: Paul Durrant
>
> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
>> On 10/06/2015 06:26, Yang Hongyang wrote:
>>>
>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
>>>>>
>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
>>>>>>>
>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
>>>>>>>>> The ioreq page contains an evtchn which will be set when we resume
>>>>>>>>> the secondary vm for the first time. The hypervisor will check
>>>>>>>>> whether the evtchn is corrupted, so we cannot zero the ioreq page
>>>>>>>>> more than once.
>>>>>>>>>
>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
>>>>>>>>> suspended, so it is OK if we only zero it once.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
>>>>>>>>> Signed-off-by: Wen congyang <wency@xxxxxxxxxxxxxx>
>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>>>>>>>> The issue here is that we are running the restore algorithm over a
>>>>>>>> domain which has already been running in Xen for a while.  This is a
>>>>>>>> brand new usecase, as far as I am aware.
>>>>>>> Exactly.
>>>>>>>
>>>>>>>> Does the qemu process associated with this domain get frozen
>>>>>>>> while the
>>>>>>>> secondary is being reset, or does the process get destroyed and
>>>>>>>> recreated.
>>>>>>> What do you mean by reset? Do you mean the secondary is suspended
>>>>>>> at a checkpoint?
>>>>>> Well - at the point that the buffered records are being processed, we
>>>>>> are in the process of resetting the state of the secondary to match
>>>>>> the primary.
>>>>> Yes, at this point the qemu process associated with this domain is
>>>>> frozen. The suspend callback calls libxl__qmp_stop() (vm_stop() in
>>>>> qemu) to pause qemu. After we have processed all records, qemu is
>>>>> restored with the received state; that's why we add a
>>>>> libxl__qmp_restore() (qemu_load_vmstate() in qemu) API to restore qemu
>>>>> with the received state. Currently in libxl, qemu can only be started
>>>>> with the received state; there is no API to load a received state
>>>>> while qemu has already been running for a while.
>>>> Now that I consider this more, it is absolutely wrong not to zero the
>>>> page here.  The event channel in the page is not guaranteed to be the
>>>> same between the primary and secondary,
>>> That's why we don't zero it on the secondary.
>> I think you missed my point.  Apologies for the double negative.   It
>> must, under all circumstances, be zeroed at this point, for safety reasons.
>>
>> The page in question is subject to logdirty just like any other guest
>> page, which means that if the guest writes to it naturally (i.e. not a
>> Xen or Qemu write, both of which have magic mappings that are not
>> subject to logdirty), it will be transmitted in the stream.  As the
>> event channel could be different, the lack of zeroing it at this point
>> means that the event channel would be wrong as opposed to simply
>> missing.  This is a worse position to be in.
> The guest should not access this page. I am not sure if the guest can
> access the ioreq page.

"should not" and "can't" are two very different things.  We have had
XSAs covering the fact that the guest can write to these pages in the past.

In practice, a guest can't actually query the appropriate hvmparam, but
it can rely on the fact that the domain builder is incredibly
predictable in this regard.

>
> But in the exceptional case, the ioreq page is dirtied and is copied to
> the secondary vm. The ioreq page will then contain a wrong event channel;
> the hypervisor will check it, and if the event channel is wrong, the
> guest will be crashed.

This is my point.  It is completely legitimate for the event channels to
be different between the primary and secondary, which means that we
should be capable of dealing cleanly with the fallout when the bufioreq
page does appear as a dirty update.
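
As a purely illustrative sketch (the helper name and placement are made
up, and error handling is trimmed), the restore side could recognise
these pages by comparing an incoming pfn against the HVM params, and
then sanitise rather than blindly trust whatever the primary sent:

#include <stdbool.h>
#include <xenctrl.h>
#include <xen/hvm/params.h>

/*
 * Hypothetical helper for the restore path: is this pfn one of the
 * magic ioreq pages?  If the params can't be read, treat it as a
 * normal page.
 */
static bool pfn_is_ioreq_page(xc_interface *xch, uint32_t domid,
                              xen_pfn_t pfn)
{
    uint64_t ioreq_pfn, bufioreq_pfn;

    if ( xc_hvm_param_get(xch, domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn) ||
         xc_hvm_param_get(xch, domid, HVM_PARAM_BUFIOREQ_PFN,
                          &bufioreq_pfn) )
        return false;

    return pfn == ioreq_pfn || pfn == bufioreq_pfn;
}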

>
>>>> and we don't want to unexpectedly
>>>> find a pending/in-flight ioreq.
>>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended, so
>>> there should be no pending/in-flight ioreq at the checkpoint.
>> In the common case perhaps, but we must consider the exceptional case. 
>> The exceptional case here is some corruption which happens to appear as
>> an in-flight ioreq.
> If the state is not STATE_IOREQ_NONE, it may be a hypervisor bug. If the
> hypervisor has a bug, anything can happen. I think we should trust the
> hypervisor.

In the worst case, the contents of the pages can be completely
arbitrary.  Zeroing of the pages is to cover the case where there is
junk present, so Xen doesn't crash the guest due to a bad ioreq state.

I think Xen's behaviour is legitimate here.  If it observes wonky ioreq
state, all bets are off.
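
If the stream did want to tolerate this rather than zeroing wholesale,
the sanitisation would presumably look something like the sketch below
(illustrative only - whether resetting just the state field is actually
sufficient depends on what Xen and the device model expect to find in
the rest of each slot):

#include <xenctrl.h>
#include <xen/hvm/ioreq.h>

/*
 * Force every per-vcpu slot of an already-mapped shared ioreq page
 * back to "nothing in flight", leaving the remaining fields for the
 * device model to re-seed as it sees fit.
 */
static void sanitise_ioreq_page(shared_iopage_t *iopage,
                                unsigned int nr_vcpus)
{
    unsigned int i;

    for ( i = 0; i < nr_vcpus; i++ )
        iopage->vcpu_ioreq[i].state = STATE_IOREQ_NONE;
}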

>
>>>> Either qemu needs to take care of re-initialising the event channels
>>>> back to appropriate values, or Xen should tolerate the channels
>>>> disappearing.
>> I still stand by this statement.  I believe it is the only safe way of
>> solving the issue you have discovered.
> Add a new qemu monitor command to update the ioreq page?

Who/what actually complains about the event channel?  I can't see any
event channels in the ABI for the pages.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

