Re: [Xen-devel] Commit moratorium to staging



On 11/03/2017 06:35 PM, Juergen Gross wrote:
> On 03/11/17 19:29, Roger Pau Monné wrote:
>> On Fri, Nov 03, 2017 at 05:57:52PM +0000, George Dunlap wrote:
>>> On 11/03/2017 02:52 PM, George Dunlap wrote:
>>>> On 11/03/2017 02:14 PM, Roger Pau Monné wrote:
>>>>> On Thu, Nov 02, 2017 at 09:55:11AM +0000, Paul Durrant wrote:
>>>>>> Hmm. I wonder whether the guest is actually healthy after the migrate. 
>>>>>> One could imagine a situation where the storage device model (IDE in our 
>>>>>> case I guess) gets stuck in some way but recovers after a timeout in the 
>>>>>> guest storage stack. Thus, if you happen to try to shut down while it
>>>>>> is still stuck, Windows starts trying to shut down but can't. Try after
>>>>>> the timeout, though, and it can.
>>>>>> In the past we did make attempts to support Windows without PV drivers 
>>>>>> in XenServer but xenrt would never reliably pass VM lifecycle tests 
>>>>>> using emulated devices. That was with qemu trad, but I wonder whether 
>>>>>> upstream qemu is actually any better particularly if using older device 
>>>>>> models such as IDE and RTL8139 (which are probably largely unmodified 
>>>>>> from trad).
>>>>>
>>>>> Since I've been looking into this for a couple of days and found no
>>>>> solution, I'm going to write up what I've found so far:
>>>>>
>>>>>  - The issue only affects Windows guests.
>>>>>  - It only manifests itself when doing live migration; non-live
>>>>>    migration and save/resume work fine.
>>>>>  - It affects all x86 hardware. The number of migrations needed to
>>>>>    trigger it seems to depend on the hardware, but doing 20 migrations
>>>>>    reliably triggers it on all the hardware I've tested.
>>>>
>>>> Not good.
>>>>
>>>> You said that Windows reported that the login process failed somehow?
>>>>
>>>> Is it possible something bad is happening, like sending spurious page
>>>> faults to the guest in logdirty mode?
>>>>
>>>> I wonder if we could reproduce something like it on Linux -- set a build
>>>> going and start localhost migrating; a spurious page fault is likely to
>>>> cause the build to fail.
>>>
>>> Well, with a looping xen-build going on in the guest, I've done 40 local
>>> migrates with no problems yet.
>>>
>>> But Roger -- is this on emulated devices only, no PV drivers?
>>>
>>> That might be something worth looking at.
>>
>> Yes, Windows doesn't have PV devices. But save/restore and non-live
>> migration seem fine, so it doesn't look to be related to devices, but
>> rather to log-dirty or some other aspect of live migration.
> 
> log-dirty for read-I/Os of emulated devices?

FWIW I booted a Linux guest with "xen_nopv" on the command-line, gave it
256 MiB of RAM, and then ran a Xen build on it in a loop (see command
below).

Then I started migrating it in a loop.

After an hour or two it had done 146 local migrations, and 46 builds of
Xen (swapping onto emulated disk is pretty slow), without any issues.

Build command:

# while make -j 3 xen ; do git clean -ffdx ; done
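
The migration loop itself is not shown in this message. A minimal sketch of
what such a loop might look like, assuming the xl toolstack and localhost
migration over ssh. The guest name and the names DOMAIN, COUNT, MIGRATE_CMD
and migrate_loop are hypothetical and not taken from this thread:

```shell
#!/bin/sh
# Sketch only: repeatedly live-migrate a guest to localhost and stop at
# the first failure. All names and defaults below are assumptions.
DOMAIN="${DOMAIN:-guest}"                  # hypothetical guest name
COUNT="${COUNT:-20}"                       # number of migrations to attempt
MIGRATE_CMD="${MIGRATE_CMD:-xl migrate}"   # overridable for dry runs

migrate_loop() {
    i=1
    while [ "$i" -le "$COUNT" ]; do
        # MIGRATE_CMD is intentionally unquoted so "xl migrate" splits
        # into the command and its subcommand.
        if ! $MIGRATE_CMD "$DOMAIN" localhost; then
            echo "migration $i failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $COUNT migrations"
}
```

Calling migrate_loop after sourcing this would run the migrations; setting
MIGRATE_CMD=true gives a dry run that never touches the toolstack.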

I'm shutting down the VM and I'll leave it running overnight.

 -George


 

