Re: [Xen-devel] DomU crash during migration when suspending source domain



Are you migrating between unlike boxes? My guess is that the original box
has processors supporting the cacheinfo CPUID leaves and the target box does
not. Migrating to older, less-capable CPUs is definitely hit-and-miss, I'm
afraid. It really is best not to do it!

 -- Keir

On 14/2/07 10:36, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx> wrote:

> Your theory that the cpu_down() is happening too early sounds plausible
> except that cpu_up/cpu_down are both entirely protected by the hotplug lock.
> See their definitions in kernel/cpu.c.
> 
> The notifier calls of interest are CPU_ONLINE and CPU_DEAD. These are the
> events that the cacheinfo code cares about. You can see that both
> notifications are broadcast under the cpu_hotplug_lock, so there should be
> no race possible in which a CPU starts to be taken down before all
> notification work associated with it coming online has completed.
> 
>  -- Keir
> 
> On 14/2/07 10:13, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx> wrote:
> 
>> Is this with a 2.6.16 guest from 3.0.4? This would most likely be a CPU
>> hotplug issue in Linux, but we did do lots of testing of that...
>> 
>>  -- Keir
>> 
>> On 14/2/07 03:42, "Graham, Simon" <Simon.Graham@xxxxxxxxxxx> wrote:
>> 
>>> Just ran into an odd DomU crash doing live migration of a 4-VCPU domain
>>> (with 3.0.4, but the code looks the same in 2.6.18/unstable to me) -- the
>>> actual panic is attached at the end of this, but the bottom line is that
>>> the code in cache_remove_shared_cpu_map (in
>>> arch/i386/kernel/cpu/intel_cacheinfo.c) is attempting to clean up the
>>> cache info for a processor that does not yet have this info set up -- the
>>> code is dereferencing a pointer in the cpuid4_info[] array, and looking
>>> at the dump I can see that this pointer is NULL.
>>> 
>>> My working theory here is that we attempted the migration way too early,
>>> before the array of cache info pointers had been initialized for all
>>> processors; it would be relatively easy to protect against this by
>>> checking for NULL, but I'm not sure whether that is the correct solution
>>> -- if anyone is familiar with this code and can comment on an appropriate
>>> fix, I'd be grateful.
>>> 
>>> One thing I'm really not sure about is the timing of marking the CPUs up
>>> with respect to the 'Initializing CPU#n' trace (see console output below)
>>> -- I can see that all four VCPUs are set up in the cpu_sys_devices array
>>> (which is populated by the code that outputs the 'Initializing CPU#n'
>>> trace), but the array of cache info structures only has an entry for
>>> VCPU 0:
>> 
>> 
>> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel
> 
> 
> 


