Re: [Xen-devel] DomU crash during migration when suspending source domain
Your theory that the cpu_down() is happening too early sounds plausible,
except that cpu_up() and cpu_down() are both entirely protected by the
hotplug lock. See their definitions in kernel/cpu.c.

The notifier calls of interest are CPU_ONLINE and CPU_DEAD. These are the
events that the cacheinfo code cares about. You can see that both
notifications are broadcast under the cpu_hotplug_lock, so there should be
no race possible in which a CPU starts to be taken down before all
notification work associated with it coming online has completed.

 -- Keir

On 14/2/07 10:13, "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx> wrote:

> Is this with a 2.6.16 guest from 3.0.4? This would most likely be a CPU
> hotplug issue in Linux, but we did do lots of testing of that...
>
>  -- Keir
>
> On 14/2/07 03:42, "Graham, Simon" <Simon.Graham@xxxxxxxxxxx> wrote:
>
>> Just run into an odd DomU crash doing live migration of a 4-VCPU domain
>> (with 3.0.4 but the code looks the same in 2.6.18/unstable to me) - the
>> actual panic is attached at the end of this, but the bottom line is that
>> the code in cache_remove_shared_cpu_map (in
>> arch/i386/kernel/cpu/intel_cacheinfo.c) is attempting to clean up the
>> cache info for a processor that does not yet have this info set up - the
>> code is dereferencing a pointer in the cpuid4_info[] array and looking at
>> the dump I can see that this is NULL.
>>
>> My working theory here is that we attempted the migration waaay early and
>> the array of cache info pointers was not yet initialized for all
>> processors; it would be relatively easy to protect against this by
>> checking for NULL, but I'm not sure if this is the correct solution or
>> not -- if anyone is familiar with this code and can comment on an
>> appropriate fix I'd be grateful.
>>
>> One thing I'm really not sure about is the timing of marking the CPUs up
>> with respect to the trace re initializing CPUs (see console output
>> below) -- I can see that the four VCPUs are set up in the cpu_sys_devices
>> array (which is set up by the code that outputs the 'Initializing CPU#n'
>> trace) but the array of cache info structures only has an entry for
>> VCPU 0:

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
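
For illustration only, here is a minimal userspace sketch of the NULL check floated in the quoted report. It is not the kernel code: struct cache_info, cache_info_setup() and cache_info_remove() are hypothetical names, NR_CPUS is fixed at 4 to match the 4-VCPU guest, and only the cpuid4_info[] array name is borrowed from the report. It shows the guard itself, not whether the guard is the right fix, which the thread leaves open.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4 /* assumption: the 4-VCPU guest from the report */

/* Hypothetical stand-in for the kernel's per-CPU cache descriptor. */
struct cache_info {
    unsigned int shared_cpu_map; /* bit n set: CPU n shares this cache */
};

/* Models the pointer array from the report; an entry stays NULL until
 * that CPU's cache info has been set up. */
static struct cache_info *cpuid4_info[NR_CPUS];

/* Set up the entry for one CPU. Returns 0 on success, -1 on failure. */
static int cache_info_setup(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    if (cpuid4_info[cpu] != NULL)
        return 0; /* already set up */
    cpuid4_info[cpu] = calloc(1, sizeof(*cpuid4_info[cpu]));
    if (cpuid4_info[cpu] == NULL)
        return -1;
    cpuid4_info[cpu]->shared_cpu_map = 1u << cpu;
    return 0;
}

/* Tear down the entry for one CPU. Returns 1 if an entry was freed,
 * 0 if there was nothing to clean up. */
static int cache_info_remove(unsigned int cpu)
{
    unsigned int sibling;

    assert(cpu < NR_CPUS);
    if (cpuid4_info[cpu] == NULL)
        return 0; /* the guard: never set up, so nothing to dereference */

    /* Drop this CPU from the shared map of every sibling that exists. */
    for (sibling = 0; sibling < NR_CPUS; sibling++) {
        if (sibling != cpu && cpuid4_info[sibling] != NULL)
            cpuid4_info[sibling]->shared_cpu_map &= ~(1u << cpu);
    }
    free(cpuid4_info[cpu]);
    cpuid4_info[cpu] = NULL;
    return 1;
}
```

With only CPU 0 set up, as in the dump described above, calling cache_info_remove() on CPUs 1 to 3 returns 0 instead of dereferencing a NULL pointer. Such a guard would hide the symptom without explaining why a CPU_DEAD notification could arrive for a CPU whose CPU_ONLINE work never populated the array.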