[XenPPC] Overflow in decrementer restore
Just to provide background for this commit that went in today:

--- a/xen/arch/powerpc/powerpc64/domain.c
+++ b/xen/arch/powerpc/powerpc64/domain.c
@@ -55,7 +55,10 @@ void load_sprs(struct vcpu *v)
         /* adjust the DEC value to account for cycles while not
          * running this OS */
         timebase_delta = mftb() - v->arch.timebase;
-        v->arch.dec -= timebase_delta;
+        if (timebase_delta > v->arch.dec)
+            v->arch.dec = 0;
+        else
+            v->arch.dec -= timebase_delta;
     }

In the patch titled "Schedule idle domain on secondary processors", I mentioned that sometimes the entire system would freeze, so I didn't want that patch considered for merging. The problem turned out to be that we don't sync the timebases between the processors. So if load_sprs() is executed on a different CPU than save_sprs() was, the mftb() value is meaningless for computing the delta: the unsigned subtraction can wrap around, leaving timebase_delta a huge value, as much as 149 seconds' worth of timebase ticks on JS21. So the domU was not wrecking the machine; the decrementer was just being loaded with an enormous value every time that domU's vcpu was loaded onto a particular physical CPU, including cpu0.

This patch also went in, to pin dom0 to cpu0:

--- a/xen/arch/powerpc/setup.c  Fri Sep 01 12:31:56 2006 -0400
+++ b/xen/arch/powerpc/setup.c  Fri Sep 01 12:37:29 2006 -0400
@@ -343,6 +343,10 @@ static void __init __start_xen(multiboot
     if (NULL == alloc_vcpu(dom0, 0, 0))
         panic("Error creating domain 0 vcpu 0\n");

+    /* The Interrupt Controller will route everything to CPU 0 so we
+     * need to make sure Dom0's VCPU 0 is pinned to the CPU */
+    dom0->vcpu[0]->cpu_affinity = cpumask_of_cpu(0);
+

We are currently thinking about how best to sync the timebases. Right now it looks like pulling in Linux's implementation is the best option (see the sketch at the end of this mail). Any comments would be appreciated.

We did have a real memory controller hang, as discussed on this list in response to my original post. It only occurred on Maple, where PIBS does not clear the HIOR for the secondary CPUs, so their first exception was delivered to 0xX00 + Y. Hence this patch that went in yesterday:

+    cpu0_hior = 0;
+    mthior(cpu0_hior);
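For reference, here is roughly what the Linux timebase-sync scheme we are considering looks like (cf. smp_generic_give_timebase()/smp_generic_take_timebase() in Linux's arch/powerpc/kernel/smp.c). This is only a sketch of the idea, not code in our tree: give_timebase()/take_timebase(), mttbu()/mttbl(), and barrier() are stand-ins for whatever we actually end up with.

    /* Shared word the boot CPU uses to publish its timebase; zero
     * doubles as the "not published yet" / "consumed" marker. */
    static volatile u64 timebase;

    /* Boot CPU: publish the current timebase, then spin until the
     * secondary has consumed it. */
    void give_timebase(void)
    {
        timebase = mftb();
        while (timebase)
            barrier();
    }

    /* Secondary CPU: wait for a value, write it into our own
     * timebase registers, then signal completion. */
    void take_timebase(void)
    {
        u64 tb;

        while ((tb = timebase) == 0)
            barrier();
        mttbl(0);                      /* zero TBL first so a carry
                                        * can't bump TBU mid-update */
        mttbu(tb >> 32);
        mttbl(tb & 0xffffffff);
        timebase = 0;
    }

Note that the interval between the boot CPU's mftb() and the secondary's timebase write is not compensated for, so this only gets the timebases close rather than identical. For the DEC accounting in load_sprs() that should be plenty, since we only need the delta to stay far away from wrapping.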