Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Hi folks,

long time no see. :-)

On Tuesday 01 February 2011 17:32:25 Andre Przywara wrote:
> I asked Stephan Diestelhorst for help and after I convinced him that
> removing credit and making SEDF the default again is not an option he
> worked together with me on that ;-) Many thanks for that!
> We haven't come to a final solution but could gather some debug data.
> I will simply dump some data here, maybe somebody has got a clue. We
> will work further on this tomorrow.

Andre and I have been looking through this further, in particular
sanity checking the invariant

  prv->weight >= sdom->weight * sdom->active_vcpu_count

each time someone tweaks the active vcpu count. This happens only in
__csched_vcpu_acct_start and __csched_vcpu_acct_stop_locked.

We managed to observe the broken invariant when splitting cpupools.
We have the following theory of what happens (a small stand-alone
sketch of the scenario follows further below):

* some vcpus of a particular domain are currently in the process of
  being moved to the new pool
* some are still left on the old pool (vcpus_old) and some are already
  in the new pool (vcpus_new)
* we now have vcpus_old->sdom = vcpus_new->sdom, and following from this:
  * vcpus_old->sdom->weight = vcpus_new->sdom->weight
  * vcpus_old->sdom->active_vcpu_count = vcpus_new->sdom->active_vcpu_count
* active_vcpu_count thus does not represent the separation of the
  actual vcpus (it may be the sum, or count only the old or only the
  new ones; it does not matter)
* however, sched_old != sched_new, and thus:
  * sched_old->prv != sched_new->prv
  * sched_old->prv->weight != sched_new->prv->weight
* the prv->weight field hence sees the incremental move of VCPUs
  (through the modifications in *acct_start and *acct_stop_locked)
* if at any point during this half-way migration the scheduler runs
  csched_acct, it erroneously checks the wrong active_vcpu_count

Workarounds / fixes (none tried):
* disable scheduler accounting while a domain is half-way migrated
  (a dom->pool_migrating flag, checked in csched_acct)
* temporarily split the sdom structures while migrating, to account
  for the transient split of the vcpus
* synchronously disable all vcpus, migrate, and then re-enable them

Caveats:
* prv->lock does not guarantee mutual exclusion between (same)
  schedulers of different pools
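For reference, here is the theory above in compilable form. This is a
stand-alone sketch, not a patch: the structs and the acct_start /
acct_stop / check helpers are trimmed stand-ins of my own that model
only the weight / active_vcpu_count bookkeeping that
__csched_vcpu_acct_start and __csched_vcpu_acct_stop_locked perform;
everything else from the real sched_credit.c is omitted, and the
concrete numbers are made up.

/*
 * Stand-alone illustration of the accounting problem described above.
 * Not Xen code: the structs and helpers are simplified stand-ins.
 */
#include <stdio.h>

/* Per scheduler instance, i.e. per pool. */
struct csched_private {
    int weight;             /* sum of weight * active_vcpu_count
                               over all domains active in this pool */
};

/* Per domain; shared by all of that domain's vcpus. */
struct csched_dom {
    int weight;
    int active_vcpu_count;
};

/* A vcpu of sdom becomes active in the pool owned by prv. */
static void acct_start(struct csched_private *prv, struct csched_dom *sdom)
{
    sdom->active_vcpu_count++;
    prv->weight += sdom->weight;
}

/* A vcpu of sdom stops being active in the pool owned by prv. */
static void acct_stop(struct csched_private *prv, struct csched_dom *sdom)
{
    sdom->active_vcpu_count--;
    prv->weight -= sdom->weight;
}

/* The invariant the accounting relies on for every active domain. */
static void check(const char *tag, const struct csched_private *prv,
                  const struct csched_dom *sdom)
{
    int needed = sdom->weight * sdom->active_vcpu_count;

    if (prv->weight < needed)
        printf("%s: invariant broken: prv->weight=%d < %d\n",
               tag, prv->weight, needed);
    else
        printf("%s: ok (prv->weight=%d >= %d)\n", tag, prv->weight, needed);
}

int main(void)
{
    struct csched_private prv_old = { 0 }, prv_new = { 0 };
    struct csched_dom sdom = { .weight = 256, .active_vcpu_count = 0 };

    /* Two vcpus active in the old pool: everything is consistent. */
    acct_start(&prv_old, &sdom);
    acct_start(&prv_old, &sdom);
    check("before move, old pool", &prv_old, &sdom);

    /* Half-way migration: one vcpu has moved to the new pool.  Each
     * prv now accounts for only part of the domain, but the single,
     * shared sdom->active_vcpu_count still covers all of its vcpus. */
    acct_stop(&prv_old, &sdom);
    acct_start(&prv_new, &sdom);

    /* Accounting running in either pool now checks the wrong count. */
    check("half-way, old pool", &prv_old, &sdom);
    check("half-way, new pool", &prv_new, &sdom);

    return 0;
}

With a (made up) domain weight of 256 and two vcpus, the half-way
migrated state leaves both prv->weight values at 256 while the shared
active_vcpu_count still says 2, so the check reports a violation in
either pool.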
<rant>
The general locking policy vs. the comment situation is a nightmare.
I know that we have some advanced data-structure folks here, but
intuitively reasoning about when specific things are atomic and
mutually excluded is a pain in the scheduler / cpupool code, see the
issue with the separate prv->locks above.

E.g. the cpupool_unassign_cpu and cpupool_unassign_cpu_helper interplay:
* cpupool_unassign_cpu unlocks cpupool_lock
* it then sets up the continuation calling cpupool_unassign_cpu_helper
* cpupool_unassign_cpu_helper locks cpupool_lock again
* while intuitively one would think that both should see a consistent
  snapshot, and hence freeing the lock in the middle is a bad idea
* also, communicating continuation-local state through global variables
  mandates that only a single global continuation can be pending
* reading cpu outside of the lock protection in
  cpupool_unassign_cpu_helper also smells
</rant>

Despite the rant, it is amazing to see the ability to move running
things around through this remote continuation trick! In my (ancient)
balancer experiments I added hypervisor-threads just for side-stepping
this issue..

Stephan
--
Stephan Diestelhorst, AMD Operating System Research Center
stephan.diestelhorst@xxxxxxx
Tel. +49 (0)351 448 356 719

Advanced Micro Devices GmbH, Einsteinring 24, 85609 Aschheim, Germany
Geschaeftsfuehrer: Alberto Bozzo u. Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632, WEEE-Reg-Nr: DE 12919551

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel