[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] crash in csched_load_balance after xl vcpu-pin
> On Apr 13, 2018, at 10:25 AM, Dario Faggioli <dfaggioli@xxxxxxxx> wrote: > > On Fri, 2018-04-13 at 09:03 +0000, George Dunlap wrote: >>> On Apr 12, 2018, at 6:25 PM, Dario Faggioli <dfaggioli@xxxxxxxx> >>> wrote: >>> >> I think the bottom line is, for this test to be valid, then at this >> point test_bit(VPF_migrating) *must* imply !vcpu_on_runqueue(v), but >> at this point it doesn’t: If someone else has come by and cleared the >> bit, done migration, and woken it up, and then someone *else* set the >> bit again without taking it off the runqueue, it may still be on the >> runqueue. >> >> My series which calls vcpu_sleep_nosync_locked() after setting >> VPF_migrating should help with this. >> > Yes. In fact, Olaf, I still think that doing a run with George's RFC > applied, would be useful, if only as a data point. > >> Or, alternately, instead of baking all this implicit knowledge about >> credit into the scheduler, we should just implement >> credit_vcpu_migrate(), and have it remove it from one runqueue and >> put it on another. >> > But it's not really "baking Credit implicit knowledge", IMO. It is that > we have an invariant which we are failing to enforce. Which invariant is that? That a vcpu is not on a runqueue when switching v->processor. But “on a runqueue” is a scheduler-specific construct that the main scheduling code doesn’t know about. Otherwise we could make the late bail-out clause in vcpu_migrate() something like this: if ( !v->is_running || vcpu_on_runq(v) || !test_and_clear_bit(VPF_migrating) ) { /* unlock and return */ } All this stuff with vcpu_sleep_nosync() and vcpu_wake() is just indirectly making sure that the Credit1-specific invariant — that switching v->processor removes it from one runqueue and adds it to another — actually happens; but it does it in an opaque way. And the main reason the migrate() callback was introduced (IIRC) is because credit2’s migration invariants didn’t really correspond to the invariants implicitly defined by schedule.c for credit1. I think as far as backports go, my current RFC would be fine. Another possibility, though, would be to simply add a migrate() callback to remove the vcpu from the runqueue before switching v->processor, *without* removing any of the current song and dance about vcpu_sleep_nosync(). That should be fairly simple and straightforward to backport, and won’t make anything worse (since in theory it should have been removed by that point anyway). Then for 4.12 we can figure out what we want to do going forward. -George _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |