Re: [Xen-devel] Prepping a bugfix push
On Friday, 04 December 2009 at 08:37, Jeremy Fitzhardinge wrote:
> On 12/04/09 07:50, Ian Campbell wrote:
> >On Fri, 2009-12-04 at 07:46 +0000, Ian Campbell wrote:
> >>I've been doing regular suspend/resumes not checkpoint ones as Brendan
> >>is doing, I did try a couple of checkpointed ones yesterday and they
> >>failed, IIRC with a similar softlockup to this one.
> >So what is happening is that the device event channels are getting torn
> >down by the resume handler and never completely reinstated in the
> >cancelled suspend (aka checkpoint) case.
>
> Hm.
>
> >In 2.6.18 there was a separate ->suspend_cancel() callback for each
> >driver, called instead of the ->resume() callback in exactly these
> >circumstances. The cancel callback doesn't do any of the teardown, in
> >fact for blkfront it doesn't even exist.
> >
> >(As a proof of concept, commenting out the entire contents of
> >blkfront_resume and netfront_resume makes checkpointing work OK for me,
> >at the cost of breaking regular resume, of course)
> >
> >pv-ops uses the generic power management infrastructure which does not
> >have a concept of cancelling a suspend. Perhaps it should? Otherwise a
> >different solution will be required, I'm not sure what that might be
> >yet.
>
> Well, the obvious one is to treat it as a full suspend followed by
> immediate resume. That is, just remove all the special case handling
> for checkpoint, and let it do the normal resume stuff when the
> hypercall returns.
>
> I think the PM core can fail to suspend; it just resumes anything
> that has been suspended so far.

Hmm. I just tried changing the SUSPEND_CANCEL elfnote to 0 in pvops,
and now save -c takes a very long time.
From the xend log:

[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:3025) XendDomainInfo.resumeDomain(19)
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2319) Destroying device model
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2326) Releasing devices
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2332) Removing vif/0
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:1213) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2332) Removing vbd/51713
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:1213) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51713
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2332) Removing console/0
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:1213) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2009-12-04 08:57:58 4917] INFO (XendDomainInfo:3260) Dev 51713 still active, looping...

that last line repeats for a very long time, and eventually gives up.
The domain is still broken when save completes.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
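
For anyone following the thread, here is a minimal sketch of the 2.6.18-style dispatch Ian describes above: after the suspend hypercall returns, each driver gets either ->resume() or, if the suspend was cancelled (the checkpoint case), ->suspend_cancel(). It is a toy Python model, not kernel code. FrontendDriver, finish_suspend and the "otherfront" driver are hypothetical names made up for illustration; only the two callback names and the fact that blkfront had no cancel callback come from the thread.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FrontendDriver:
    """Toy stand-in for a PV frontend driver (hypothetical, not a kernel type)."""
    name: str
    resume: Callable[[], None]
    # Optional: per the thread, blkfront had no cancel callback at all.
    suspend_cancel: Optional[Callable[[], None]] = None


def finish_suspend(drivers: List[FrontendDriver], cancelled: bool) -> List[str]:
    """Run post-suspend callbacks; return names of drivers that did a full resume."""
    resumed = []
    for drv in drivers:
        if cancelled:
            # Checkpoint case: the domain carries on in place, so no teardown.
            if drv.suspend_cancel is not None:
                drv.suspend_cancel()
        else:
            # Real resume: the driver tears down and reconnects its devices.
            drv.resume()
            resumed.append(drv.name)
    return resumed
```

With cancelled=True a driver without a cancel callback is simply left alone, which is the behaviour that pv-ops currently has no way to express: the generic PM path only knows about the resume branch.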