Re: [Xen-devel] [PATCH 00 of 10] Teach xm save to checkpoint a running domain



>I'm not too sure about the last couple of patches in this
>series. Because the checkpointing domain doesn't disconnect before
>calling suspend, it retains a few references to pages it doesn't
>own. These trigger a PT race detector in xc_linux_save, which causes
>it to abort. So the last couple of patches explicitly identify the
>references I've found so far (shared_info and some grant table shared
>pages) and simply zero those PTEs during save, since they'll be
>recreated on restore. Finding the grant table pages is a bit fragile -
>I walk the page table loaded in CR3 at the time of suspend looking for
>the virtual address I've stowed in the suspend record. I've only got
>code for two-level page tables at the moment, since I'm not convinced
>this is the right approach. Under what circumstances would a non-live
>save have an unsafe PTE race? 
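
For readers following along, the two-level walk described in the quoted text can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch series: it assumes 32-bit non-PAE x86 paging (10-bit L2 index, 10-bit L1 index, 4 KiB pages), ignores 4 MiB superpages, and models memory as a toy array of page-sized tables indexed by frame number. The names pt_page_t, lookup_pte and zap_pte are hypothetical. In a real PV guest the entries hold machine frame numbers, so the tool would need to map those frames rather than index an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTES_PER_TABLE 1024u   /* 32-bit non-PAE x86: 1024 entries per level */
#define PTE_PRESENT    0x1u
#define PAGE_SHIFT     12

/* Toy model (assumption): each frame is one page-sized table of 32-bit entries. */
typedef uint32_t pt_page_t[PTES_PER_TABLE];

/* Walk from the frame loaded in CR3 to the L1 entry that maps va.
 * Returns NULL if the L2 entry is not present or points outside the model. */
static uint32_t *lookup_pte(pt_page_t *frames, size_t nr_frames,
                            uint32_t cr3_frame, uint32_t va)
{
    uint32_t l2_idx = va >> 22;                     /* top 10 bits */
    uint32_t l1_idx = (va >> PAGE_SHIFT) & 0x3ffu;  /* middle 10 bits */

    assert(cr3_frame < nr_frames);

    uint32_t l2e = frames[cr3_frame][l2_idx];
    if (!(l2e & PTE_PRESENT))
        return NULL;

    uint32_t l1_frame = l2e >> PAGE_SHIFT;
    if (l1_frame >= nr_frames)
        return NULL;

    return &frames[l1_frame][l1_idx];
}

/* Zero the PTE mapping va if one is present. Returns 1 if an entry was cleared. */
static int zap_pte(pt_page_t *frames, size_t nr_frames,
                   uint32_t cr3_frame, uint32_t va)
{
    uint32_t *pte = lookup_pte(frames, nr_frames, cr3_frame, va);
    if (pte == NULL || !(*pte & PTE_PRESENT))
        return 0;
    *pte = 0;
    return 1;
}
```

Note that the sketch clears the entry without checking who owns the target frame, which is exactly the unchecked zeroing being questioned here.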

Pretty much any PT race in a non-live save/migrate is a bug; the 
domain is (in theory) suspended at this point, and all of the 
devices are disconnected. Since you've chosen not to 'disconnect' 
the devices, you'll get random updates occurring to any shared 
pages (shared via grants or directly shared with Xen). 

> Maybe it's fine to simply zero these ptes without checking them. 

I'd think not. 

>Or maybe it'd be less fragile to get the owners of the pages from Xen 
>and see if the guest has legitimate mappings to them? Comments?

I think the ideal thing to do here is to mirror the live migrate case, 
i.e. do a full 'disconnect' of devices, xenbus, console, event channels
etc, and then bring them back up. It'll probably be possible to do this
in a slightly more efficient / less intrusive fashion by just cauterising
things in Xen (i.e. closing the event channel -> guest path but not 
unbinding the interdomain side). For grants, you basically have to 
follow the live migrate case and be prepared to re-issue, since otherwise
on resume (which is presumably desired at some point?) you'll have garbage
in flight and/or lost requests. 
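
To make the "be prepared to re-issue" point concrete, here is a minimal sketch of the general idea: the frontend keeps a private shadow copy of each request until its response arrives, and on resume it discards the shared ring contents and queues every unanswered request again. Everything here is a toy model with hypothetical names (struct frontend, submit, complete, resume_and_reissue); it is not the actual frontend driver code, and it leaves out grant references, event channels and consumer tracking entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8

struct request {
    unsigned id;      /* index of the shadow slot that tracks this request */
    unsigned sector;  /* stand-in for the real request payload */
};

struct frontend {
    struct request ring[RING_SIZE];    /* shared with the backend; stale after resume */
    size_t ring_prod;                  /* requests pushed since the last (re)connect */
    struct request shadow[RING_SIZE];  /* private copies of outstanding requests */
    bool in_flight[RING_SIZE];         /* true while shadow[i] has no response yet */
};

/* Place a request on the shared ring (toy: no consumer tracking). */
static void push(struct frontend *fe, const struct request *req)
{
    fe->ring[fe->ring_prod % RING_SIZE] = *req;
    fe->ring_prod++;
}

/* Record the request in a free shadow slot, then push it. Returns the id, or -1 if full. */
static int submit(struct frontend *fe, unsigned sector)
{
    for (unsigned id = 0; id < RING_SIZE; id++) {
        if (!fe->in_flight[id]) {
            fe->shadow[id] = (struct request){ .id = id, .sector = sector };
            fe->in_flight[id] = true;
            push(fe, &fe->shadow[id]);
            return (int)id;
        }
    }
    return -1;
}

/* Response received: the shadow slot can be reused. */
static void complete(struct frontend *fe, unsigned id)
{
    assert(id < RING_SIZE);
    fe->in_flight[id] = false;
}

/* After resume: reset the ring and queue every unanswered request again.
 * Returns the number of requests re-issued. */
static unsigned resume_and_reissue(struct frontend *fe)
{
    unsigned reissued = 0;

    memset(fe->ring, 0, sizeof fe->ring);
    fe->ring_prod = 0;

    for (unsigned id = 0; id < RING_SIZE; id++) {
        if (fe->in_flight[id]) {
            push(fe, &fe->shadow[id]);
            reissued++;
        }
    }
    return reissued;
}
```

In a real driver the re-issue step would also have to grant the data pages afresh, since the old grant references mean nothing to the backend after reconnect.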


Anyway, looks like an interesting start, and would be a nice feature 
to get into -unstable sometime post 3.0.4. 



cheers,

S.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

