
Re: [Xen-devel] PV Shim ballooning



On Tue, Feb 11, 2020 at 01:39:42PM +0000, Andrew Cooper wrote:
> Ballooning inside PV shim is currently very broken.
> 
> From an instrumented Xen and 32bit PV XTF test:
> 
> (d3) (d3) --- Xen Test Framework ---
> (d3) (d3) Ballooning: PV 32bit (PAE 3 levels)
> (d3) (d3) mr { 0010a940, 1024, 0x7ff0 }
> (d3) (d3) About to decrease
> (d3) (XEN) *** D { ffff820080000020, nr 1020, done 0 }
> (d3) (XEN) d3v0 failed to reserve 267 extents of order 0 for offlining
> (d3) (XEN) *** D { ffff82007fffe040, nr 1024, done 1020 }
> (d3) (XEN) d3v0 failed to reserve 1024 extents of order 0 for offlining
> (d3) (d3) => got 1024
> 
> This test takes 1024 frames and calls decrease reservation on them,
> before unmapping.  i.e. the decrease reservation should fail.  Shim
> successfully offlines 753 pages (nothing to do with the frames the guest
> selected), and fails to offline 1291, and despite this, returns success.
> 
> First of all, the "failed to reserve" is in pv_shim_offline_memory()
> which is a void function that has a semantically relevant failure case. 
> This obviously isn't ok.

So on failure to reserve the pages for offlining we should likely add
them back to the domU and return the number of pages that have been
fully offlined?

Not sure if that's doable, but I think that by poking at the extents
list Xen should be able to repopulate the entries.
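
Something along these lines is what I have in mind; just a sketch,
since pv_shim_offline_memory() is void today and I haven't checked
that the arguments at the call site in common/memory.c actually line
up:

/* Hypothetical: have pv_shim_offline_memory() return how many pages it
 * actually managed to reserve and hand back to L0, so the caller can
 * re-credit the shortfall to the domU. */
offlined = pv_shim_offline_memory(args.nr_done, args.extent_order);
if ( offlined < args.nr_done )
    /* Give the pages we could not offline back to the guest's pool. */
    pv_shim_online_memory(args.nr_done - offlined, args.extent_order);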

> 
> Second, the way the compat code loops over the translated data is
> incompatible with how args.nr_done is used for the call into
> pv_shim_offline_memory().

Oh, I would have to check that; I tend to get lost in compat code. The
code in pv_shim_offline_memory() assumes that args.nr_done will contain
the total number of successfully ballooned-out pages.

> Why is pv_shim_offline_memory() not in decrease_reservation() to begin with?

I guess to try to batch the decrease into a single call to
batch_memory_op, and to keep the symmetry with the call to
pv_shim_online_memory.

But most of this was done in a hurry, so it's likely just there
because that was the first place that seemed sensible enough.
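
For reference, the batching boils down to something like the below;
just an illustration of the idea rather than the actual
batch_memory_op() code, and it assumes the shim's
xen_hypercall_memory_op() wrapper for hypercalls to L0:

/* Hand a whole array of frames back to L0 in a single
 * XENMEM_decrease_reservation call, instead of one hypercall per page. */
static long offline_batch(xen_pfn_t *mfns, unsigned long nr)
{
    struct xen_memory_reservation res = {
        .nr_extents   = nr,
        .extent_order = 0,
        .domid        = DOMID_SELF,        /* the shim itself, as seen by L0 */
    };

    set_xen_guest_handle(res.extent_start, mfns);

    return xen_hypercall_memory_op(XENMEM_decrease_reservation, &res);
}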

> Furthermore, there is a fundamental difference in ballooning behaviour
> between PV and HVM guests, which I don't think we can compensate for. 
> PV guests need to call decrease reservation once to release the frames,
> and unmap the frames (in any order).  HVM guests calling decrease
> reservation automatically make the frame unusable no matter how many
> outstanding references exist.

Ouch, so you can call XENMEM_decrease_reservation and then unmap the
pages from the guest page-tables and they will be ballooned out?

TBH I had no idea this was possible; I had mostly assumed a model
similar to HVM, where you call decrease_reservation and the pages are
just removed from the physmap.
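
So from the guest side the sequence is roughly the below (a sketch of
what the XTF test does, assuming Linux/XTF-style hypercall wrappers
that take a raw PTE value):

/* PV ballooning: release the frames first, drop the mappings afterwards
 * (either order is fine for PV).  For HVM the decrease alone already
 * pulls the pages out of the physmap, whatever references remain. */
static void balloon_out(xen_pfn_t *frames, unsigned long *linear,
                        unsigned long nr)
{
    struct xen_memory_reservation res = {
        .nr_extents   = nr,
        .extent_order = 0,
        .domid        = DOMID_SELF,
    };
    unsigned long i;

    set_xen_guest_handle(res.extent_start, frames);

    /* Step 1: decrease reservation while the frames are still mapped. */
    HYPERVISOR_memory_op(XENMEM_decrease_reservation, &res);

    /* Step 2: only now remove the last mappings of those frames. */
    for ( i = 0; i < nr; i++ )
        HYPERVISOR_update_va_mapping(linear[i], 0 /* empty PTE */, UVMF_INVLPG);
}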

> Shim can't decrease reservation (HVM with L0 Xen) on any frame whose
> reference count didn't drop to 0 from the PV guests' call, and there is
> nothing presently to check this condition.

But the shim will only balloon out free domheap pages (as it gets them
from alloc_domheap_pages), and those shouldn't have any references held
by the guest?
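
I.e. the reserve step is conceptually just the below (a rough sketch,
the list name is made up), so the failures in the log are this
allocation running dry rather than a refcount problem:

/* Pull free pages straight off the domheap; by construction the guest
 * holds no references to them.  Queue them for a later batched
 * decrease-reservation to L0. */
struct page_info *pg = alloc_domheap_pages(NULL, 0, 0 /* memflags */);

if ( !pg )
    /* This is the "failed to reserve ... for offlining" case. */
    return;

page_list_add_tail(pg, &offline_list);     /* hypothetical list name */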

> Short of a PGC bit and extra shim logic in free_domheap_page(), I can't
> see any way to reconcile the behaviour, except to change the semantics
> of decrease reservation for PV guests.  In practice, this would be far
> more sensible behaviour, but we have no idea if existing PV guests would
> manage.

Hm, I guess we could add some hook to free_domheap_page in order to
remove them from the physmap once the guest frees them?

How does Xen know which pages freed by a PV guest should be ballooned
out?

Is that done solely based on the fact that those pages don't have any
references?

That doesn't seem like a viable option unless we add a new bit to the
page struct in order to signal that those pages should be ballooned
out once freed, as you suggest.
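
If we go down that route, I guess the shape would be something like
this; entirely hypothetical, neither the flag, its bit position, nor
the helper exist today:

/* Hypothetical PGC flag, set by decrease_reservation on the frames the
 * guest selected; the bit position is arbitrary for illustration. */
#define PGC_shim_offline (1UL << 10)

/* Hypothetical hook, called from free_domheap_pages() once the final
 * reference to the page has gone away. */
static void shim_maybe_offline(struct page_info *pg)
{
    if ( !pv_shim || !(pg->count_info & PGC_shim_offline) )
        return;

    pg->count_info &= ~PGC_shim_offline;
    /* ... queue pg for a batched XENMEM_decrease_reservation to L0 ... */
}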

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

