
Re: [PATCH] x86/pod: Do not fragment PoD memory allocations

On Tue, Jan 26, 2021 at 12:08:15PM +0100, Jan Beulich wrote:
> On 25.01.2021 18:46, Elliott Mitchell wrote:
> > On Mon, Jan 25, 2021 at 10:56:25AM +0100, Jan Beulich wrote:
> >> On 24.01.2021 05:47, Elliott Mitchell wrote:
> >>>
> >>> ---
> >>> Changes in v2:
> >>> - Include the obvious removal of the goto target.  You always realize
> >>>   you're in the wrong place the moment you press "send".
> >>
> >> Please could you also label the submission then accordingly? I
> >> got puzzled by two identically titled messages side by side,
> >> until I noticed the difference.
> > 
> > Sorry about that.  Would you have preferred a third message mentioning
> > this mistake?
> No. But I'd have expected v2 to have its subject start with
> "[PATCH v2] ...", making it relatively clear that one might
> save looking at the one labeled just "[PATCH] ...".

Yes, in fact I spotted this just after.  I was torn over whether the
mistake deserved sending out yet another message.  (ugh, yet more damage
from that issue...)

> >>> I'm not including a separate cover message since this is a single hunk.
> >>> This really needs some checking in `xl`.  If one has a domain which
> >>> sometimes gets started on different hosts and is sometimes modified with
> >>> slightly differing settings, one can run into trouble.
> >>>
> >>> In this case the particular domain is most often used PV/PVH, but
> >>> every so often is used as a template for HVM.  Starting it
> >>> HVM will trigger PoD mode.  If it is started on a machine with less
> >>> memory than others, PoD may well exhaust all memory and then trigger a
> >>> panic.
> >>>
> >>> `xl` should likely fail HVM domain creation when the maximum memory
> >>> exceeds available memory (never mind total memory).
> >>
> >> I don't think so, no - it's the purpose of PoD to allow starting
> >> a guest despite there not being enough memory available to
> >> satisfy its "max", as such guests are expected to balloon down
> >> immediately, rather than triggering an oom condition.
> > 
> > Even Qemu/OVMF is expected to handle ballooning for an *HVM* domain?
> No idea how qemu comes into play here. Any preboot environment
> aware of possibly running under Xen of course is expected to
> tolerate running with maxmem > memory (or clearly document its
> inability, in which case it may not be suitable for certain
> use cases). For example, I don't see why a preboot environment
> would need to touch all of the memory in a VM, except maybe
> for the purpose of zeroing it (which PoD can deal with fine).

I'm reading that as your answer to the above question is "yes".
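As an aside, the kind of pre-flight check I was suggesting for `xl` could
be sketched roughly as below.  To be clear, this is my own illustration,
not actual libxl code; the `pod_fits()` helper name and the parsing of
`xl info`'s free_memory field (reported in MiB) are assumptions on my part:

```python
# Hypothetical sketch of a pre-flight check `xl create` could perform
# for HVM/PoD guests; not actual xl/libxl behavior.
import subprocess

def pod_fits(maxmem_mib: int, free_mib: int) -> bool:
    """True if a PoD guest with this maxmem could ever be fully populated."""
    return maxmem_mib <= free_mib

def host_free_mib() -> int:
    """Read free memory from `xl info` (only works on a real Xen host)."""
    out = subprocess.check_output(["xl", "info"], text=True)
    for line in out.splitlines():
        if line.startswith("free_memory"):
            return int(line.split(":")[1])
    raise RuntimeError("free_memory not reported by xl info")

# The example config from this thread: maxmem = 2147483648 (MiB) cannot
# fit on any real host, so creation would be refused up front.
print(pod_fits(2147483648, 16384))  # -> False on a host with 16 GiB free
```

Whether the check should be a hard failure or merely a warning is exactly
the open question, given Jan's point that PoD exists to let maxmem exceed
what is currently free.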

> >>> For example try a domain with the following settings:
> >>>
> >>> memory = 8192
> >>> maxmem = 2147483648
> >>>
> >>> If type is PV or PVH, it will likely boot successfully.  Change type to
> >>> HVM and unless your hardware budget is impressive, Xen will soon panic.
> >>
> >> Xen will panic? That would need fixing if so. Also I'd consider
> >> an excessively high maxmem (compared to memory) a configuration
> >> error. According to my experiments long, long ago I seem to
> >> recall that a factor beyond 32 is almost never going to lead to
> >> anything good, irrespective of guest type. (But as said, badness
> >> here should be restricted to the guest; Xen itself should limp
> >> on fine.)
> > 
> > I'll confess I haven't confirmed the panic is in Xen itself.  The problem
> > is that when this gets triggered, by the time the situation is clear and
> > I can get to the console, the computer is already restarting, so no error
> > message has been observed.
> If the panic isn't in Xen itself, why would the computer be
> restarting?

I was thinking there was a possibility it is actually Domain 0 which is
panicking and taking the machine down with it.

> > This is most certainly a configuration error.  The problem is that there
> > is a very small delta between a perfectly valid configuration and one
> > which reliably triggers a panic.
> > 
> > The memory:maxmem ratio isn't the problem.  My example had a maxmem of
> > 2147483648 since that is enough to exceed the memory of sub-$100K
> > computers.  The crucial features are maxmem >= machine memory,
> > memory < free memory (thus potentially bootable PV/PVH) and type = "hvm".
> > 
> > When was the last time you tried running a Xen machine with near zero
> > free memory?  Perhaps in the past Xen kept the promise of never panicking
> > on memory exhaustion, but it feels like this hasn't held for some time.
> Such bugs need fixing. Which first of all requires properly
> pointing them out. A PoD guest misbehaving when there's not
> enough memory to fill its pages (i.e. before it manages to
> balloon down) is expected behavior. If you can't guarantee the
> guest ballooning down quickly enough, don't configure it to
> use PoD. A PoD guest causing a Xen crash, otoh, is very likely
> even a security issue. Which needs to be treated as such: It
> needs fixing, not avoiding by "curing" one of perhaps many
> possible sources.

Okay, this has been reliably reproducing for a while.  I had originally
thought it was a problem of HVM plus memory != maxmem, but the
non-immediate restart disagrees with that assessment.

> As an aside - if the PoD code had proper 1Gb page support,
> would you then propose to only allocate in 1Gb chunks? And if
> there was a 512Gb page feature in hardware, in 512Gb chunks
> (leaving aside the fact that scanning 512Gb of memory to be
> all zero would simply take too long with today's computers)?

That answer would vary over time.  Today or tomorrow, certainly not.
In a decade's time (or several), when a typical pocket computer^W^W
cellphone has 4TB of memory and a $30K server has a minimum of 128TB,
then doing allocations in 1GB chunks would be worthy of consideration.
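For what it's worth, the "too long" intuition above survives a quick
back-of-envelope.  The bandwidth figure here is my own assumption, not
anything measured or taken from this thread:

```python
# Back-of-envelope: time to scan a 512GB region for all-zero contents
# at an assumed sustained memory-read bandwidth.  Both figures are
# illustrative assumptions.
GIB = 1024 ** 3
region = 512 * GIB       # hypothetical 512GB superpage
bandwidth = 20 * GIB     # assumed sustained read bandwidth, bytes/s
seconds = region / bandwidth
print(f"zero-scan of 512GiB at 20GiB/s: ~{seconds:.1f}s")  # ~25.6s
```

Even with an optimistic multiple of that bandwidth, the scan alone costs
seconds on an allocation path, before any cache or TLB effects.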

(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@xxxxxxx  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445


