Re: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done
On 29.10.21 11:57, Marek Marczykowski-Górecki wrote:
> On Fri, Oct 29, 2021 at 06:48:44AM +0200, Juergen Gross wrote:
>> On 28.10.21 22:16, Marek Marczykowski-Górecki wrote:
>>> On Thu, Oct 28, 2021 at 12:59:52PM +0200, Juergen Gross wrote:
>>>> When running as a PVH or HVM guest with actual memory < max memory, the
>>>> hypervisor uses "populate on demand" (PoD) to allow the guest to balloon
>>>> down from its maximum memory size. For this to work correctly, the guest
>>>> must not touch more memory pages than its target memory size; otherwise
>>>> the PoD cache will be exhausted and the guest will be crashed.
>>>>
>>>> In extreme cases, ballooning down might currently not be finished before
>>>> the init process is started, which can consume lots of memory.
>>>>
>>>> In order to avoid random boot crashes in such cases, add a late init call
>>>> to wait for ballooning down to finish for PVH/HVM guests.
>>>>
>>>> Cc: <stable@xxxxxxxxxxxxxxx>
>>>> Reported-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
>>>> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
>>>
>>> It may happen that the initial balloon down fails (state==BP_ECANCELED).
>>> In that case, it waits indefinitely. I think it should rather report a
>>> failure (and panic? It's similar to OOM before PID 1 starts, so rather
>>> hard to recover) instead of hanging.
>>
>> Okay, I can add something like that. I'm thinking of issuing a failure
>> message if the credit has not changed for 1 minute, and calling panic()
>> after two more minutes. Is this fine?
>
> Isn't it better to get a state from balloon_thread()? If the balloon fails,
> it won't really try anymore (until the 3600s timeout), so waiting in that
> state doesn't help. And reporting the failure earlier may be more user
> friendly. Or maybe there is something that could wake up the thread
> earlier that I don't see? Hot plugging more RAM is rather unlikely at this
> stage...

Waking up the thread would be easy, but it probably wouldn't really help.

The idea was that maybe a Xen admin would notice the guest not booting any
further and then add some more memory to the guest (this should wake up the
balloon thread again).

I agree that it is probably sensible to stop waiting for ballooning to finish
once it has failed. Additionally, I could add a boot parameter to control the
timeouts for the failure message and the panic().

What do you think?

Juergen
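The timeout policy discussed in this thread (a failure message after one minute without progress in the balloon credit, panic() two minutes later, and giving up early once the balloon thread reports failure) can be sketched as a small user-space C model. This is not the code of the actual patch: all names (`balloon_wait_action`, `simulate_wait`, `struct sample`, `STALL_WARN_S`, `STALL_PANIC_S`) are hypothetical, and treating a reported failure as an immediate give-up is only one reading of the suggestion above. A kernel implementation would poll the real credit and sleep between checks rather than iterate over an array of per-second samples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical thresholds from the proposal in this thread: report after
 * 60 s without progress, give up two minutes after that. These are the
 * values a boot parameter could override. */
#define STALL_WARN_S  60u
#define STALL_PANIC_S (STALL_WARN_S + 120u)

/* Hypothetical outcome of one polling step of the late initcall. */
enum wait_action {
	WAIT_CONTINUE,	/* keep waiting */
	WAIT_WARN,	/* print the failure message, keep waiting */
	WAIT_PANIC,	/* give up; a real patch might call panic() here */
	WAIT_DONE,	/* target reached, continue booting */
};

/* One per-second observation of the (simulated) balloon state. */
struct sample {
	long credit;	/* pages still to balloon out; 0 == finished */
	bool failed;	/* balloon thread reported failure (cf. BP_ECANCELED) */
};

/* Pure decision function: what to do given the current state. */
static enum wait_action balloon_wait_action(long credit, bool failed,
					    unsigned int stalled_s)
{
	if (credit == 0)
		return WAIT_DONE;
	/* Stop waiting early if the thread has given up, instead of
	 * sitting out the whole stall timeout. */
	if (failed || stalled_s >= STALL_PANIC_S)
		return WAIT_PANIC;
	if (stalled_s >= STALL_WARN_S)
		return WAIT_WARN;
	return WAIT_CONTINUE;
}

/* Drive the decision over n samples; the stall timer counts seconds since
 * the credit last changed. Stores the stopping second in *stopped_at. */
static enum wait_action simulate_wait(const struct sample *s, unsigned int n,
				      unsigned int *stopped_at)
{
	unsigned int stalled = 0;
	bool warned = false;

	assert(stopped_at != NULL);
	for (unsigned int t = 0; t < n; t++) {
		if (t > 0 && s[t].credit == s[t - 1].credit)
			stalled++;	/* one more second without progress */
		else
			stalled = 0;	/* progress (or first sample): reset */

		enum wait_action a =
			balloon_wait_action(s[t].credit, s[t].failed, stalled);

		if (a == WAIT_WARN && !warned) {
			printf("balloon: no progress for %u s, %ld pages left\n",
			       stalled, s[t].credit);
			warned = true;	/* report only once */
		}
		if (a == WAIT_DONE || a == WAIT_PANIC) {
			*stopped_at = t;
			return a;
		}
	}
	*stopped_at = n;
	return WAIT_CONTINUE;
}
```

Keeping the decision in a pure function makes the thresholds easy to turn into the boot parameter proposed above, and lets the policy be tested without booting a guest.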