Re: HVM/PVH Balloon crash
On 07.09.2021 17:03, Elliott Mitchell wrote:
> On Tue, Sep 07, 2021 at 10:03:51AM +0200, Jan Beulich wrote:
>> On 06.09.2021 22:47, Elliott Mitchell wrote:
>>> On Mon, Sep 06, 2021 at 09:52:17AM +0200, Jan Beulich wrote:
>>>> On 06.09.2021 00:10, Elliott Mitchell wrote:
>>>>> I brought this up a while back, but it still appears to be present and
>>>>> the latest observations appear rather serious.
>>>>>
>>>>> I'm unsure of the entire set of conditions for reproduction.
>>>>>
>>>>> Domain 0 on this machine is PV (I think the BIOS enables the IOMMU, but
>>>>> this is an older AMD IOMMU).
>>>>>
>>>>> This has been confirmed with Xen 4.11 and Xen 4.14. This includes
>>>>> Debian's patches, but those are mostly backports or environment
>>>>> adjustments.
>>>>>
>>>>> Domain 0 is presently using a 4.19 kernel.
>>>>>
>>>>> The trigger is creating a HVM or PVH domain where memory does not equal
>>>>> maxmem.
>>>>
>>>> I take it you refer to "[PATCH] x86/pod: Do not fragment PoD memory
>>>> allocations" submitted very early this year? There you said the issue
>>>> was with a guest's maxmem exceeding host memory size. Here you seem to
>>>> be talking of PoD in its normal form of use. Personally I use this
>>>> all the time (unless enabling PCI pass-through for a guest, for being
>>>> incompatible). I've not observed any badness as severe as you've
>>>> described.
>>>
>>> I've got very little idea what is occurring as I'm expecting to be doing
>>> ARM debugging, not x86 debugging.
>>>
>>> I was starting to wonder whether this was widespread or not. As such I
>>> was reporting the factors which might be different in my environment.
>>>
>>> The one which sticks out is the computer has an older AMD processor (you
>>> a 100% Intel shop?).
>>
>> No, AMD is as relevant to us as is Intel.
>>
>>> The processor has the AMD NPT feature, but a very
>>> early/limited IOMMU (according to Linux "AMD IOMMUv2 functionality not
>>> available").
>>>
>>> Xen 4.14 refused to load the Domain 0 kernel as PVH (not enough of an
>>> IOMMU).
>>
>> That sounds odd at the first glance - PVH simply requires that there be
>> an (enabled) IOMMU. Hence the only thing I could imagine is that Xen
>> doesn't enable the IOMMU in the first place for some reason.
>
> Doesn't seem that odd to me. I don't know the differences between the
> first and second versions of the AMD IOMMU, but could well be v1 was
> judged not to have enough functionality to bother with.
>
> What this does make me wonder is, how much testing was done on systems
> with functioning NPT, but disabled IOMMU?

No idea. During development it may happen (rarely) that one disables
the IOMMU on purpose. Beyond that - can't tell.

> Could be this system is in an
> intergenerational hole, and some spot in the PVH/HVM code makes an
> assumption of the presence of NPT guarantees presence of an operational
> IOMMU. Otherwise if there was some copy and paste while writing IOMMU
> code, some portion of the IOMMU code might be checking for presence of
> NPT instead of presence of IOMMU.

This is all very speculative; I consider what you suspect not very likely,
but also not entirely impossible. This is not the least because for a
long time we've been running without shared page tables on AMD.

I'm afraid without technical data and without knowing how to repro, I
don't see a way forward here.

Jan