Re: [Xen-devel] Altp2m use with PML can deadlock Xen
On 10/05/2019 15:53, Razvan Cojocaru wrote:
> On 5/10/19 5:42 PM, Tamas K Lengyel wrote:
>> On Thu, May 9, 2019 at 10:19 AM Andrew Cooper
>> <andrew.cooper3@xxxxxxxxxx> wrote:
>>>
>>> On 09/05/2019 14:38, Tamas K Lengyel wrote:
>>>> Hi all,
>>>>
>>>> I'm investigating an issue with altp2m that can easily be reproduced
>>>> and leads to a hypervisor deadlock when PML is available in hardware.
>>>> I haven't been able to trace down where the actual deadlock occurs.
>>>>
>>>> The problem seems to stem from hvm/vmx/vmcs.c:vmx_vcpu_flush_pml_buffer,
>>>> which calls p2m_change_type_one on all gfns that were recorded in the
>>>> PML buffer. The problem occurs when the PML-buffer-full vmexit happens
>>>> while the active p2m is an altp2m. Switching p2m_change_type_one to
>>>> work with the altp2m instead of the hostp2m, however, results in EPT
>>>> misconfiguration crashes.
>>>>
>>>> Adding to the issue is that it seems to occur only when the altp2m has
>>>> remapped GFNs. Since PML records entries based on GFN, this leads me
>>>> to question whether it is safe at all to use PML when altp2m is used
>>>> with GFN remapping. However, AFAICT the GFNs in the PML buffer are not
>>>> the remapped GFNs, and my understanding is that it should be safe as
>>>> long as the GFNs being tracked by PML are never the remapped GFNs.
>>>>
>>>> Booting Xen with ept=pml=0 resolves the issue.
>>>>
>>>> If anyone has any insight into what might be happening, please let
>>>> me know.
>>>
>>> I could have sworn that George spotted a problem here and fixed it. I
>>> shouldn't be surprised if we have more.
>>>
>>> The problem that PML introduced (and this is mostly my fault, as I
>>> suggested the buggy solution) is that the vmexit handler on one vcpu
>>> pauses the others to drain the PML queue into the dirty bitmap.
>>> Overall I wasn't happy with the design and I've got some ideas to
>>> improve it, but within the scope of how altp2m was engineered, I
>>> proposed domain_pause_except_self().
>>>
>>> As it turns out, that is vulnerable to deadlocks when you get two
>>> vcpus trying to pause each other, each waiting for the other to
>>> become de-scheduled.
>>
>> Makes sense.
>>
>>> I see this has been reused by the altp2m code, but it *should* be
>>> safe from deadlocks now that it takes the hypercall_deadlock_mutex.
>>
>> Is that already in staging or your x86-next branch? I would like to
>> verify whether the problem is still present with that change. I tested
>> with the Xen 4.12 release and that definitely still deadlocks.
>
> I don't know if Andrew is talking about this patch (probably not, but
> it looks at least related):
>
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=24d5282527f4647907b3572820b5335c15cd0356;hp=29d28b29190ba09d53ae7e475108def84e16e363

I was referring to 29d28b2919, which, as it turns out, is also in 4.12.

That said, 24d5282527 might in practice be the cause of the deadlock, so
I'd first experiment with taking that fix out. I know for certain that
it won't have been tested with PML enabled, because the use of PML is
incompatible with write-protecting guest pagetables.

~Andrew
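For context, the drain path Tamas refers to looks roughly like the sketch
below. This is a simplified, from-memory rendering of
vmx_vcpu_flush_pml_buffer() in the 4.12-era hvm/vmx/vmcs.c, with the VMCS
enter/exit, assertions, and error handling trimmed, so field and helper
names should be treated as approximate rather than authoritative:

    /*
     * Simplified sketch of vmx_vcpu_flush_pml_buffer() (hvm/vmx/vmcs.c),
     * from memory of the 4.12-era tree.  VMCS enter/exit, assertions and
     * error handling are trimmed; names may not match the tree exactly.
     */
    void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
    {
        uint64_t *pml_buf = __map_domain_page(v->arch.hvm.vmx.pml_pg);
        unsigned long pml_idx;

        __vmread(GUEST_PML_INDEX, &pml_idx);

        /*
         * A full buffer wraps the index; otherwise it points at the next
         * free slot, so step past it to reach the oldest logged entry.
         */
        pml_idx = (pml_idx >= NR_PML_ENTRIES) ? 0 : pml_idx + 1;

        for ( ; pml_idx < NR_PML_ENTRIES; pml_idx++ )
        {
            unsigned long gfn = pml_buf[pml_idx] >> PAGE_SHIFT;

            /*
             * Flip the GFN from p2m_ram_logdirty back to p2m_ram_rw.
             * This operates on the hostp2m: if the guest was running on
             * an altp2m when the write was logged, the altp2m's mapping
             * of this GFN is not touched here.
             */
            p2m_change_type_one(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);

            paging_mark_pfn_dirty(v->domain, _pfn(gfn));
        }

        unmap_domain_page(pml_buf);

        /* Reset the index to "buffer empty". */
        __vmwrite(GUEST_PML_INDEX, NR_PML_ENTRIES - 1);
    }

The key point for this thread is that p2m_change_type_one() acts on the
hostp2m; when the logged write happened while an altp2m was the active
view, nothing in this loop updates that view.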
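The pause-each-other deadlock, and the shape of the fix Andrew mentions
(29d28b2919), can be sketched as follows. Again, this is simplified from
memory of common/domain.c rather than the exact committed code:

    /*
     * Sketch of domain_pause_except_self() with the fix from 29d28b2919,
     * simplified from memory of common/domain.c.  Without the trylock,
     * two vcpus of the same domain entering this concurrently each
     * pause-wait on the other, and neither ever de-schedules.
     */
    int domain_pause_except_self(struct domain *d)
    {
        struct vcpu *v, *curr = current;

        if ( curr->domain == d )
        {
            /* Serialise callers so that vcpus cannot pause each other. */
            if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
                return -ERESTART;

            for_each_vcpu ( d, v )
                if ( likely(v != curr) )
                    vcpu_pause(v);    /* waits for v to be de-scheduled */

            spin_unlock(&d->hypercall_deadlock_mutex);
        }
        else
            domain_pause(d);

        return 0;
    }

With the trylock, a caller that loses the race returns -ERESTART and
retries its hypercall instead of spin-waiting on a vcpu that is itself
inside the pause loop, which is what produced the mutual wait.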