[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/svm: retry after unhandled NPT fault if gfn was marked for recalculation



On 22/05/2020 11:05, Igor Druzhinin wrote:
> On 22/05/2020 10:45, Andrew Cooper wrote:
>> On 21/05/2020 22:43, Igor Druzhinin wrote:
>>> If a recalculation NPT fault hasn't been handled explicitly in
>>> hvm_hap_nested_page_fault() then it's potentially safe to retry -
>>> US bit has been re-instated in PTE and any real fault would be correctly
>>> re-raised next time.
>>>
>>> This covers a specific case of migration with vGPU assigned on AMD:
>>> global log-dirty is enabled and causes immediate recalculation NPT
>>> fault in MMIO area upon access. This type of fault isn't described
>>> explicitly in hvm_hap_nested_page_fault (this isn't called on
>>> EPT misconfig exit on Intel) which results in domain crash.
>>>
>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>>> ---
>>>  xen/arch/x86/hvm/svm/svm.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>>> index 46a1aac..f0d0bd3 100644
>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>> @@ -1726,6 +1726,10 @@ static void svm_do_nested_pgfault(struct vcpu *v,
>>>          /* inject #VMEXIT(NPF) into guest. */
>>>          nestedsvm_vmexit_defer(v, VMEXIT_NPF, pfec, gpa);
>>>          return;
>>> +    case 0:
>>> +        /* If a recalculation page fault hasn't been handled - just retry. 
>>> */
>>> +        if ( pfec & PFEC_user_mode )
>>> +            return;
>> This smells like it is a recipe for livelocks.
>>
>> Everything should have been handled properly by the call to
>> p2m_pt_handle_deferred_changes() which precedes svm_do_nested_pgfault().
>>
>> It is legitimate for the MMIO mapping to end up being transiently
>> recalculated, but the fact that p2m_pt_handle_deferred_changes() doesn't
>> fix it up suggests that the bug is there.
>>
>> Do you have the complete NPT walk to the bad mapping? Do we have
>> _PAGE_USER in the leaf mapping, or is this perhaps a spurious fault?
> It does fix it up. The problem is that currently in SVM we enter
> svm_do_nested_pgfault immediately after p2m_pt_handle_deferred_changes
> is finished finished.

Oh - so we do.  I'd read the entry condition for svm_do_nested_pgfault()
incorrectly.

Jan - why did you chose to do it this way?  If
p2m_pt_handle_deferred_changes() has made a modification, there is
surely nothing relevant to do in svm_do_nested_pgfault().

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.