
Re: [PATCH v2] x86/svm: do not try to handle recalc NPT faults immediately



On 03/06/2020 12:48, Paul Durrant wrote:
>> -----Original Message-----
>> From: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>> Sent: 03 June 2020 12:45
>> To: paul@xxxxxxx; 'Jan Beulich' <jbeulich@xxxxxxxx>
>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; andrew.cooper3@xxxxxxxxxx; wl@xxxxxxx; 
>> roger.pau@xxxxxxxxxx;
>> george.dunlap@xxxxxxxxxx
>> Subject: Re: [PATCH v2] x86/svm: do not try to handle recalc NPT faults 
>> immediately
>>
>> On 03/06/2020 12:28, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@xxxxxxxx>
>>>> Sent: 03 June 2020 12:22
>>>> To: paul@xxxxxxx
>>>> Cc: 'Igor Druzhinin' <igor.druzhinin@xxxxxxxxxx>; 
>>>> xen-devel@xxxxxxxxxxxxxxxxxxxx;
>>>> andrew.cooper3@xxxxxxxxxx; wl@xxxxxxx; roger.pau@xxxxxxxxxx; 
>>>> george.dunlap@xxxxxxxxxx
>>>> Subject: Re: [PATCH v2] x86/svm: do not try to handle recalc NPT faults 
>>>> immediately
>>>>
>>>> On 03.06.2020 12:26, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Jan Beulich <jbeulich@xxxxxxxx>
>>>>>> Sent: 03 June 2020 11:03
>>>>>> To: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
>>>>>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; andrew.cooper3@xxxxxxxxxx; 
>>>>>> wl@xxxxxxx; roger.pau@xxxxxxxxxx;
>>>>>> george.dunlap@xxxxxxxxxx; Paul Durrant <paul@xxxxxxx>
>>>>>> Subject: Re: [PATCH v2] x86/svm: do not try to handle recalc NPT faults 
>>>>>> immediately
>>>>>>
>>>>>> On 02.06.2020 18:56, Igor Druzhinin wrote:
>>>>>>> A recalculation NPT fault doesn't always require additional handling
>>>>>>> in hvm_hap_nested_page_fault(); moreover, in the general case, if there
>>>>>>> is no explicit handling done there, the fault is wrongly considered fatal.
>>>>>>>
>>>>>>> This covers a specific case of migration with a vGPU assigned on AMD:
>>>>>>> at the moment log-dirty is enabled globally, recalculation is requested
>>>>>>> for the whole of guest memory, including the directly mapped MMIO
>>>>>>> regions of the vGPU, which causes a page fault to be raised at the
>>>>>>> first access to those; but since the MMIO P2M type has no explicit
>>>>>>> handling in hvm_hap_nested_page_fault(), the domain is erroneously
>>>>>>> crashed with an unhandled SVM violation.
>>>>>>>
>>>>>>> Instead of trying to be opportunistic, use a safer approach and handle
>>>>>>> P2M recalculation in a separate NPT fault by attempting to retry after
>>>>>>> making the necessary adjustments. This is aligned with Intel behavior,
>>>>>>> where there are separate VMEXITs for recalculation and EPT violations
>>>>>>> (faults) and only faults are handled in hvm_hap_nested_page_fault().
>>>>>>> Do it by also unifying the do_recalc return code with the Intel
>>>>>>> implementation, where returning 1 means the P2M was actually changed.
>>>>>>>
>>>>>>> Since there was previously no case where p2m_pt_handle_deferred_changes()
>>>>>>> could return a positive value, it's safe to replace ">= 0" with "== 0"
>>>>>>> in the VMEXIT_NPF handler. finish_type_change() is also unaffected by
>>>>>>> the change, as it is already able to deal with a >0 return value of
>>>>>>> p2m->recalc from the EPT implementation.
>>>>>>>
>>>>>>> Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>>>>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>
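
(For context, a paraphrased sketch of what the change amounts to at the
VMEXIT_NPF call site - not the verbatim hunk; the surrounding code and
field names are reconstructed from memory and may differ from the tree:)

    case VMEXIT_NPF:
        /* EXITINFO2 holds the guest physical address that faulted. */
        rc = p2m_pt_handle_deferred_changes(vmcb->exitinfo2);
        if ( rc == 0 )
            /* No recalc adjustments were made - handle this fault. */
            svm_do_nested_pgfault(v, regs, vmcb->exitinfo1, vmcb->exitinfo2);
        else if ( rc < 0 )
        {
            printk(XENLOG_G_ERR
                   "%pv: error %d handling VMEXIT_NPF (gpa=%08lx ec=%04lx)\n",
                   v, rc, vmcb->exitinfo2, vmcb->exitinfo1);
            domain_crash(v->domain);
        }
        /* rc > 0: a recalc was done - just retry the guest access. */
        break;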
>>>>>>
>>>>>> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>>>> albeit preferably with ...
>>>>>>
>>>>>>> @@ -448,12 +451,14 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>>>>>>>              clear_recalc(l1, e);
>>>>>>>          err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>>>>>>>          ASSERT(!err);
>>>>>>> +
>>>>>>> +        recalc_done = true;
>>>>>>>      }
>>>>>>>
>>>>>>>   out:
>>>>>>>      unmap_domain_page(table);
>>>>>>>
>>>>>>> -    return err;
>>>>>>> +    return err ?: (recalc_done ? 1 : 0);
>>>>>>
>>>>>> ... this shrunk to
>>>>>>
>>>>>>     return err ?: recalc_done;
>>>>>>
>>>>>> (easily doable while committing).
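
(Side note on the shorthand: "a ?: b" is the GNU C conditional with the
middle operand omitted - it yields a when a is non-zero, else b, and
evaluates a only once. A tiny standalone illustration of the semantics,
not Xen code:)

    #include <assert.h>
    #include <stdbool.h>

    /* Mirrors the suggested "return err ?: recalc_done;":
     * a negative error code wins; otherwise the bool promotes to
     * 0 ("nothing changed") or 1 ("P2M was actually changed"). */
    static int ret(int err, bool recalc_done)
    {
        return err ?: recalc_done;
    }

    int main(void)
    {
        assert(ret(-22, true) == -22);
        assert(ret(0, true) == 1);
        assert(ret(0, false) == 0);
        return 0;
    }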
>>>>>>
>>>>>> Also Cc Paul.
>>>>>>
>>>>>
>>>>> paging_log_dirty_enable() still fails global enable if has_arch_pdevs()
>>>>> is true, so presumably there's no desperate need for this to go in 4.14?
>>>>
>>>> The MMIO case is just the particular situation here. As I understand it,
>>>> the same issue could potentially surface with other p2m types. Also, given
>>>> that I'd consider this a backporting candidate, while it may not be
>>>> desperately needed for the release, I think it deserves consideration
>>>> beyond the specific aspect you mention.
>>>>
>>>
>>> In which case I think the commit message probably ought to be rephrased,
>>> since it appears to focus on a case that cannot currently happen.
>>
>> This can happen without has_arch_pdevs() being true. It's enough to
>> directly map some physical memory into a guest to get the p2m_mmio_direct
>> type present in the page tables.
> 
> OK, that's fine, but when will that happen without pass-through? If we can 
> have a commit message justifying the change based on what can actually happen 
> now, then I would not be opposed to it going in 4.14.

Yes, it can happen now: we had regular (non-SR-IOV) vGPU migration totally
broken because of this on AMD; it had simply never been tested before. You
don't need special patches on top of Xen or anything.

To get p2m_mmio_direct you just need to call XEN_DOMCTL_memory_mapping on a 
domain.
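
(Illustrative only - one way a toolstack creates such mappings, via the
libxc wrapper around XEN_DOMCTL_memory_mapping; the GFN/MFN/count values
below are made-up placeholders and error handling is minimal:)

    #include <xenctrl.h>

    /* Map 16 machine pages starting at MFN 0xf0000 into the guest at the
     * same GFN, which populates p2m_mmio_direct entries in the guest's
     * P2M - no pass-through device (and no has_arch_pdevs()) involved. */
    static int map_mmio_into_guest(uint32_t domid)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        int rc;

        if ( !xch )
            return -1;

        rc = xc_domain_memory_mapping(xch, domid,
                                      0xf0000 /* first gfn */,
                                      0xf0000 /* first mfn */,
                                      16 /* nr mfns */,
                                      DPCI_ADD_MAPPING);

        xc_interface_close(xch);
        return rc;
    }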


Igor



 

