
Re: [Xen-devel] XenGT is still regressed on master



On 08/03/2019 11:55, Jan Beulich wrote:
> If so, my first reaction is to blame the kernel module:
> Machine state (of the VM) may not change while processing
> a write, other than to carry out the _direct_ effects of the write. I
> don't think a p2m type change is supposed to be occurring as a side
> effect.

I disagree. Anything may change during processing of an IOREQ, and it's the
hypervisor that shouldn't make any assumptions about the machine state
before and after the request was sent.

>> The bug could be mitigated by the following patch but since it's you who
>> introduced this helper you might have better ideas how to avoid the
>> problem in a clean way here.
>>
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -1139,13 +1139,11 @@ static int linear_write(unsigned long addr, unsigned int bytes, void *p_data,
>>      {
>>          unsigned int offset, part1;
>>
>> -    case HVMTRANS_okay:
>> -        return X86EMUL_OKAY;
>> -
>>      case HVMTRANS_bad_linear_to_gfn:
>>          x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
>>          return X86EMUL_EXCEPTION;
>>
>> +    case HVMTRANS_okay:
>>      case HVMTRANS_bad_gfn_to_mfn:
>>          offset = addr & ~PAGE_MASK;
>>          if ( offset + bytes <= PAGE_SIZE )
> 
> This is (I'm inclined to say "of course") not an appropriate change in
> the general case: Getting back HVMTRANS_okay means the write
> was carried out, and hence carrying it out shouldn't be attempted a
> second time.
> 
> I take it that changing the kernel driver would at best be sub-optimal
> though, so a hypervisor-only fix would be better.

Yes, changing the kernel module is very undesirable for many reasons.

> Assuming the p2m type change arrives via XEN_DMOP_set_mem_type,
> I think what we need to do is delay the actual change until no ioreq
> is pending anymore, kind of like the VM event subsystem delays
> certain CR and MSR writes until VM entry time. In this situation we'd
> then further have to restrict the number of such pending changes,
> because we need to record the request(s) somewhere. Question is
> how many of such bufferable requests would be enough: Unless we
> wanted to enforce that only ordinary aligned writes (and rmw-s) can
> get us into this state, apart from writes crossing page boundaries we
> may need to be able to cope with AVX512's scatter insns.
> 
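For concreteness, a rough sketch of what such a deferral scheme might look
like (every name below, e.g. struct deferred_mem_type,
domain_has_ioreq_in_flight() and apply_mem_type_change(), is made up for
illustration and is not an existing Xen interface):

/*
 * Hypothetical per-domain buffer of XEN_DMOP_set_mem_type requests that
 * arrive while an ioreq is still in flight; they get applied once the
 * ioreq has been consumed, mirroring how VM events defer CR/MSR writes
 * to VM entry.
 */
#define MAX_DEFERRED_MEM_TYPE 8    /* page-crossing writes, scatter insns? */

struct deferred_mem_type {
    xen_pfn_t first_pfn;
    uint32_t nr;
    uint16_t mem_type;             /* HVMMEM_* value from the DMOP */
};

struct deferred_mem_type_buf {
    unsigned int nr_entries;
    struct deferred_mem_type entries[MAX_DEFERRED_MEM_TYPE];
};

/* In the XEN_DMOP_set_mem_type handler: apply now, or buffer if unsafe. */
static int set_mem_type_or_defer(struct domain *d,
                                 const struct deferred_mem_type *req)
{
    struct deferred_mem_type_buf *buf = d->deferred_mem_type; /* hypothetical */

    if ( !domain_has_ioreq_in_flight(d) )     /* hypothetical predicate */
        return apply_mem_type_change(d, req); /* today's behaviour */

    if ( buf->nr_entries >= MAX_DEFERRED_MEM_TYPE )
        return -EBUSY;                        /* caller has to retry later */

    buf->entries[buf->nr_entries++] = *req;
    return 0;
}

/* Run once the vcpu has consumed its ioreq response, before emulation
 * resumes. */
static void flush_deferred_mem_type(struct domain *d)
{
    struct deferred_mem_type_buf *buf = d->deferred_mem_type;
    unsigned int i;

    for ( i = 0; i < buf->nr_entries; i++ )
        apply_mem_type_change(d, &buf->entries[i]);

    buf->nr_entries = 0;
}
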
> The other alternative I can think of would be to record the specific
> failure case (finding p2m_ioreq_server) in the first pass, and act upon
> it (instead of your mitigation above) during the retry. At the first
> glance it would seem more cumbersome to do the recording at that
> layer though.

I like the latter suggestion more; it seems less invasive and less prone to
regressions, so I'd like to try to implement it. Although I think the
hypervisor check should be more general: if an IOREQ is in progress, don't
try to go through the fast path but re-enter the IOREQ completion path
instead.

What if we just check !hvm_ioreq_needs_completion() before returning
X86EMUL_OKAY, i.e. fall through to the bad_gfn_to_mfn case if that check
fails, as Paul suggested?
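
For illustration, roughly what that would look like in linear_write(),
building on the reordering from the mitigation above; the exact signature
of hvm_ioreq_needs_completion() and the path to the vcpu's in-flight ioreq
(current->arch.hvm.hvm_io.io_req below) are assumptions that may need
adjusting:

    case HVMTRANS_okay:
        /*
         * The write succeeded via the fast path, but if an ioreq for this
         * access is still awaiting completion the p2m type must have been
         * changed behind our back (the device model dropped
         * p2m_ioreq_server while handling the write).  Don't claim success
         * here; fall through to the MMIO path so the pending ioreq's state
         * machine completes normally.
         */
        if ( !hvm_ioreq_needs_completion(&current->arch.hvm.hvm_io.io_req) )
            return X86EMUL_OKAY;
        /* fall through */

    case HVMTRANS_bad_gfn_to_mfn:
        offset = addr & ~PAGE_MASK;
        ...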

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

