
Re: [PATCH] x86/HVM: don't mark evtchn upcall vector as pending when vLAPIC is disabled


  • To: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 21 Nov 2022 13:34:38 +0100
  • Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 21 Nov 2022 12:34:51 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 21.11.2022 13:23, Andrew Cooper wrote:
> On 21/11/2022 08:56, Jan Beulich wrote:
>> On 18.11.2022 15:27, Andrew Cooper wrote:
>>> On 18/11/2022 12:54, Jan Beulich wrote:
>>>> On 18.11.2022 13:33, Andrew Cooper wrote:
>>>>> On 18/11/2022 10:31, Jan Beulich wrote:
>>>>>> Linux's relatively new use of HVMOP_set_evtchn_upcall_vector has
>>>>>> exposed a problem with the marking of the respective vector as
>>>>>> pending: For quite some time Linux has been checking whether any stale
>>>>>> ISR or IRR bits would still be set while preparing the LAPIC for use.
>>>>>> This check is now triggering on the upcall vector, as the registration,
>>>>>> at least for APs, happens before the LAPIC is actually enabled.
>>>>>>
>>>>>> In the software-disabled state an LAPIC would not accept any interrupt
>>>>>> requests and hence no IRR bit would newly become set while in this
>>>>>> state. As a result it is also wrong for us to mark the upcall vector as
>>>>>> having a pending request when the vLAPIC is in this state.
>>>>> I agree with this.
>>>>>
>>>>>> To compensate for the "enabled" check added to the assertion logic, add
>>>>>> logic to (conditionally) mark the upcall vector as having a request
>>>>>> pending at the time the LAPIC is being software-enabled by the guest.
>>>>> But this, I don't think is appropriate.
>>>>>
>>>>> The point of raising on enable is allegedly to work around setup race
>>>>> conditions.  I'm unconvinced by this reasoning, but it is what it is,
>>>>> and the stated behaviour is to raise there and then.
>>>>>
>>>>> If a guest enables evtchn while the LAPIC is disabled, then the
>>>>> interrupt is lost.  Like every other interrupt in an x86 system.
>>>> Edge triggered ones you mean, I suppose, but yes.
>>> For IO-APIC systems, you mostly lose line interrupts too, don't you?
>>>
>>> The line will remain pending at the IO-APIC, but nothing in the system
>>> will unwedge until someone polls the IO-APIC.
>>>
>>> Either way...
>>>
>>>>> I don't think there is any credible way a guest kernel author can expect
>>>>> the weird evtchn edge case to wait for an arbitrary point in the future,
>>>>> and it's a corner case that I think is worth not keeping.
>>>> Well - did you look at 7b5b8ca7dffd ("x86/upcall: inject a spurious event
>>>> after setting upcall vector"), referenced by the Fixes: tag? The issue is
>>>> that once evtchn_upcall_pending is set, there would never again be a
>>>> notification.
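
(To spell the latch out for the archives - a rough sketch from memory,
not the exact source:

    void vcpu_mark_events_pending(struct vcpu *v)
    {
        /* Only the 0 -> 1 transition produces a notification. */
        if ( test_and_set_bit(0, &vcpu_info(v, evtchn_upcall_pending)) )
            return;

        hvm_assert_evtchn_irq(v); /* assert the upcall vector / callback IRQ */
    }

So if that one injection attempt is swallowed by a disabled vLAPIC, the
bit stays set and nothing will ever re-raise the vector.)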
>>> Ok, so we do need to do something.
>>>
>>>>  So if what you say is to be the model we follow, then that
>>>> earlier change was perhaps wrong as well. Instead it should then have
>>>> been a guest change (as also implicit from your reply) to clear
>>>> evtchn_upcall_pending after vCPU info registration (there) or LAPIC
>>>> enabling (here), perhaps by way of "manually" invoking the handling of
>>>> that pending event, or by issuing a self-IPI with that vector.
>>>> Especially the LAPIC enabling case would then be yet another Xen-specific
>>>> step on a guest code path which ideally wouldn't have to be aware of Xen. 
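
Such a guest side change might look roughly like this (hypothetical
Linux-ish sketch; this_cpu_vcpu_info() stands in for whatever accessor
the kernel actually has):

    /* Run on each CPU right after software-enabling its LAPIC. */
    static void xen_replay_pending_upcall(void)
    {
        struct vcpu_info *vi = this_cpu_vcpu_info(); /* hypothetical */

        /* An upcall latched while the LAPIC couldn't accept interrupts
         * would otherwise be lost for good - re-deliver it locally. */
        if (READ_ONCE(vi->evtchn_upcall_pending))
            apic->send_IPI_self(HYPERVISOR_CALLBACK_VECTOR);
    }

which illustrates the extra Xen awareness such a path would need.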
>>> Without trying to prescribe how to fix this specific issue, wherever
>>> possible we should be trying to limit the Xen-isms from non-Xen areas. 
>>> There's a whole lot of poorly described and surprising behaviours which
>>> have not stood the test of time.
>>>
>>> In this case, it seems that we have yet another x86 PV-ism which hasn't
>>> translated well to x86 HVM.  Specifically, we're trying to overlay an
>>> entirely shared-memory (and delayed return-to-guest) interrupt
>>> controller onto one which is properly constructed to handle events in
>>> realtime.
>>>
>>>
>>> I even got as far as writing that maybe leaving it as-is was the best
>>> option (principle of least surprise for Xen developers), but our
>>> "friend" apic acceleration strikes again.
>>>
>>> Xen doesn't always get a VMExit when the guest clears SW_DISABLE,
>>> because microcode may have accelerated it.
>> But as per "APIC-Write Emulation" in the SDM we'd still get an APIC-write
>> VM exit.
> 
> Intel isn't the only accelerated implementation, and there are future
> details not in the public docs.
> 
> There will be an implementation we will want to support where Xen
> doesn't get a vmexit for a write to SPIV.

I see.

>> If we didn't, how would our internal accounting of APIC enabled
>> state (VLAPIC_SW_DISABLED) work?
> 
> It doesn't.
> 
> One of many problems on the "known errors" list from an incomplete
> original attempt to get acceleration working.
> 
> There's no good reason to cache those disables in the first place (both
> are trivially available from other positions in memory), and
> correctness reasons not to.
> 
>>  And the pt_may_unmask_irq() call neighboring the spot where I'm adding
>> the new code then also wouldn't occur.
>>
>> I'm actually pretty sure we do too much in this case - in particular none
>> of the vlapic_set_reg() calls should be necessary. But we certainly can't get
>> away with doing nothing, and hence we depend on that VM exit to actually
>> occur.
> 
> We must do exactly and only what real hardware does, so that the state
> changes emulated by Xen are identical to those accelerated by microcode.
> 
> Other than that, I really wouldn't make any presumptions about the
> existing vLAPIC logic being correct.
> 
> It is, at best, enough of an approximation to the spec for major OSes to
> function, with multiple known bugs and no coherent testing.

But can we then treat resolving the wider issue as separate, and leave
the change here as it presently is? Yes, mimicking the same behavior
later may be "interesting", but if we can't achieve consistent behavior
with yet more advanced acceleration, maybe we simply can't use that
acceleration (perhaps until a complete overhaul of everything involved
in LAPIC handling, possibly including a guest-side indicator that the
guest is happy without the extra signaling, at which point that
yet-more-advanced acceleration could then be enabled for that guest).

Otherwise - do you have any suggestion as to alternative logic which I
might use in this patch?
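
For reference, what the patch as posted boils down to is roughly the
following (simplified, names from memory rather than copied out of the
tree):

    /* hvm_assert_evtchn_irq(): don't latch the vector into IRR while
     * the vLAPIC is disabled. */
    if ( vlapic_enabled(vcpu_vlapic(v)) )
        vlapic_set_irq(vcpu_vlapic(v),
                       v->arch.hvm.evtchn_upcall_vector, 0);

    /* SPIV write handling: compensate on the disabled -> enabled
     * transition, replaying what the check above may have suppressed. */
    if ( v->arch.hvm.evtchn_upcall_vector &&
         vcpu_info(v, evtchn_upcall_pending) )
        vlapic_set_irq(vlapic, v->arch.hvm.evtchn_upcall_vector, 0);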

Jan



 

