[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/HVM: don't mark evtchn upcall vector as pending when vLAPIC is disabled


  • To: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 21 Nov 2022 09:56:27 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=6SUCM6cVplr9ffzTzczI0ukj1ZqakVMnRlVhKFc90rY=; b=iqi4IyuRrW/PhOu+asOZD6NYv2jH6RF/3ohrrX9u7ifEXqYWDg2me1exyDDgsJT+WvGHicplP8+bRXE23njIHfzSeTN2MeaqBDDNvoyaGItatPKCmc83kTGkB5zakXiPm2oAhxVlCpffeyK2A0XjGL8nxZwb2Owx4ZPwAECjU3udiS4phBwt+wrXnWP0DSW+EL7rGVzGEvf7eIsGDdsMLDVlYi+bHFHbNur/wAbMpfZMZlYgpy5d2Rrno0gAPhPzsGVVJng4Wusf3j+NwBOkPAyYwjvuybybWEmkuaC+7jfY14kwv4dJAQR928KxG9rj4XQweyXKzCthocvY9MZudA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=dAYtz5GmFPeJX8NfmmqNjVbI1bSAZiQqPpvwnx4E2C6VXtMALcjce2M0lhg1U3kqS9hqufXe/4jLXOFl/zOZyBhRNvbiuQSJc+chArgGUdgIc7pOGJ9VVa1X3aOnyEK7rD20RePyfTj9icJVnLVuQRuUccf/FJQYywWZgIQq6OTiKg6wsuyALHCEUxe4NJmFYeHjtt73ekD7d086CdQMMnCtYjWFUbftaxP6gQ0n5pk7o/83PTIxvuVEj/rwPQUAxzkxMV79r4lUDTrNKKTRRE+qSoI8S0byaXMvtuG/uRfUSTSjBE2XB6VRxzqA8KeTCnDtHep4/MXY+EFUOzgExw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 21 Nov 2022 08:56:38 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 18.11.2022 15:27, Andrew Cooper wrote:
> On 18/11/2022 12:54, Jan Beulich wrote:
>> On 18.11.2022 13:33, Andrew Cooper wrote:
>>> On 18/11/2022 10:31, Jan Beulich wrote:
>>>> Linux'es relatively new use of HVMOP_set_evtchn_upcall_vector has
>>>> exposed a problem with the marking of the respective vector as
>>>> pending: For quite some time Linux has been checking whether any stale
>>>> ISR or IRR bits would still be set while preparing the LAPIC for use.
>>>> This check is now triggering on the upcall vector, as the registration,
>>>> at least for APs, happens before the LAPIC is actually enabled.
>>>>
>>>> In software-disabled state an LAPIC would not accept any interrupt
>>>> requests and hence no IRR bit would newly become set while in this
>>>> state. As a result it is also wrong for us to mark the upcall vector as
>>>> having a pending request when the vLAPIC is in this state.
>>> I agree with this.
>>>
>>>> To compensate for the "enabled" check added to the assertion logic, add
>>>> logic to (conditionally) mark the upcall vector as having a request
>>>> pending at the time the LAPIC is being software-enabled by the guest.
>>> But this, I don't think is appropriate.
>>>
>>> The point of raising on enable is allegedly to work around setup race
>>> conditions.  I'm unconvinced by this reasoning, but it is what it is,
>>> and the stated behaviour is to raise there and then.
>>>
>>> If a guest enables evtchn while the LAPIC is disabled, then the
>>> interrupt is lost.  Like every other interrupt in an x86 system.
>> Edge triggered ones you mean, I suppose, but yes.
> 
> For IO-APIC systems, you mostly lose line interrupts too, don't you?
> 
> The line will remain pending at the IO-APIC, but nothing in the system
> will unwedge until someone polls the IO-APIC.
> 
> Either way...
> 
>>
>>> I don't think there is any credible way a guest kernel author can expect
>>> the weird evtchn edgecase to wait for an arbitrary point in the future,
>>> and it's a corner case that I think is worth not keeping.
>> Well - did you look at 7b5b8ca7dffd ("x86/upcall: inject a spurious event
>> after setting upcall vector"), referenced by the Fixes: tag? The issue is
>> that with evtchn_upcall_pending once set, there would never again be a
>> notification.
> 
> Ok, so we do need to do something.
> 
>>  So if what you say is to be the model we follow, then that
>> earlier change was perhaps wrong as well. Instead it should then have
>> been a guest change (as also implicit from your reply) to clear
>> evtchn_upcall_pending after vCPU info registration (there) or LAPIC
>> enabling (here), perhaps by way of "manually" invoking the handling of
>> that pending event, or by issuing a self-IPI with that vector.
>> Especially the LAPIC enabling case would then be yet another Xen-specific
>> on a guest code path which better wouldn't have to be aware of Xen. 
> 
> Without trying to prescribe how to fix this specific issue, wherever
> possible we should be trying to limit the Xen-isms from non-Xen areas. 
> There's a whole lot of poorly described and surprising behaviours which
> have not stood the test of time.
> 
> In this case, it seems that we have yet another x86 PV-ism which hasn't
> translated well x86 HVM.  Specifically, we're trying to overlay an
> entirely shared-memory (and delayed return-to-guest) interrupt
> controller onto one which is properly constructed to handle events in
> realtime.
> 
> 
> I even got as far as writing that maybe leaving it as-is was the best
> option (principle of least surprise for Xen developers), but our
> "friend" apic acceleration strikes again.
> 
> Xen doesn't always get a VMExit when the guest clears SW_DISABLE,
> because microcode may have accelerated it.

But as per "APIC-Write Emulation" in the SDM we'd still get an APIC-write
VM exit. If we didn't, how would our internal accounting of APIC enabled
state (VLAPIC_SW_DISABLED) work? And the neighboring (to where I'm adding
the new code) pt_may_unmask_irq() call then also wouldn't occur.

I'm actually pretty sure we do too much in this case - in particular none
of the vlapic_set_reg() should be necessary. But we certainly can't get
away with doing nothing, and hence we depend on that VM exit to actually
occur. Plus simply making the vlapic_set_reg() conditional also likely
wouldn't do any good, so if anything we may want to split
vlapic_reg_write() and invoke only the "2nd half" from
vlapic_apicv_write().

Jan

> A consequence of this observation is that Xen cannot have
> non-LAPIC-archtiectural behaviour in the vlapic emulation.  So I think
> we need to find a solution to this problem that doesn't hook APIC_SPIV.
> 
> ~Andrew




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.