[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/HVM: don't mark evtchn upcall vector as pending when vLAPIC is disabled


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Fri, 18 Nov 2022 14:27:09 +0000
  • Accept-language: en-GB, en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=+qPcSZPRcCEGhfaQTMUwLxLzGhIEdK+SWBMxKB4hHc0=; b=Q1Q2He3+Rk9jKUhfHvOzrN8iE6EkhS5YYoD1pZ3/ZY4FYctJSB4SqYnUdSGKrBTNQBPABuP7aVyuq1zYyNdVeFeGbdDxxgPnm8Av1TraMm4UL98NIMimEue0F9U26SKDUZSl5EBg/12OxaqKzgLVQ1vtFxTbTWbH8DoOqRQIbEk1UlGLiaNUKISFCmjJpaHpMsUOwRAlWupiTf50rfentnhrihxdyV6m8FttX8o7Q01cu1r1jb7/WVtrBXt5gsplzv7K9ko/VqeXu3XQjFBtandxfZTgmzqqOlXCpWJE9Ua5/XU6fvCzrBSlenVvOGxNZwQW2cQLLj7rBgKKbDUC5Q==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Ek7EuKT8Pv5zJsEzXaFCELMLc0bjXbP+4knfDCODgongd6rNi/ADuHJLEQSzL5asNwWGf1V0AqmPhhNBA783ei3bqTDzGD+K7XxEN90UAeKrFyQSb5Yf+4q2EHbiFoyOhbzM0MI6tjphW7qLKNWy/wskggLJz7MPDUzfVkOI+MJp6beZnzaT1ixKJRSkNS/u/dTJoEkY/K2uXpl77hwtZavhH14QJrbwZnhQZuEBFuOXVHAIIhZeT/rFnYiCZVJ/sJaDObcltIBf0ysFxY6deCTL8rsBofRoS1VsVlXDMM4ObF4dvSAXC+FDtQO2NEjTxu/jk59ID3E8Mek1kROYSw==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Wei Liu <wl@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 18 Nov 2022 14:27:21 +0000
  • Ironport-data: A9a23:a8C5zqP5XzYWOu3vrR2VlsFynXyQoLVcMsEvi/4bfWQNrUoh32cHn zccDTrTafzfZWajct9waouxoxsHuJPdndBnTgto+SlhQUwRpJueD7x1DKtS0wC6dZSfER09v 63yTvGacajYm1eF/k/F3oDJ9CU6jufQA+KmU4YoAwgpLSd8UiAtlBl/rOAwh49skLCRDhiE/ Nj/uKUzAnf8s9JPGj9SuvzrRC9H5qyo4mpB5wxmP5ingXeF/5UrJMNHTU2OByOQrrl8RoaSW +vFxbelyWLVlz9F5gSNy+uTnuUiG9Y+DCDW4pZkc/HKbitq/0Te5p0TJvsEAXq7vh3S9zxHJ HehgrTrIeshFvWkdO3wyHC0GQkmVUFN0OevzXRSLaV/ZqAJGpfh66wGMa04AWEX0uBtMDthq uYBEwI2VBuSu7+H5pK0Y/Y506zPLOGzVG8ekldJ6GiBSNoDH9XESaiM4sJE1jAtgMwIBezZe 8cSdTtoalLHfgFLPVAUTpk5mY9EhFGmK2Ee9A3T+PpxujCIpOBy+OGF3N79U9qGX8hK2G2fo XrL5T/RCRAGLt2PjzGC9xpAg8efwH6nANhPSdVU8NZgqmyzmWlOOCRJbgah+deemHbkWI5Qf hl8Fi0G6PJaGFaQZtv3UgC8oXWElgUBQNcWGOo/gCmSzoLE7gDfAXILJhZZYcEinN87Q3otz FDht8PkA3ljvaOYTVqZ96yItnWiNC4NN2gAaCQYCwwf7LHeTJobixvOSpNvFfCzh9isQTXom WnS9245mqkZitMN2+Oj51fbjjmwp5/PCAko+gHQWWHj5QR8DGK4W7GVBZHgxa4oBO6kopOp5 hDoR+D2ADgyMKyw
  • Ironport-hdrordr: A9a23:+BtNl6kk4E+OQWb4hgjaXFp9W9rpDfNGiWdD5ihNYBxZY6Wkfp +V8cjzhCWftN9OYhodcLC7V5Voj0mskKKdxbNhRYtKPTOWwVdASbsP0WKM+V3d8kHFh41gPO JbAtND4b7LfCRHZKTBkW6F+r8bqbHokZxAx92uqUuFJTsaFp2IhD0JbjpzfHcGJjWvUvECZe ChD4d81k2dUEVSSv7+KmgOXuDFqdGOvJX6YSQeDxpixBiSgSiu4LvaFQHd+hsFSTtAzZor7G CAymXCl+6emsD+7iWZ+37Y7pxQltek4txfBPaUgsxQBiTwhh2ubIFBXaTHmDwuuumg5Hsjjd GJiRY9OMZY7W/XYwiO0FPQ8jil9Axrx27pyFeej3emicvlRAgiA84EoY5CaBPW52cpodk5ic twriuknqsSKSmFsDX25tDOWR0vvk2ooUA6mepWq3BES4MRZJJYsIRa1kJIF5UrGj789ekcYZ 9TJfCZwMwTXUKRbnjfsGUq6NuwXk4rFhPDeUQGstz96UkloFlJi28jgOAPlHYJ85wwD7Ne4f 7fD6hunLZSCucLcKNUHo46MISKI12IZSiJHHOZIFzhGq1CEWnKsYTL7LI84/zvUIAUzaE1hI /KXDpjxCIPknrVeIyzNaBwg1DwqD3XZ0Wv9ige3ek1hlTEfsukDcXZI2pe0fdJoJ0kc77msr iISddr6sTYXBrT8LZyrnLDsqZpWAcjue0uy6MGsgG107b2A7yvkNDnW9DuA5eoOQoYewrEcw s+tX7IVY990nw=
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHY+zjv47cM7GZ23kWVhiI/91aGOa5EnVaAgAAGB4CAABnfgA==
  • Thread-topic: [PATCH] x86/HVM: don't mark evtchn upcall vector as pending when vLAPIC is disabled

On 18/11/2022 12:54, Jan Beulich wrote:
> On 18.11.2022 13:33, Andrew Cooper wrote:
>> On 18/11/2022 10:31, Jan Beulich wrote:
>>> Linux'es relatively new use of HVMOP_set_evtchn_upcall_vector has
>>> exposed a problem with the marking of the respective vector as
>>> pending: For quite some time Linux has been checking whether any stale
>>> ISR or IRR bits would still be set while preparing the LAPIC for use.
>>> This check is now triggering on the upcall vector, as the registration,
>>> at least for APs, happens before the LAPIC is actually enabled.
>>>
>>> In software-disabled state an LAPIC would not accept any interrupt
>>> requests and hence no IRR bit would newly become set while in this
>>> state. As a result it is also wrong for us to mark the upcall vector as
>>> having a pending request when the vLAPIC is in this state.
>> I agree with this.
>>
>>> To compensate for the "enabled" check added to the assertion logic, add
>>> logic to (conditionally) mark the upcall vector as having a request
>>> pending at the time the LAPIC is being software-enabled by the guest.
>> But this, I don't think is appropriate.
>>
>> The point of raising on enable is allegedly to work around setup race
>> conditions.  I'm unconvinced by this reasoning, but it is what it is,
>> and the stated behaviour is to raise there and then.
>>
>> If a guest enables evtchn while the LAPIC is disabled, then the
>> interrupt is lost.  Like every other interrupt in an x86 system.
> Edge triggered ones you mean, I suppose, but yes.

For IO-APIC systems, you mostly lose line interrupts too, don't you?

The line will remain pending at the IO-APIC, but nothing in the system
will unwedge until someone polls the IO-APIC.

Either way...

>
>> I don't think there is any credible way a guest kernel author can expect
>> the weird evtchn edgecase to wait for an arbitrary point in the future,
>> and it's a corner case that I think is worth not keeping.
> Well - did you look at 7b5b8ca7dffd ("x86/upcall: inject a spurious event
> after setting upcall vector"), referenced by the Fixes: tag? The issue is
> that with evtchn_upcall_pending once set, there would never again be a
> notification.

Ok, so we do need to do something.

>  So if what you say is to be the model we follow, then that
> earlier change was perhaps wrong as well. Instead it should then have
> been a guest change (as also implicit from your reply) to clear
> evtchn_upcall_pending after vCPU info registration (there) or LAPIC
> enabling (here), perhaps by way of "manually" invoking the handling of
> that pending event, or by issuing a self-IPI with that vector.
> Especially the LAPIC enabling case would then be yet another Xen-specific
> on a guest code path which better wouldn't have to be aware of Xen. 

Without trying to prescribe how to fix this specific issue, wherever
possible we should be trying to limit the Xen-isms from non-Xen areas. 
There's a whole lot of poorly described and surprising behaviours which
have not stood the test of time.

In this case, it seems that we have yet another x86 PV-ism which hasn't
translated well x86 HVM.  Specifically, we're trying to overlay an
entirely shared-memory (and delayed return-to-guest) interrupt
controller onto one which is properly constructed to handle events in
realtime.


I even got as far as writing that maybe leaving it as-is was the best
option (principle of least surprise for Xen developers), but our
"friend" apic acceleration strikes again.

Xen doesn't always get a VMExit when the guest clears SW_DISABLE,
because microcode may have accelerated it.

A consequence of this observation is that Xen cannot have
non-LAPIC-archtiectural behaviour in the vlapic emulation.  So I think
we need to find a solution to this problem that doesn't hook APIC_SPIV.

~Andrew

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.