Re: [PATCH v2 4/8] vpci: add a wait operation to the vpci vcpu pending actions
On 01.04.2026 16:07, Mykyta Poturai wrote:
> On 4/1/26 11:21, Jan Beulich wrote:
>> On 01.04.2026 09:59, Mykyta Poturai wrote:
>>> On 3/31/26 17:55, Jan Beulich wrote:
>>>> On 09.03.2026 12:08, Mykyta Poturai wrote:
>>>>> --- a/xen/drivers/vpci/header.c
>>>>> +++ b/xen/drivers/vpci/header.c
>>>>> @@ -175,76 +175,92 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>>>>>
>>>>> bool vpci_process_pending(struct vcpu *v)
>>>>> {
>>>>> - const struct pci_dev *pdev = v->vpci.pdev;
>>>>> - struct vpci_header *header = NULL;
>>>>> - unsigned int i;
>>>>> -
>>>>> - if ( !pdev )
>>>>> - return false;
>>>>> -
>>>>> - read_lock(&v->domain->pci_lock);
>>>>> -
>>>>> - if ( !pdev->vpci || (v->domain != pdev->domain) )
>>>>> + switch ( v->vpci.task )
>>>>> {
>>>>> - v->vpci.pdev = NULL;
>>>>> - read_unlock(&v->domain->pci_lock);
>>>>> - return false;
>>>>> - }
>>>>> -
>>>>> - header = &pdev->vpci->header;
>>>>> - for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>> + case MODIFY_MEMORY:
>>>>> {
>>>>> - struct vpci_bar *bar = &header->bars[i];
>>>>> - struct rangeset *mem = v->vpci.bar_mem[i];
>>>>> - struct map_data data = {
>>>>> - .d = v->domain,
>>>>> - .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
>>>>> - .bar = bar,
>>>>> - };
>>>>> - int rc;
>>>>> + const struct pci_dev *pdev = v->vpci.memory.pdev;
>>>>> + struct vpci_header *header = NULL;
>>>>> + unsigned int i;
>>>>>
>>>>> - if ( rangeset_is_empty(mem) )
>>>>> - continue;
>>>>> + if ( !pdev )
>>>>> + break;
>>>>>
>>>>> - rc = rangeset_consume_ranges(mem, map_range, &data);
>>>>> + read_lock(&v->domain->pci_lock);
>>>>>
>>>>> - if ( rc == -ERESTART )
>>>>> + if ( !pdev->vpci || (v->domain != pdev->domain) )
>>>>> {
>>>>> + v->vpci.memory.pdev = NULL;
>>>>> read_unlock(&v->domain->pci_lock);
>>>>> - return true;
>>>>> + break;
>>>>> }
>>>>>
>>>>> - if ( rc )
>>>>> + header = &pdev->vpci->header;
>>>>> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>> {
>>>>> - spin_lock(&pdev->vpci->lock);
>>>>> - /* Disable memory decoding unconditionally on failure. */
>>>>> - modify_decoding(pdev, v->vpci.cmd & ~PCI_COMMAND_MEMORY,
>>>>> - false);
>>>>> - spin_unlock(&pdev->vpci->lock);
>>>>> + struct vpci_bar *bar = &header->bars[i];
>>>>> + struct rangeset *mem = v->vpci.bar_mem[i];
>>>>> + struct map_data data = {
>>>>> + .d = v->domain,
>>>>> + .map = v->vpci.memory.cmd & PCI_COMMAND_MEMORY,
>>>>> + .bar = bar,
>>>>> + };
>>>>> + int rc;
>>>>> +
>>>>> + if ( rangeset_is_empty(mem) )
>>>>> + continue;
>>>>>
>>>>> - /* Clean all the rangesets */
>>>>> - for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>> - if ( !rangeset_is_empty(v->vpci.bar_mem[i]) )
>>>>> - rangeset_purge(v->vpci.bar_mem[i]);
>>>>> + rc = rangeset_consume_ranges(mem, map_range, &data);
>>>>>
>>>>> - v->vpci.pdev = NULL;
>>>>> + if ( rc == -ERESTART )
>>>>> + {
>>>>> + read_unlock(&v->domain->pci_lock);
>>>>> + return true;
>>>>> + }
>>>>>
>>>>> - read_unlock(&v->domain->pci_lock);
>>>>> + if ( rc )
>>>>> + {
>>>>> + spin_lock(&pdev->vpci->lock);
>>>>> + /* Disable memory decoding unconditionally on failure. */
>>>>> +            modify_decoding(pdev, v->vpci.memory.cmd & ~PCI_COMMAND_MEMORY,
>>>>> + false);
>>>>> + spin_unlock(&pdev->vpci->lock);
>>>>> +
>>>>> + /* Clean all the rangesets */
>>>>> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>> + if ( !rangeset_is_empty(v->vpci.bar_mem[i]) )
>>>>> + rangeset_purge(v->vpci.bar_mem[i]);
>>>>> +
>>>>> + v->vpci.memory.pdev = NULL;
>>>>> +
>>>>> + read_unlock(&v->domain->pci_lock);
>>>>>
>>>>> - if ( !is_hardware_domain(v->domain) )
>>>>> - domain_crash(v->domain);
>>>>> + if ( !is_hardware_domain(v->domain) )
>>>>> + domain_crash(v->domain);
>>>>>
>>>>> - return false;
>>>>> + break;
>>>>> + }
>>>>> }
>>>>> - }
>>>>> - v->vpci.pdev = NULL;
>>>>> + v->vpci.memory.pdev = NULL;
>>>>>
>>>>> - spin_lock(&pdev->vpci->lock);
>>>>> - modify_decoding(pdev, v->vpci.cmd, v->vpci.rom_only);
>>>>> - spin_unlock(&pdev->vpci->lock);
>>>>> + spin_lock(&pdev->vpci->lock);
>>>>> +    modify_decoding(pdev, v->vpci.memory.cmd, v->vpci.memory.rom_only);
>>>>> + spin_unlock(&pdev->vpci->lock);
>>>>>
>>>>> - read_unlock(&v->domain->pci_lock);
>>>>> + read_unlock(&v->domain->pci_lock);
>>>>> +
>>>>> + break;
>>>>> + }
>>>>> + case WAIT:
>>>>> + if ( NOW() < v->vpci.wait.end )
>>>>> + return true;
>>>>> + v->vpci.wait.callback(v->vpci.wait.data);
>>>>> + break;
>>>>
>>>> As just indicated in reply to patch 6, busy waiting isn't really
>>>> acceptable. This is even more so when the waiting exceeds the typical
>>>> length of a scheduling timeslice.
>>>>
>>>> In that other reply I said to put the vCPU to sleep, but you need to be
>>>> careful there too: the domain may not expect its vCPU to make no
>>>> progress for such an extended period of time. This may need doing
>>>> entirely differently: once the command register was written, you may
>>>> want to record the time after which accesses to the VF registers are
>>>> permitted. Earlier accesses would simply be terminated. You may still
>>>> additionally need a timer, in order to kick off BAR mapping after that
>>>> time. (Better yet would be if the BAR mapping could be done during
>>>> those 100ms. After all, that may be a reason why this long a delay is
>>>> specified: firmware on the device may also require some time to set up
>>>> the BARs accordingly.)
>>>
>>> I am not sure it would work that way. Looking at how Linux initializes
>>> SR-IOV: it writes the VFE and MSE bits, waits 100ms, and then expects
>>> the VFs to be operational. If they are not operational at that moment,
>>> it considers the operation failed and removes all VFs. If we also wait
>>> 100ms before enabling access, the probability of the guest trying to
>>> access something before we allow it would be very high.
>>
>> Well, not really. Our counting of the 100ms necessarily starts before Dom0's.
>> Furthermore it may be acceptable (or even appropriate) to stall premature
>> accesses (because they shouldn't occur in the first place), by blocking the
>> vCPU at that point. A middle route may be possible: Terminate accesses in,
>> say, the first 90ms, and stall the vCPU for any access past that, but before
>> the 100ms expired.
>
> Is there any real benefit to doing all this work instead of just
> waiting for Dom0 to register the VFs? Implementing it the way you
> described would require a relatively complex state machine and two
> timers per SR-IOV-capable device. It would also probably require some
> hacks in Xen to handle partially initialized VFs. That adds a lot of
> work and many possible bugs for not a lot of benefit, in my opinion.
Odd that you ask me this question. If there was no benefit, why did you do
it this way?
Jan