Re: [PATCH v2 4/8] vpci: add a wait operation to the vpci vcpu pending actions
On 01.04.2026 16:40, Mykyta Poturai wrote:
>
>
> On 4/1/26 17:14, Jan Beulich wrote:
>> On 01.04.2026 16:07, Mykyta Poturai wrote:
>>> On 4/1/26 11:21, Jan Beulich wrote:
>>>> On 01.04.2026 09:59, Mykyta Poturai wrote:
>>>>> On 3/31/26 17:55, Jan Beulich wrote:
>>>>>> On 09.03.2026 12:08, Mykyta Poturai wrote:
>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>> @@ -175,76 +175,92 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>>>>>>>
>>>>>>> bool vpci_process_pending(struct vcpu *v)
>>>>>>> {
>>>>>>> - const struct pci_dev *pdev = v->vpci.pdev;
>>>>>>> - struct vpci_header *header = NULL;
>>>>>>> - unsigned int i;
>>>>>>> -
>>>>>>> - if ( !pdev )
>>>>>>> - return false;
>>>>>>> -
>>>>>>> - read_lock(&v->domain->pci_lock);
>>>>>>> -
>>>>>>> - if ( !pdev->vpci || (v->domain != pdev->domain) )
>>>>>>> + switch ( v->vpci.task )
>>>>>>> {
>>>>>>> - v->vpci.pdev = NULL;
>>>>>>> - read_unlock(&v->domain->pci_lock);
>>>>>>> - return false;
>>>>>>> - }
>>>>>>> -
>>>>>>> - header = &pdev->vpci->header;
>>>>>>> - for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>>>> + case MODIFY_MEMORY:
>>>>>>> {
>>>>>>> - struct vpci_bar *bar = &header->bars[i];
>>>>>>> - struct rangeset *mem = v->vpci.bar_mem[i];
>>>>>>> - struct map_data data = {
>>>>>>> - .d = v->domain,
>>>>>>> - .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
>>>>>>> - .bar = bar,
>>>>>>> - };
>>>>>>> - int rc;
>>>>>>> + const struct pci_dev *pdev = v->vpci.memory.pdev;
>>>>>>> + struct vpci_header *header = NULL;
>>>>>>> + unsigned int i;
>>>>>>>
>>>>>>> - if ( rangeset_is_empty(mem) )
>>>>>>> - continue;
>>>>>>> + if ( !pdev )
>>>>>>> + break;
>>>>>>>
>>>>>>> - rc = rangeset_consume_ranges(mem, map_range, &data);
>>>>>>> + read_lock(&v->domain->pci_lock);
>>>>>>>
>>>>>>> - if ( rc == -ERESTART )
>>>>>>> + if ( !pdev->vpci || (v->domain != pdev->domain) )
>>>>>>> {
>>>>>>> + v->vpci.memory.pdev = NULL;
>>>>>>> read_unlock(&v->domain->pci_lock);
>>>>>>> - return true;
>>>>>>> + break;
>>>>>>> }
>>>>>>>
>>>>>>> - if ( rc )
>>>>>>> + header = &pdev->vpci->header;
>>>>>>> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>>>> {
>>>>>>> - spin_lock(&pdev->vpci->lock);
>>>>>>> - /* Disable memory decoding unconditionally on failure. */
>>>>>>> - modify_decoding(pdev, v->vpci.cmd & ~PCI_COMMAND_MEMORY,
>>>>>>> - false);
>>>>>>> - spin_unlock(&pdev->vpci->lock);
>>>>>>> + struct vpci_bar *bar = &header->bars[i];
>>>>>>> + struct rangeset *mem = v->vpci.bar_mem[i];
>>>>>>> + struct map_data data = {
>>>>>>> + .d = v->domain,
>>>>>>> + .map = v->vpci.memory.cmd & PCI_COMMAND_MEMORY,
>>>>>>> + .bar = bar,
>>>>>>> + };
>>>>>>> + int rc;
>>>>>>> +
>>>>>>> + if ( rangeset_is_empty(mem) )
>>>>>>> + continue;
>>>>>>>
>>>>>>> - /* Clean all the rangesets */
>>>>>>> - for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>>>> - if ( !rangeset_is_empty(v->vpci.bar_mem[i]) )
>>>>>>> - rangeset_purge(v->vpci.bar_mem[i]);
>>>>>>> + rc = rangeset_consume_ranges(mem, map_range, &data);
>>>>>>>
>>>>>>> - v->vpci.pdev = NULL;
>>>>>>> + if ( rc == -ERESTART )
>>>>>>> + {
>>>>>>> + read_unlock(&v->domain->pci_lock);
>>>>>>> + return true;
>>>>>>> + }
>>>>>>>
>>>>>>> - read_unlock(&v->domain->pci_lock);
>>>>>>> + if ( rc )
>>>>>>> + {
>>>>>>> + spin_lock(&pdev->vpci->lock);
>>>>>>> +            /* Disable memory decoding unconditionally on failure. */
>>>>>>> +            modify_decoding(pdev, v->vpci.memory.cmd & ~PCI_COMMAND_MEMORY,
>>>>>>> +                            false);
>>>>>>> + spin_unlock(&pdev->vpci->lock);
>>>>>>> +
>>>>>>> + /* Clean all the rangesets */
>>>>>>> + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>>>> + if ( !rangeset_is_empty(v->vpci.bar_mem[i]) )
>>>>>>> + rangeset_purge(v->vpci.bar_mem[i]);
>>>>>>> +
>>>>>>> + v->vpci.memory.pdev = NULL;
>>>>>>> +
>>>>>>> + read_unlock(&v->domain->pci_lock);
>>>>>>>
>>>>>>> - if ( !is_hardware_domain(v->domain) )
>>>>>>> - domain_crash(v->domain);
>>>>>>> + if ( !is_hardware_domain(v->domain) )
>>>>>>> + domain_crash(v->domain);
>>>>>>>
>>>>>>> - return false;
>>>>>>> + break;
>>>>>>> + }
>>>>>>> }
>>>>>>> - }
>>>>>>> - v->vpci.pdev = NULL;
>>>>>>> + v->vpci.memory.pdev = NULL;
>>>>>>>
>>>>>>> - spin_lock(&pdev->vpci->lock);
>>>>>>> - modify_decoding(pdev, v->vpci.cmd, v->vpci.rom_only);
>>>>>>> - spin_unlock(&pdev->vpci->lock);
>>>>>>> + spin_lock(&pdev->vpci->lock);
>>>>>>> +        modify_decoding(pdev, v->vpci.memory.cmd, v->vpci.memory.rom_only);
>>>>>>> + spin_unlock(&pdev->vpci->lock);
>>>>>>>
>>>>>>> - read_unlock(&v->domain->pci_lock);
>>>>>>> + read_unlock(&v->domain->pci_lock);
>>>>>>> +
>>>>>>> + break;
>>>>>>> + }
>>>>>>> + case WAIT:
>>>>>>> + if ( NOW() < v->vpci.wait.end )
>>>>>>> + return true;
>>>>>>> + v->vpci.wait.callback(v->vpci.wait.data);
>>>>>>> + break;
>>>>>>
>>>>>> As just indicated in reply to patch 6, busy waiting isn't really
>>>>>> acceptable. This is even more so when the waiting exceeds the typical
>>>>>> length of a scheduling timeslice.
>>>>>>
>>>>>> In that other reply I said to put the vCPU to sleep, but you need to
>>>>>> be careful there too: The domain may not expect its vCPU to not make
>>>>>> any progress for such an extended period of time. This may need doing
>>>>>> entirely differently: Once the command register was written, you may
>>>>>> want to record the time after which accesses to the VF registers are
>>>>>> permitted. Earlier accesses would simply be terminated. You may still
>>>>>> additionally need a timer, in order to kick off BAR mapping after that
>>>>>> time. (Yet better would be if the BAR mapping could be done during
>>>>>> those 100ms. After all that may be a reason why this long a delay is
>>>>>> specified: Firmware on the device may also require some time to set up
>>>>>> the BARs accordingly.)
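[The approach suggested above — record, at the command-register write, the time after which VF register accesses are permitted, and arm a timer to kick off BAR mapping — could be sketched roughly as below. This is a plain-C model with hypothetical names; the real Xen code would use NOW(), MILLISECS() and struct timer rather than these stand-ins.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state; names are illustrative, not Xen's. */
struct vf_enable_state {
    uint64_t access_ok_after;   /* ns timestamp: VF registers usable */
    bool bars_mapped;           /* set once the deferred mapping ran */
};

#define VF_ENABLE_DELAY_NS (100ULL * 1000 * 1000)   /* 100ms SR-IOV delay */

/* Called when VFE/MSE are written: record the deadline, arm the "timer". */
static void vf_enable_start(struct vf_enable_state *s, uint64_t now)
{
    s->access_ok_after = now + VF_ENABLE_DELAY_NS;
    s->bars_mapped = false;
}

/* Timer-callback stand-in: map BARs once the delay has elapsed. */
static void vf_enable_timer(struct vf_enable_state *s, uint64_t now)
{
    if ( now >= s->access_ok_after )
        s->bars_mapped = true;      /* would invoke the BAR-mapping logic */
}

/* Access handler: premature accesses are simply terminated. */
static bool vf_access_permitted(const struct vf_enable_state *s, uint64_t now)
{
    return now >= s->access_ok_after;
}
```

[The point of the sketch is that no vCPU ever spins: the access path only compares timestamps, and the timer callback does the deferred mapping work.]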
>>>>>
>>>>> I am not sure it would work that way. If we look at how Linux
>>>>> initializes SR-IOV, it writes the VFE and MSE bits, waits 100ms and
>>>>> then expects the VFs to be operational. If they are not operational at
>>>>> that moment, it considers the operation failed and removes all VFs. If
>>>>> we also wait 100ms before enabling access, the probability of a guest
>>>>> trying to access something before we allow it would be very high.
>>>>
>>>> Well, not really. Our counting of the 100ms necessarily starts before
>>>> Dom0's. Furthermore it may be acceptable (or even appropriate) to stall
>>>> premature accesses (because they shouldn't occur in the first place),
>>>> by blocking the vCPU at that point. A middle route may be possible:
>>>> Terminate accesses in, say, the first 90ms, and stall the vCPU for any
>>>> access past that, but before the 100ms expired.
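[The middle route described above amounts to a three-phase window after the command-register write. A minimal sketch of the per-access decision, using the suggested 90ms/100ms thresholds (the enum and names are hypothetical, chosen here only to make the phases explicit):]

```c
#include <stdint.h>

/* Disposition of a guest access to VF registers; names are illustrative. */
enum vf_access_action {
    VF_TERMINATE,   /* early access: complete it as if the VF were absent */
    VF_STALL,       /* close to the deadline: block the vCPU until it passes */
    VF_ALLOW,       /* delay elapsed: handle the access normally */
};

#define VF_TERMINATE_WINDOW_MS  90   /* terminate accesses in the first 90ms */
#define VF_ENABLE_DELAY_MS     100   /* SR-IOV mandated enable delay */

static enum vf_access_action vf_access_phase(uint64_t elapsed_ms)
{
    if ( elapsed_ms < VF_TERMINATE_WINDOW_MS )
        return VF_TERMINATE;
    if ( elapsed_ms < VF_ENABLE_DELAY_MS )
        return VF_STALL;
    return VF_ALLOW;
}
```

[The design rationale is that accesses well before the deadline shouldn't happen at all, so terminating them is harmless, while an access just short of the deadline is likely a race against our slightly earlier start of the 100ms count, so a brief stall is preferable to a spurious failure.]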
>>>
>>> Is there any real benefit to doing all this work instead of just waiting
>>> for Dom0 to register the VFs? Implementing it the way you described
>>> would require a relatively complex state machine and two timers per
>>> SR-IOV-capable device. It will also probably require some hacks to
>>> handle partially initialized VFs in Xen. This adds a lot of work and
>>> many possible bugs for not a lot of benefit, in my opinion.
>>
>> Odd that you ask me this question. If there was no benefit, why did you do
>> it this way?
>
> Roger asked for this approach in V1, and I saw that it could be done in a
> relatively straightforward way, so I implemented it. I didn't exactly
> understand what the benefits were, but I assumed that there were some and
> that the effort was small enough to just do it if the maintainers were
> asking for it.
>
> Now, with the possibility of redoing everything *again* and making it much
> more complex, I've started to really question whether it's needed. So I
> want to know your and others' opinions on registering VFs with Dom0.
Well, before I voice any view here I'd need to understand why Roger wanted
it done like that. (Apparently you must have agreed sufficiently to not
ask back.)
Jan