Xen project Mailing List

Re: [PATCH v3 5/7] vpci: add SR-IOV support for PVH Dom0

To: Mykyta Poturai <Mykyta_Poturai@xxxxxxxx>, Stewart Hildebrand <stewart.hildebrand@xxxxxxx>

Date: Tue, 21 Apr 2026 16:43:27 +0200

Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=google header.d=suse.com header.i="@suse.com" header.h="Content-Transfer-Encoding:In-Reply-To:Autocrypt:From:Content-Language:References:Cc:To:Subject:User-Agent:MIME-Version:Date:Message-ID"

Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL

Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Tue, 21 Apr 2026 14:43:40 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 09.04.2026 16:01, Mykyta Poturai wrote: > From: Stewart Hildebrand <stewart.hildebrand@xxxxxxx> > > This code is expected to only be used by privileged domains, > unprivileged domains should not get access to the SR-IOV capability. > > Implement RW handlers for PCI_SRIOV_CTRL register to dynamically > map/unmap VF BARS. Recalculate BAR sizes before mapping VFs to account > for possible changes in the system page size register. Also force VFs to > always use emulated reads for command register, this is needed to > prevent some drivers accidentally unmapping BARs. This apparently refers to the change to vpci_init_header(). Writes are already intercepted. How would a read lead to accidental BAR unmap? Even for writes I don't see how a VF driver could accidentally unmap BARs, as the memory decode bit there is hardwired to 0. > Discovery of VFs is > done by Dom0, which must register them with Xen. If we intercept control register writes, why would we still require Dom0 to report the VFs that appear? > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx> > Signed-off-by: Mykyta Poturai <mykyta_poturai@xxxxxxxx> First S-o-b doesn't match From: - is that correct? > --- a/xen/drivers/vpci/Makefile > +++ b/xen/drivers/vpci/Makefile > @@ -1,3 +1,4 @@ > obj-y += cap.o > obj-y += vpci.o header.o rebar.o > +obj-y += sriov.o > obj-$(CONFIG_HAS_PCI_MSI) += msi.o msix.o I understand the one line with multiple objects is unhelpful here, but for new additions (like was the case with cap.o) I think we want to respect alphabetical ordering, excluding the one odd line. > --- /dev/null > +++ b/xen/drivers/vpci/sriov.c > @@ -0,0 +1,314 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* > + * Handlers for accesses to the SR-IOV capability structure. > + * > + * Copyright (C) 2026 Citrix Systems R&D > + */ > + > +#include <xen/sched.h> > +#include <xen/vpci.h> > +#include <xsm/xsm.h> > +#include "private.h" > + > +static int vf_init_bars(const struct pci_dev *vf_pdev) > +{ > + int vf_idx; > + unsigned int i; > + const struct pci_dev *pf_pdev = vf_pdev->pf_pdev; > + struct vpci_bar *bars = vf_pdev->vpci->header.bars; > + struct vpci_bar *physfn_vf_bars = pf_pdev->vpci->sriov->vf_bars; > + unsigned int sriov_pos = pci_find_ext_capability(pf_pdev, > + PCI_EXT_CAP_ID_SRIOV); > + uint16_t offset = pci_conf_read16(pf_pdev->sbdf, > + sriov_pos + PCI_SRIOV_VF_OFFSET); > + uint16_t stride = pci_conf_read16(pf_pdev->sbdf, > + sriov_pos + PCI_SRIOV_VF_STRIDE); > + > + vf_idx = vf_pdev->sbdf.sbdf - (pf_pdev->sbdf.sbdf + offset); > + if ( vf_idx < 0 ) > + return -EINVAL; Why is underflow checked for, but too large a value isn't? > + if ( stride ) > + { > + if ( vf_idx % stride ) > + return -EINVAL; > + vf_idx /= stride; > + } What if stride is 0 and there are multiple VFs? > + /* > + * Set up BARs for this VF out of PF's VF BARs taking into account > + * the index of the VF. > + */ > + for ( i = 0; i < PCI_SRIOV_NUM_BARS; i++ ) > + { > + bars[i].addr = physfn_vf_bars[i].addr + vf_idx * > physfn_vf_bars[i].size; > + bars[i].guest_addr = bars[i].addr; > + bars[i].size = physfn_vf_bars[i].size; > + bars[i].type = physfn_vf_bars[i].type; > + bars[i].prefetchable = physfn_vf_bars[i].prefetchable; This may end up quite a bit easier to read if &bar[i] and &physfn_vf_bars[i] were latched into local variables. It might further help if the = were all aligned with one another. > +static int map_vfs(const struct pci_dev *pf_pdev, uint16_t cmd) > +{ > + struct pci_dev *vf_pdev; > + int rc; > + > + ASSERT(rw_is_write_locked(&pf_pdev->domain->pci_lock)); > + > + list_for_each_entry(vf_pdev, &pf_pdev->vf_list, vf_list) > + { > + rc = vpci_modify_bars(vf_pdev, cmd, false); > + if ( rc ) > + { > + gprintk(XENLOG_ERR, "failed to %s VF %pp: %d\n", > + (cmd & PCI_COMMAND_MEMORY) ? "map" : "unmap", > + &vf_pdev->sbdf, rc); > + return rc; > + } > + > + vf_pdev->vpci->header.guest_cmd &= ~PCI_COMMAND_MEMORY; > + vf_pdev->vpci->header.guest_cmd |= (cmd & PCI_COMMAND_MEMORY); As mentioned elsewhere as well, this bit is supposed to be 0 for VFs. > +static void cf_check control_write(const struct pci_dev *pdev, unsigned int > reg, > + uint32_t val, void *data) > +{ > + unsigned int sriov_pos = reg - PCI_SRIOV_CTRL; > + struct vpci_sriov *sriov = pdev->vpci->sriov; > + uint16_t control = pci_conf_read16(pdev->sbdf, reg); > + bool mem_enabled = control & PCI_SRIOV_CTRL_MSE; > + bool new_mem_enabled = val & PCI_SRIOV_CTRL_MSE; > + bool enabled = control & PCI_SRIOV_CTRL_VFE; > + bool new_enabled = val & PCI_SRIOV_CTRL_VFE; > + int rc; > + > + ASSERT(!pdev->info.is_virtfn); > + > + if ( new_enabled == enabled && new_mem_enabled == mem_enabled ) > + { > + pci_conf_write16(pdev->sbdf, reg, val); > + return; > + } > + > + if ( mem_enabled && !new_mem_enabled ) > + map_vfs(pdev, 0); Why are {new_,}enabled not also taken into account here? > + if ( !enabled && new_enabled ) > + { > + size_vf_bars(pdev, sriov_pos, (uint64_t *)data); (The cast is generally unnecessary here. If you want to keep it, I think a comment is needed.) > + /* > + * Only update the number of active VFs when enabling, when > + * disabling use the cached value in order to always remove the same > + * number of VFs that were active. > + */ > + sriov->num_vfs = pci_conf_read16(pdev->sbdf, > + sriov_pos + PCI_SRIOV_NUM_VF); > + } > + > + if ( !mem_enabled && new_mem_enabled ) > + { > + rc = map_vfs(pdev, PCI_COMMAND_MEMORY); Same here. > + if ( rc ) > + map_vfs(pdev, 0); > + } > + > + pci_conf_write16(pdev->sbdf, reg, val); > +} > + > +int vpci_vf_init_header(struct pci_dev *vf_pdev) I think this would better move (up or down), to not sit in the middle of PF functions. > +{ > + const struct pci_dev *pf_pdev; > + unsigned int sriov_pos; > + int rc = 0; > + uint16_t ctrl; > + > + ASSERT(rw_is_write_locked(&vf_pdev->domain->pci_lock)); > + > + if ( !vf_pdev->info.is_virtfn ) > + return 0; Looking at the sole call site, this apparently wants to be an assertion. > + pf_pdev = vf_pdev->pf_pdev; > + ASSERT(pf_pdev); The caller doesn't guarantee this, does it? > + rc = vf_init_bars(vf_pdev); > + if ( rc ) > + return rc; > + > + sriov_pos = pci_find_ext_capability(pf_pdev, PCI_EXT_CAP_ID_SRIOV); > + ctrl = pci_conf_read16(pf_pdev->sbdf, sriov_pos + PCI_SRIOV_CTRL); > + > + if ( (pf_pdev->domain == vf_pdev->domain) && (ctrl & PCI_SRIOV_CTRL_MSE) > ) > + { > + rc = vpci_modify_bars(vf_pdev, PCI_COMMAND_MEMORY, false); Doesn't VF enable also need to be set for the BARs to be mapped? > + if ( rc ) > + return rc; > + > + vf_pdev->vpci->header.guest_cmd |= PCI_COMMAND_MEMORY; But the bit is required to be zero in the VF command register. > + } > + > + return rc; > +} > + > +static int cf_check init_sriov(struct pci_dev *pdev) > +{ > + unsigned int pos; > + > + ASSERT(!pdev->info.is_virtfn); Where's the check that prevents this from triggering? Aiui if vpci_init_capabilities() finds an SR-IOV capability in a VF's config space, it'll call here. > + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV); > + > + if ( !pos ) > + return 0; Can't this be an assertion? (In fact I was wondering why the caller, which already had to find the capability, doesn't pass the position.) > +static int cf_check cleanup_sriov(const struct pci_dev *pdev, bool hide) > +{ > + unsigned int pos; > + int rc; > + > + if ( !pdev->vpci->sriov ) > + return 0; > + > + ASSERT(!pdev->info.is_virtfn); > + ASSERT(list_empty(&pdev->vf_list)); What guarantees this latter property? > + if ( !hide ) > + { > + XFREE(pdev->vpci->sriov); > + return 0; > + } > + > + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV); As was explained in 9ebba62ca843 ("vPCI/ReBAR: improve cleanup") and d734babf8bc3 ("vPCI: re-init extended-capabilities when MMCFG availability changed"), you may not call this function during cleanup. Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.