[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 06/10] vpci: Make every domain handle its own BARs


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Fri, 13 Nov 2020 06:48:20 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=epam.com; dmarc=pass action=none header.from=epam.com; dkim=pass header.d=epam.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=XuwBqXtQEZkLSOCgwUjKf6uPweQs9Hu8jkJFcLAGfi8=; b=CqjThm0WvzxDS2W1BZfsI7Jmf9+eHOZxUMyQFKNipsgsz1xwbde36GfCafDq5mBylNyR6/1TcrQJ8iVlCJ8ah/h7qdlgs/p/UHUEN8J8DjYKiyutk8eN0oPqJMo3IVGcxnFz80jgi6JG4S0ge1U3YE0Pt19bYim9yPOrC2dej92/4Jv7jz7yuh3xKWZS7Y1RCVGCTqDjR5BiwWi8E2narwijHx3I36CX4ih5m8MOL74aNLdMJN51M/Y0wpqlt5lfRVtxkrAyy+5wAiVPJYPc8hhhTX6KJ3gJooHFNg03284X+aO1+62OuEJ41REpUO4SuzJZWf/npixCQB9ZiwRpmw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=IxHT28dLifrcnZtUbDRbd54v20XQFwySt5KGt1qSMnooRpPGk971leQol+wFEaT9HzM5v2bpRb9nFvBO1z/16jRX1OE3ffAjb9GdVC5ClsS10l4NR/vCo6qp7qtaHcMtOIW99AXKMuW7ceLJjBLb6zSAfNk3ZFFa+p7qY5Y6AvuXQ4Wyuw/JGOlDGJTZa2jHylZTblsH04oWiEekrLXI4PlmaaHq8jMM7Q4Qn9depm69yi7vGk6soGUKlzS+FBqQGoRlTL3zGocPmO/khlL1ecqOSvtsD2AfEUgInXkL7EWGjUoH0lUtqvmNxkeJTZrht9SUXnGi+5Z6RIV4wl0Ddg==
  • Authentication-results: citrix.com; dkim=none (message not signed) header.d=none;citrix.com; dmarc=none action=none header.from=epam.com;
  • Cc: Oleksandr Andrushchenko <andr2000@xxxxxxxxx>, "Rahul.Singh@xxxxxxx" <Rahul.Singh@xxxxxxx>, "Bertrand.Marquis@xxxxxxx" <Bertrand.Marquis@xxxxxxx>, "julien.grall@xxxxxxx" <julien.grall@xxxxxxx>, "jbeulich@xxxxxxxx" <jbeulich@xxxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "iwj@xxxxxxxxxxxxxx" <iwj@xxxxxxxxxxxxxx>, "wl@xxxxxxx" <wl@xxxxxxx>
  • Delivery-date: Fri, 13 Nov 2020 06:48:39 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHWtpbzyJg77SbpvkSRWLKjWy/3PanEQmgAgAA8YoCAABlOgIABCD8AgAAEbQA=
  • Thread-topic: [PATCH 06/10] vpci: Make every domain handle its own BARs

On 11/13/20 8:32 AM, Oleksandr Andrushchenko wrote:
>
> On 11/12/20 4:46 PM, Roger Pau Monné wrote:
>> On Thu, Nov 12, 2020 at 01:16:10PM +0000, Oleksandr Andrushchenko wrote:
>>> On 11/12/20 11:40 AM, Roger Pau Monné wrote:
>>>> On Mon, Nov 09, 2020 at 02:50:27PM +0200, Oleksandr Andrushchenko wrote:
>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>>> index f74f728884c0..7dc7c70e24f2 100644
>>>>> --- a/xen/drivers/vpci/header.c
>>>>> +++ b/xen/drivers/vpci/header.c
>>>>> @@ -31,14 +31,87 @@
>>>>>    struct map_data {
>>>>>        struct domain *d;
>>>>>        bool map;
>>>>> +    struct pci_dev *pdev;
>>>> If the field is required please place it after the domain one.
>>> I will, but may I ask why?
>> So that if we add further boolean fields we can do at the end of the
>> struct for layout reasons. If we do:
>>
>> struct map_data {
>>      struct domain *d;
>>      bool map;
>>      struct pci_dev *pdev;
>>      bool foo;
>> }
>>
>> We will end up with a bunch of padding that could be avoided by doing:
>>
>> struct map_data {
>>      struct domain *d;
>>      struct pci_dev *pdev;
>>      bool map;
>>      bool foo;
>> }
> Ah, so this is about padding. Got it
>>
>>>>> +    s = PFN_DOWN(s);
>>>>> +    e = PFN_DOWN(e);
>>>> Changing the rangeset to store memory addresses instead of frames
>>>> could for example be split into a separate patch.
>>> Ok
>>>> I think you are doing the calculation of the end pfn wrong here, you
>>>> should use PFN_UP instead in case the address is not aligned.
>>> PFN_DOWN for the start seems to be ok if the address is not aligned
>>>
>>> which is the case if I pass bar_index in the lower bits: PCI memory has
>>>
>>> PAGE_SIZE granularity, so besides the fact that I use bar_index the address
>> No, BARs don't need to be aligned to page boundaries, you can even
>> have different BARs inside the same physical page.
>>
>> The spec recommends that the minimum size of a BAR should be 4KB, but
>> that's not a strict requirement in which case a BAR can be as small as
>> 16bytes, and then you can have multiple ones inside the same page.
> Ok, will account on that
>>
>>> must be page aligned.
>>>
>>> The end address is expressed in (size - 1) form, again page aligned,
>>>
>>> so to get the last page to be mapped PFN_DOWN also seems to be appropriate.
>>>
>>> Do I miss something here?
>> I'm not aware of any  of those addresses or sizes being guaranteed to
>> be page aligned, so I think you need to account for that.
>>
>> Some of the code here uses PFN_DOWN to calculate the end address
>> because the rangesets are used in an inclusive fashion, so the end
>> frame also gets mapped.
> Ok
>>
>>>>> +    mfn = _mfn(PFN_DOWN(header->bars[bar_idx].addr));
>>>>>        for ( ; ; )
>>>>>        {
>>>>>            unsigned long size = e - s + 1;
>>>>> @@ -52,11 +125,15 @@ static int map_range(unsigned long s, unsigned long 
>>>>> e, void *data,
>>>>>             * - {un}map_mmio_regions doesn't support preemption.
>>>>>             */
>>>>>    -        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, 
>>>>> _mfn(s))
>>>>> -                      : unmap_mmio_regions(map->d, _gfn(s), size, 
>>>>> _mfn(s));
>>>>> +        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, mfn)
>>>>> +                      : unmap_mmio_regions(map->d, _gfn(s), size, mfn);
>>>>>            if ( rc == 0 )
>>>>>            {
>>>>> -            *c += size;
>>>>> +            /*
>>>>> +             * Range set is not expressed in frame numbers and the size
>>>>> +             * is the number of frames, so update accordingly.
>>>>> +             */
>>>>> +            *c += size << PAGE_SHIFT;
>>>>>                break;
>>>>>            }
>>>>>            if ( rc < 0 )
>>>>> @@ -67,8 +144,9 @@ static int map_range(unsigned long s, unsigned long e, 
>>>>> void *data,
>>>>>                break;
>>>>>            }
>>>>>            ASSERT(rc < size);
>>>>> -        *c += rc;
>>>>> +        *c += rc << PAGE_SHIFT;
>>>>>            s += rc;
>>>>> +        mfn += rc;
>>>>>            if ( general_preempt_check() )
>>>>>                    return -ERESTART;
>>>>>        }
>>>>> @@ -84,7 +162,7 @@ static int map_range(unsigned long s, unsigned long e, 
>>>>> void *data,
>>>>>    static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>>>>>                                bool rom_only)
>>>>>    {
>>>>> -    struct vpci_header *header = &pdev->vpci->header;
>>>>> +    struct vpci_header *header = get_hwdom_vpci_header(pdev);
>>>>>        bool map = cmd & PCI_COMMAND_MEMORY;
>>>>>        unsigned int i;
>>>>>    @@ -136,6 +214,7 @@ bool vpci_process_pending(struct vcpu *v)
>>>>>            struct map_data data = {
>>>>>                .d = v->domain,
>>>>>                .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
>>>>> +            .pdev = v->vpci.pdev,
>>>>>            };
>>>>>            int rc = rangeset_consume_ranges(v->vpci.mem, map_range, 
>>>>> &data);
>>>>>    @@ -168,7 +247,8 @@ bool vpci_process_pending(struct vcpu *v)
>>>>>    static int __init apply_map(struct domain *d, const struct pci_dev 
>>>>> *pdev,
>>>>>                                struct rangeset *mem, uint16_t cmd)
>>>>>    {
>>>>> -    struct map_data data = { .d = d, .map = true };
>>>>> +    struct map_data data = { .d = d, .map = true,
>>>>> +        .pdev = (struct pci_dev *)pdev };
>>>> Dropping the const here is not fine. IT either needs to be dropped
>>>> from apply_map and further up, or this needs to also be made const.
>>> Ok, I'll try to keep it const
>>>>>        int rc;
>>>>>           while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) 
>>>>> == -ERESTART )
>>>>> @@ -205,7 +285,7 @@ static void defer_map(struct domain *d, struct 
>>>>> pci_dev *pdev,
>>>>>       static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, 
>>>>> bool rom_only)
>>>>>    {
>>>>> -    struct vpci_header *header = &pdev->vpci->header;
>>>>> +    struct vpci_header *header;
>>>>>        struct rangeset *mem = rangeset_new(NULL, NULL, 0);
>>>>>        struct pci_dev *tmp, *dev = NULL;
>>>>>    #ifdef CONFIG_X86
>>>>> @@ -217,6 +297,11 @@ static int modify_bars(const struct pci_dev *pdev, 
>>>>> uint16_t cmd, bool rom_only)
>>>>>        if ( !mem )
>>>>>            return -ENOMEM;
>>>>>    +    if ( is_hardware_domain(current->domain) )
>>>>> +        header = get_hwdom_vpci_header(pdev);
>>>>> +    else
>>>>> +        header = get_vpci_header(current->domain, pdev);
>>>>> +
>>>>>        /*
>>>>>         * Create a rangeset that represents the current device BARs 
>>>>> memory region
>>>>>         * and compare it against all the currently active BAR memory 
>>>>> regions. If
>>>>> @@ -225,12 +310,15 @@ static int modify_bars(const struct pci_dev *pdev, 
>>>>> uint16_t cmd, bool rom_only)
>>>>>         * First fill the rangeset with all the BARs of this device or 
>>>>> with the ROM
>>>>>         * BAR only, depending on whether the guest is toggling the memory 
>>>>> decode
>>>>>         * bit of the command register, or the enable bit of the ROM BAR 
>>>>> register.
>>>>> +     *
>>>>> +     * Use the PCI reserved bits of the BAR to pass BAR's index.
>>>>>         */
>>>>>        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>>        {
>>>>>            const struct vpci_bar *bar = &header->bars[i];
>>>>> -        unsigned long start = PFN_DOWN(bar->addr);
>>>>> -        unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>>>>> +        unsigned long start = (bar->addr & PCI_BASE_ADDRESS_MEM_MASK) | 
>>>>> i;
>>>>> +        unsigned long end = (bar->addr & PCI_BASE_ADDRESS_MEM_MASK) +
>>>>> +            bar->size - 1;
>>>> Will this work fine on Arm 32bits with LPAE? It's my understanding
>>>> that in that case unsigned long is 32bits, but the physical address
>>>> space is 44bits, in which case this won't work.
>>> Hm, good question
>>>> I think you need to keep the usage of frame numbers here.
>>> If I re-work the gfn <-> mfn mapping then yes, I can use frame numbers here 
>>> and elsewhere
>>>>>               if ( !MAPPABLE_BAR(bar) ||
>>>>>                 (rom_only ? bar->type != VPCI_BAR_ROM
>>>>> @@ -251,9 +339,11 @@ static int modify_bars(const struct pci_dev *pdev, 
>>>>> uint16_t cmd, bool rom_only)
>>>>>        /* Remove any MSIX regions if present. */
>>>>>        for ( i = 0; msix && i < ARRAY_SIZE(msix->tables); i++ )
>>>>>        {
>>>>> -        unsigned long start = PFN_DOWN(vmsix_table_addr(pdev->vpci, i));
>>>>> -        unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
>>>>> - vmsix_table_size(pdev->vpci, i) - 1);
>>>>> +        unsigned long start = (vmsix_table_addr(pdev->vpci, i) &
>>>>> +                               PCI_BASE_ADDRESS_MEM_MASK) | i;
>>>>> +        unsigned long end = (vmsix_table_addr(pdev->vpci, i) &
>>>>> +                             PCI_BASE_ADDRESS_MEM_MASK ) +
>>>>> + vmsix_table_size(pdev->vpci, i) - 1;
>>>>>               rc = rangeset_remove_range(mem, start, end);
>>>>>            if ( rc )
>>>>> @@ -273,6 +363,8 @@ static int modify_bars(const struct pci_dev *pdev, 
>>>>> uint16_t cmd, bool rom_only)
>>>>>         */
>>>>>        for_each_pdev ( pdev->domain, tmp )
>>>>>        {
>>>>> +        struct vpci_header *header;
>>>>> +
>>>>>            if ( tmp == pdev )
>>>>>            {
>>>>>                /*
>>>>> @@ -289,11 +381,14 @@ static int modify_bars(const struct pci_dev *pdev, 
>>>>> uint16_t cmd, bool rom_only)
>>>>>                    continue;
>>>>>            }
>>>>>    -        for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>>> +        header = get_vpci_header(tmp->domain, pdev);
>>>>> +
>>>>> +        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>>            {
>>>>> -            const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>>>>> -            unsigned long start = PFN_DOWN(bar->addr);
>>>>> -            unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>>>>> +            const struct vpci_bar *bar = &header->bars[i];
>>>>> +            unsigned long start = (bar->addr & 
>>>>> PCI_BASE_ADDRESS_MEM_MASK) | i;
>>>>> +            unsigned long end = (bar->addr & PCI_BASE_ADDRESS_MEM_MASK)
>>>>> +                + bar->size - 1;
>>>>>                   if ( !bar->enabled || !rangeset_overlaps_range(mem, 
>>>>> start, end) ||
>>>>>                     /*
>>>>> @@ -357,7 +452,7 @@ static void cmd_write(const struct pci_dev *pdev, 
>>>>> unsigned int reg,
>>>>>            pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>>    }
>>>>>    -static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>>>>> +static void bar_write_hwdom(const struct pci_dev *pdev, unsigned int reg,
>>>>>                          uint32_t val, void *data)
>>>>>    {
>>>>>        struct vpci_bar *bar = data;
>>>>> @@ -377,14 +472,17 @@ static void bar_write(const struct pci_dev *pdev, 
>>>>> unsigned int reg,
>>>>>        {
>>>>>            /* If the value written is the current one avoid printing a 
>>>>> warning. */
>>>>>            if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>>>>> +        {
>>>>> +            struct vpci_header *header = get_hwdom_vpci_header(pdev);
>>>>> +
>>>>>                gprintk(XENLOG_WARNING,
>>>>>                        "%04x:%02x:%02x.%u: ignored BAR %lu write with 
>>>>> memory decoding enabled\n",
>>>>>                        pdev->seg, pdev->bus, slot, func,
>>>>> -                    bar - pdev->vpci->header.bars + hi);
>>>>> +                    bar - header->bars + hi);
>>>>> +        }
>>>>>            return;
>>>>>        }
>>>>>    -
>>>>>        /*
>>>>>         * Update the cached address, so that when memory decoding is 
>>>>> enabled
>>>>>         * Xen can map the BAR into the guest p2m.
>>>>> @@ -403,10 +501,89 @@ static void bar_write(const struct pci_dev *pdev, 
>>>>> unsigned int reg,
>>>>>        pci_conf_write32(pdev->sbdf, reg, val);
>>>>>    }
>>>>>    +static uint32_t bar_read_hwdom(const struct pci_dev *pdev, unsigned 
>>>>> int reg,
>>>>> +                               void *data)
>>>>> +{
>>>>> +    return vpci_hw_read32(pdev, reg, data);
>>>>> +}
>>>>> +
>>>>> +static void bar_write_guest(const struct pci_dev *pdev, unsigned int reg,
>>>>> +                            uint32_t val, void *data)
>>>>> +{
>>>>> +    struct vpci_bar *vbar = data;
>>>>> +    bool hi = false;
>>>>> +
>>>>> +    if ( vbar->type == VPCI_BAR_MEM64_HI )
>>>>> +    {
>>>>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>>>>> +        vbar--;
>>>>> +        hi = true;
>>>>> +    }
>>>>> +    vbar->addr &= ~(0xffffffffull << (hi ? 32 : 0));
>>>>> +    vbar->addr |= (uint64_t)val << (hi ? 32 : 0);
>>>>> +}
>>>>> +
>>>>> +static uint32_t bar_read_guest(const struct pci_dev *pdev, unsigned int 
>>>>> reg,
>>>>> +                               void *data)
>>>>> +{
>>>>> +    struct vpci_bar *vbar = data;
>>>>> +    uint32_t val;
>>>>> +    bool hi = false;
>>>>> +
>>>>> +    if ( vbar->type == VPCI_BAR_MEM64_HI )
>>>>> +    {
>>>>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>>>>> +        vbar--;
>>>>> +        hi = true;
>>>>> +    }
>>>>> +
>>>>> +    if ( vbar->type == VPCI_BAR_MEM64_LO || vbar->type == 
>>>>> VPCI_BAR_MEM64_HI )
>>>> I think this would be clearer using a switch statement.
>>> I'll think about
>>>>> +    {
>>>>> +        if ( hi )
>>>>> +            val = vbar->addr >> 32;
>>>>> +        else
>>>>> +            val = vbar->addr & 0xffffffff;
>>>>> +        if ( val == ~0 )
>>>> Strictly speaking I think you are not forced to write 1s to the
>>>> reserved 4 bits in the low register (and in the 32bit case).
>>> Ah, so Linux kernel, for instance, could have written 0xffffff0 while
>>>
>>> I expect 0xffffffff?
>> I think real hardware would return the size when written 1s to all
>> bits except the reserved ones.
>>
>>>>> +        {
>>>>> +            /* Guests detects BAR's properties and sizes. */
>>>>> +            if ( !hi )
>>>>> +            {
>>>>> +                val = 0xffffffff & ~(vbar->size - 1);
>>>>> +                val |= vbar->type == VPCI_BAR_MEM32 ? 
>>>>> PCI_BASE_ADDRESS_MEM_TYPE_32
>>>>> +                                                    : 
>>>>> PCI_BASE_ADDRESS_MEM_TYPE_64;
>>>>> +                val |= vbar->prefetchable ? 
>>>>> PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
>>>>> +            }
>>>>> +            else
>>>>> +                val = vbar->size >> 32;
>>>>> +            vbar->addr &= ~(0xffffffffull << (hi ? 32 : 0));
>>>>> +            vbar->addr |= (uint64_t)val << (hi ? 32 : 0);
>>>>> +        }
>>>>> +    }
>>>>> +    else if ( vbar->type == VPCI_BAR_MEM32 )
>>>>> +    {
>>>>> +        val = vbar->addr;
>>>>> +        if ( val == ~0 )
>>>>> +        {
>>>>> +            if ( !hi )
>>>> There's no way hi can be true at this point AFAICT.
>>> Sure, thank you
>>>>> +            {
>>>>> +                val = 0xffffffff & ~(vbar->size - 1);
>>>>> +                val |= vbar->type == VPCI_BAR_MEM32 ? 
>>>>> PCI_BASE_ADDRESS_MEM_TYPE_32
>>>>> +                                                    : 
>>>>> PCI_BASE_ADDRESS_MEM_TYPE_64;
>>>>> +                val |= vbar->prefetchable ? 
>>>>> PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +    else
>>>>> +    {
>>>>> +        val = vbar->addr;
>>>>> +    }
>>>>> +    return val;
>>>>> +}
>>>>> +
>>>>>    static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>>>>>                          uint32_t val, void *data)
>>>>>    {
>>>>> -    struct vpci_header *header = &pdev->vpci->header;
>>>>> +    struct vpci_header *header = get_hwdom_vpci_header(pdev);
>>>>>        struct vpci_bar *rom = data;
>>>>>        uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
>>>>>        uint16_t cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>>>>> @@ -452,15 +629,56 @@ static void rom_write(const struct pci_dev *pdev, 
>>>>> unsigned int reg,
>>>>>            rom->addr = val & PCI_ROM_ADDRESS_MASK;
>>>>>    }
>>>> Don't you need to also protect a domU from writing to the ROM BAR
>>>> register?
>>> ROM was not a target of this RFC as I have no HW to test that, but final 
>>> code must
>>>
>>> also handle ROM as well, you are right
>>>
>>>>>    +static uint32_t bar_read_dispatch(const struct pci_dev *pdev, 
>>>>> unsigned int reg,
>>>>> +                                  void *data)
>>>>> +{
>>>>> +    struct vpci_bar *vbar, *bar = data;
>>>>> +
>>>>> +    if ( is_hardware_domain(current->domain) )
>>>>> +        return bar_read_hwdom(pdev, reg, data);
>>>>> +
>>>>> +    vbar = get_vpci_bar(current->domain, pdev, bar->index);
>>>>> +    if ( !vbar )
>>>>> +        return ~0;
>>>>> +
>>>>> +    return bar_read_guest(pdev, reg, vbar);
>>>>> +}
>>>>> +
>>>>> +static void bar_write_dispatch(const struct pci_dev *pdev, unsigned int 
>>>>> reg,
>>>>> +                               uint32_t val, void *data)
>>>>> +{
>>>>> +    struct vpci_bar *bar = data;
>>>>> +
>>>>> +    if ( is_hardware_domain(current->domain) )
>>>>> +        bar_write_hwdom(pdev, reg, val, data);
>>>>> +    else
>>>>> +    {
>>>>> +        struct vpci_bar *vbar = get_vpci_bar(current->domain, pdev, 
>>>>> bar->index);
>>>>> +
>>>>> +        if ( !vbar )
>>>>> +            return;
>>>>> +        bar_write_guest(pdev, reg, val, vbar);
>>>>> +    }
>>>>> +}
>>>> You should assign different handlers based on whether the domain that
>>>> has the device assigned is a domU or the hardware domain, rather than
>>>> doing the selection here.
>>> Hm, handlers are assigned once in init_bars and this function is only called
>>>
>>> for hwdom, so there is no way I can do that for the guests. Hence, the 
>>> dispatcher
>> I think we might want to reset the vPCI handlers when a devices gets
>> assigned and deassigned.
>
> In ARM case init_bars is called too early: PCI device assignment is currently
>
> initiated by Domain-0' kernel and is done *before* PCI devices are given 
> memory
>
> ranges and BARs assigned:
>
> [    0.429514] pci_bus 0000:00: root bus resource [bus 00-ff]
> [    0.429532] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff]
> [    0.429555] pci_bus 0000:00: root bus resource [mem 0xfe200000-0xfe3fffff]
> [    0.429575] pci_bus 0000:00: root bus resource [mem 0x30000000-0x37ffffff]
> [    0.429604] pci_bus 0000:00: root bus resource [mem 0x38000000-0x3fffffff 
> pref]
> [    0.429670] pci 0000:00:00.0: enabling Extended Tags
> [    0.453764] pci 0000:00:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE
>
> < init_bars >
>
> [    0.453793] pci 0000:00:00.0: -- IRQ 0
> [    0.458825] pci 0000:00:00.0: Failed to add - passthrough or MSI/MSI-X 
> might fail!
> [    0.471790] pci 0000:01:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE
>
> < init_bars >
>
> [    0.471821] pci 0000:01:00.0: -- IRQ 255
> [    0.476809] pci 0000:01:00.0: Failed to add - passthrough or MSI/MSI-X 
> might fail!
>
> < BAR assignments below >
>
> [    0.488233] pci 0000:00:00.0: BAR 14: assigned [mem 0xfe200000-0xfe2fffff]
> [    0.488265] pci 0000:00:00.0: BAR 15: assigned [mem 0x38000000-0x380fffff 
> pref]
>
> In case of x86 this is pretty much ok as BARs are already in place, but for 
> ARM we
>
> need to take care and re-setup vPCI BARs for hwdom. Things are getting even 
> more
>
> complicated if the host PCI bridge is not ECAM like, so you cannot set 
> mmio_handlers
>
> and trap hwdom's access to the config space to update BARs etc. This is why I 
> have that
>
> ugly hack for rcar_gen3 to read actual BARs for hwdom.
>
>
> If we go further and take a look at SR-IOV then when the kernel assigns the 
> device
>
> (BUS_NOTIFY_ADD_DEVICE) then it already has BARs assigned for virtual 
> functions
>
> (need to double-check that).

Hm, indeed. We just need to move init_bars from being called on PCI *device 
add* to

*device assign*. This way it won't (?) break x86 and allow ARM to properly 
initialize vPCI's

BARs...

>
>>   In order to do passthrough to domUs safely
>> we will have to add more handlers than what's required for dom0,
> Can you please tell what are thinking about? What are the missing handlers?
>>   and
>> having is_hardware_domain sprinkled in all of them is not a suitable
>> solution.
>
> I'll try to replace is_hardware_domain with something like:
>
> +/*
> + * Detect whether physical PCI devices in this segment belong
> + * to the domain given, e.g. on x86 all PCI devices live in hwdom,
> + * but in case of ARM this might not be the case: those may also
> + * live in driver domains or even Xen itself.
> + */
> +bool pci_is_hardware_domain(struct domain *d, u16 seg)
> +{
> +#ifdef CONFIG_X86
> +    return is_hardware_domain(d);
> +#elif CONFIG_ARM
> +    return pci_is_owner_domain(d, seg);
> +#else
> +#error "Unsupported architecture"
> +#endif
> +}
> +
> +/*
> + * Get domain which owns this segment: for x86 this is always hardware
> + * domain and for ARM this can be different.
> + */
> +struct domain *pci_get_hardware_domain(u16 seg)
> +{
> +#ifdef CONFIG_X86
> +    return hardware_domain;
> +#elif CONFIG_ARM
> +    return pci_get_owner_domain(seg);
> +#else
> +#error "Unsupported architecture"
> +#endif
> +}
>
> This is what I use to properly detect the domain that really owns physical 
> host bridge
>
>>
>> Roger.
>
> Thank you,
>
> Oleksandr
>

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.