[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 06/10] vpci: Make every domain handle its own BARs


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Fri, 13 Nov 2020 06:32:31 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=epam.com; dmarc=pass action=none header.from=epam.com; dkim=pass header.d=epam.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=DYZlW9li2CbYarknfrEqLhplrYGRY1b5k0m7RNhpF9Y=; b=OU6FMPNPyiQcleyFg8V/xiMrwIIdPUSpUYg4Lwdq90yGCjlzIEA3XoZxL1jFbbU96oWU8ZcaqBHZ69sEalSeAEQH8Gh6aCuZEvNrJllJ/Ncva9o2TNi4c1hYWz8vrKzqnSYeFzknvBPX20sYhXs1bbPWLn99LHYlo+CMGDvj+ohmA8Cl2UE+pTXbWAL/FkIsCGGK3Ux2VDAWyNCfXm2tF+/nFvObsijwpSUd6VW1aYBWUOnkS172iYmreu4jmtxfga5+ahdPgaq50Mes/vTx6LwdfwII2ZZ6oSlCYYcQ48QvqFmoOizuSm8+49LmedbMrsyktuaCHDL+3DbZl5EfPA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=QfxnJA7iPT9nHJpb+SjpsuzVU0hh9i2SbmLh+UfPI+UUYxkoPKrfX/Cwnp0m6HsRYtUWJK9u1iRMkGC5JOiDkbxL3ODzt/nyUGLLNCwSIRDBVTANC0el78R2+UMgaXPEXhQlYW0ufpMzgJCnqLjM8r+7YtnJfJ59hS+Y9T08MzgSg/wC7ekPJUfsB0HK0DR18nv7m2cK4mDXV5gL6Tpmt+EFd/TpSbB2lnl+HclhSFE0EY7ore7ASijAX6DhK5avwbOk3vXSfwL2tL642ossUYP0NEA7ZfMl5QwA205n5/VllqR8b91OkxIcmtg674wRGdX7EpnPIjHKTSWIRHW6Vg==
  • Authentication-results: citrix.com; dkim=none (message not signed) header.d=none;citrix.com; dmarc=none action=none header.from=epam.com;
  • Cc: Oleksandr Andrushchenko <andr2000@xxxxxxxxx>, "Rahul.Singh@xxxxxxx" <Rahul.Singh@xxxxxxx>, "Bertrand.Marquis@xxxxxxx" <Bertrand.Marquis@xxxxxxx>, "julien.grall@xxxxxxx" <julien.grall@xxxxxxx>, "jbeulich@xxxxxxxx" <jbeulich@xxxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "iwj@xxxxxxxxxxxxxx" <iwj@xxxxxxxxxxxxxx>, "wl@xxxxxxx" <wl@xxxxxxx>
  • Delivery-date: Fri, 13 Nov 2020 06:33:04 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHWtpbzyJg77SbpvkSRWLKjWy/3PanEQmgAgAA8YoCAABlOgIABCD8A
  • Thread-topic: [PATCH 06/10] vpci: Make every domain handle its own BARs

On 11/12/20 4:46 PM, Roger Pau Monné wrote:
> On Thu, Nov 12, 2020 at 01:16:10PM +0000, Oleksandr Andrushchenko wrote:
>> On 11/12/20 11:40 AM, Roger Pau Monné wrote:
>>> On Mon, Nov 09, 2020 at 02:50:27PM +0200, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>> index f74f728884c0..7dc7c70e24f2 100644
>>>> --- a/xen/drivers/vpci/header.c
>>>> +++ b/xen/drivers/vpci/header.c
>>>> @@ -31,14 +31,87 @@
>>>>    struct map_data {
>>>>        struct domain *d;
>>>>        bool map;
>>>> +    struct pci_dev *pdev;
>>> If the field is required please place it after the domain one.
>> I will, but may I ask why?
> So that if we add further boolean fields we can do at the end of the
> struct for layout reasons. If we do:
>
> struct map_data {
>      struct domain *d;
>      bool map;
>      struct pci_dev *pdev;
>      bool foo;
> }
>
> We will end up with a bunch of padding that could be avoided by doing:
>
> struct map_data {
>      struct domain *d;
>      struct pci_dev *pdev;
>      bool map;
>      bool foo;
> }
Ah, so this is about padding. Got it
>
>>>> +    s = PFN_DOWN(s);
>>>> +    e = PFN_DOWN(e);
>>> Changing the rangeset to store memory addresses instead of frames
>>> could for example be split into a separate patch.
>> Ok
>>> I think you are doing the calculation of the end pfn wrong here, you
>>> should use PFN_UP instead in case the address is not aligned.
>> PFN_DOWN for the start seems to be ok if the address is not aligned
>>
>> which is the case if I pass bar_index in the lower bits: PCI memory has
>>
>> PAGE_SIZE granularity, so besides the fact that I use bar_index the address
> No, BARs don't need to be aligned to page boundaries, you can even
> have different BARs inside the same physical page.
>
> The spec recommends that the minimum size of a BAR should be 4KB, but
> that's not a strict requirement in which case a BAR can be as small as
> 16bytes, and then you can have multiple ones inside the same page.
Ok, will account on that
>
>> must be page aligned.
>>
>> The end address is expressed in (size - 1) form, again page aligned,
>>
>> so to get the last page to be mapped PFN_DOWN also seems to be appropriate.
>>
>> Do I miss something here?
> I'm not aware of any  of those addresses or sizes being guaranteed to
> be page aligned, so I think you need to account for that.
>
> Some of the code here uses PFN_DOWN to calculate the end address
> because the rangesets are used in an inclusive fashion, so the end
> frame also gets mapped.
Ok
>
>>>> +    mfn = _mfn(PFN_DOWN(header->bars[bar_idx].addr));
>>>>        for ( ; ; )
>>>>        {
>>>>            unsigned long size = e - s + 1;
>>>> @@ -52,11 +125,15 @@ static int map_range(unsigned long s, unsigned long 
>>>> e, void *data,
>>>>             * - {un}map_mmio_regions doesn't support preemption.
>>>>             */
>>>>    
>>>> -        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, _mfn(s))
>>>> -                      : unmap_mmio_regions(map->d, _gfn(s), size, 
>>>> _mfn(s));
>>>> +        rc = map->map ? map_mmio_regions(map->d, _gfn(s), size, mfn)
>>>> +                      : unmap_mmio_regions(map->d, _gfn(s), size, mfn);
>>>>            if ( rc == 0 )
>>>>            {
>>>> -            *c += size;
>>>> +            /*
>>>> +             * Range set is not expressed in frame numbers and the size
>>>> +             * is the number of frames, so update accordingly.
>>>> +             */
>>>> +            *c += size << PAGE_SHIFT;
>>>>                break;
>>>>            }
>>>>            if ( rc < 0 )
>>>> @@ -67,8 +144,9 @@ static int map_range(unsigned long s, unsigned long e, 
>>>> void *data,
>>>>                break;
>>>>            }
>>>>            ASSERT(rc < size);
>>>> -        *c += rc;
>>>> +        *c += rc << PAGE_SHIFT;
>>>>            s += rc;
>>>> +        mfn += rc;
>>>>            if ( general_preempt_check() )
>>>>                    return -ERESTART;
>>>>        }
>>>> @@ -84,7 +162,7 @@ static int map_range(unsigned long s, unsigned long e, 
>>>> void *data,
>>>>    static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>>>>                                bool rom_only)
>>>>    {
>>>> -    struct vpci_header *header = &pdev->vpci->header;
>>>> +    struct vpci_header *header = get_hwdom_vpci_header(pdev);
>>>>        bool map = cmd & PCI_COMMAND_MEMORY;
>>>>        unsigned int i;
>>>>    
>>>> @@ -136,6 +214,7 @@ bool vpci_process_pending(struct vcpu *v)
>>>>            struct map_data data = {
>>>>                .d = v->domain,
>>>>                .map = v->vpci.cmd & PCI_COMMAND_MEMORY,
>>>> +            .pdev = v->vpci.pdev,
>>>>            };
>>>>            int rc = rangeset_consume_ranges(v->vpci.mem, map_range, &data);
>>>>    
>>>> @@ -168,7 +247,8 @@ bool vpci_process_pending(struct vcpu *v)
>>>>    static int __init apply_map(struct domain *d, const struct pci_dev 
>>>> *pdev,
>>>>                                struct rangeset *mem, uint16_t cmd)
>>>>    {
>>>> -    struct map_data data = { .d = d, .map = true };
>>>> +    struct map_data data = { .d = d, .map = true,
>>>> +        .pdev = (struct pci_dev *)pdev };
>>> Dropping the const here is not fine. IT either needs to be dropped
>>> from apply_map and further up, or this needs to also be made const.
>> Ok, I'll try to keep it const
>>>>        int rc;
>>>>    
>>>>        while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == 
>>>> -ERESTART )
>>>> @@ -205,7 +285,7 @@ static void defer_map(struct domain *d, struct pci_dev 
>>>> *pdev,
>>>>    
>>>>    static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool 
>>>> rom_only)
>>>>    {
>>>> -    struct vpci_header *header = &pdev->vpci->header;
>>>> +    struct vpci_header *header;
>>>>        struct rangeset *mem = rangeset_new(NULL, NULL, 0);
>>>>        struct pci_dev *tmp, *dev = NULL;
>>>>    #ifdef CONFIG_X86
>>>> @@ -217,6 +297,11 @@ static int modify_bars(const struct pci_dev *pdev, 
>>>> uint16_t cmd, bool rom_only)
>>>>        if ( !mem )
>>>>            return -ENOMEM;
>>>>    
>>>> +    if ( is_hardware_domain(current->domain) )
>>>> +        header = get_hwdom_vpci_header(pdev);
>>>> +    else
>>>> +        header = get_vpci_header(current->domain, pdev);
>>>> +
>>>>        /*
>>>>         * Create a rangeset that represents the current device BARs memory 
>>>> region
>>>>         * and compare it against all the currently active BAR memory 
>>>> regions. If
>>>> @@ -225,12 +310,15 @@ static int modify_bars(const struct pci_dev *pdev, 
>>>> uint16_t cmd, bool rom_only)
>>>>         * First fill the rangeset with all the BARs of this device or with 
>>>> the ROM
>>>>         * BAR only, depending on whether the guest is toggling the memory 
>>>> decode
>>>>         * bit of the command register, or the enable bit of the ROM BAR 
>>>> register.
>>>> +     *
>>>> +     * Use the PCI reserved bits of the BAR to pass BAR's index.
>>>>         */
>>>>        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>        {
>>>>            const struct vpci_bar *bar = &header->bars[i];
>>>> -        unsigned long start = PFN_DOWN(bar->addr);
>>>> -        unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>>>> +        unsigned long start = (bar->addr & PCI_BASE_ADDRESS_MEM_MASK) | i;
>>>> +        unsigned long end = (bar->addr & PCI_BASE_ADDRESS_MEM_MASK) +
>>>> +            bar->size - 1;
>>> Will this work fine on Arm 32bits with LPAE? It's my understanding
>>> that in that case unsigned long is 32bits, but the physical address
>>> space is 44bits, in which case this won't work.
>> Hm, good question
>>> I think you need to keep the usage of frame numbers here.
>> If I re-work the gfn <-> mfn mapping then yes, I can use frame numbers here 
>> and elsewhere
>>>>    
>>>>            if ( !MAPPABLE_BAR(bar) ||
>>>>                 (rom_only ? bar->type != VPCI_BAR_ROM
>>>> @@ -251,9 +339,11 @@ static int modify_bars(const struct pci_dev *pdev, 
>>>> uint16_t cmd, bool rom_only)
>>>>        /* Remove any MSIX regions if present. */
>>>>        for ( i = 0; msix && i < ARRAY_SIZE(msix->tables); i++ )
>>>>        {
>>>> -        unsigned long start = PFN_DOWN(vmsix_table_addr(pdev->vpci, i));
>>>> -        unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
>>>> -                                     vmsix_table_size(pdev->vpci, i) - 1);
>>>> +        unsigned long start = (vmsix_table_addr(pdev->vpci, i) &
>>>> +                               PCI_BASE_ADDRESS_MEM_MASK) | i;
>>>> +        unsigned long end = (vmsix_table_addr(pdev->vpci, i) &
>>>> +                             PCI_BASE_ADDRESS_MEM_MASK ) +
>>>> +                             vmsix_table_size(pdev->vpci, i) - 1;
>>>>    
>>>>            rc = rangeset_remove_range(mem, start, end);
>>>>            if ( rc )
>>>> @@ -273,6 +363,8 @@ static int modify_bars(const struct pci_dev *pdev, 
>>>> uint16_t cmd, bool rom_only)
>>>>         */
>>>>        for_each_pdev ( pdev->domain, tmp )
>>>>        {
>>>> +        struct vpci_header *header;
>>>> +
>>>>            if ( tmp == pdev )
>>>>            {
>>>>                /*
>>>> @@ -289,11 +381,14 @@ static int modify_bars(const struct pci_dev *pdev, 
>>>> uint16_t cmd, bool rom_only)
>>>>                    continue;
>>>>            }
>>>>    
>>>> -        for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>> +        header = get_vpci_header(tmp->domain, pdev);
>>>> +
>>>> +        for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>>>>            {
>>>> -            const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>>>> -            unsigned long start = PFN_DOWN(bar->addr);
>>>> -            unsigned long end = PFN_DOWN(bar->addr + bar->size - 1);
>>>> +            const struct vpci_bar *bar = &header->bars[i];
>>>> +            unsigned long start = (bar->addr & PCI_BASE_ADDRESS_MEM_MASK) 
>>>> | i;
>>>> +            unsigned long end = (bar->addr & PCI_BASE_ADDRESS_MEM_MASK)
>>>> +                + bar->size - 1;
>>>>    
>>>>                if ( !bar->enabled || !rangeset_overlaps_range(mem, start, 
>>>> end) ||
>>>>                     /*
>>>> @@ -357,7 +452,7 @@ static void cmd_write(const struct pci_dev *pdev, 
>>>> unsigned int reg,
>>>>            pci_conf_write16(pdev->sbdf, reg, cmd);
>>>>    }
>>>>    
>>>> -static void bar_write(const struct pci_dev *pdev, unsigned int reg,
>>>> +static void bar_write_hwdom(const struct pci_dev *pdev, unsigned int reg,
>>>>                          uint32_t val, void *data)
>>>>    {
>>>>        struct vpci_bar *bar = data;
>>>> @@ -377,14 +472,17 @@ static void bar_write(const struct pci_dev *pdev, 
>>>> unsigned int reg,
>>>>        {
>>>>            /* If the value written is the current one avoid printing a 
>>>> warning. */
>>>>            if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>>>> +        {
>>>> +            struct vpci_header *header = get_hwdom_vpci_header(pdev);
>>>> +
>>>>                gprintk(XENLOG_WARNING,
>>>>                        "%04x:%02x:%02x.%u: ignored BAR %lu write with 
>>>> memory decoding enabled\n",
>>>>                        pdev->seg, pdev->bus, slot, func,
>>>> -                    bar - pdev->vpci->header.bars + hi);
>>>> +                    bar - header->bars + hi);
>>>> +        }
>>>>            return;
>>>>        }
>>>>    
>>>> -
>>>>        /*
>>>>         * Update the cached address, so that when memory decoding is 
>>>> enabled
>>>>         * Xen can map the BAR into the guest p2m.
>>>> @@ -403,10 +501,89 @@ static void bar_write(const struct pci_dev *pdev, 
>>>> unsigned int reg,
>>>>        pci_conf_write32(pdev->sbdf, reg, val);
>>>>    }
>>>>    
>>>> +static uint32_t bar_read_hwdom(const struct pci_dev *pdev, unsigned int 
>>>> reg,
>>>> +                               void *data)
>>>> +{
>>>> +    return vpci_hw_read32(pdev, reg, data);
>>>> +}
>>>> +
>>>> +static void bar_write_guest(const struct pci_dev *pdev, unsigned int reg,
>>>> +                            uint32_t val, void *data)
>>>> +{
>>>> +    struct vpci_bar *vbar = data;
>>>> +    bool hi = false;
>>>> +
>>>> +    if ( vbar->type == VPCI_BAR_MEM64_HI )
>>>> +    {
>>>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>>>> +        vbar--;
>>>> +        hi = true;
>>>> +    }
>>>> +    vbar->addr &= ~(0xffffffffull << (hi ? 32 : 0));
>>>> +    vbar->addr |= (uint64_t)val << (hi ? 32 : 0);
>>>> +}
>>>> +
>>>> +static uint32_t bar_read_guest(const struct pci_dev *pdev, unsigned int 
>>>> reg,
>>>> +                               void *data)
>>>> +{
>>>> +    struct vpci_bar *vbar = data;
>>>> +    uint32_t val;
>>>> +    bool hi = false;
>>>> +
>>>> +    if ( vbar->type == VPCI_BAR_MEM64_HI )
>>>> +    {
>>>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>>>> +        vbar--;
>>>> +        hi = true;
>>>> +    }
>>>> +
>>>> +    if ( vbar->type == VPCI_BAR_MEM64_LO || vbar->type == 
>>>> VPCI_BAR_MEM64_HI )
>>> I think this would be clearer using a switch statement.
>> I'll think about
>>>> +    {
>>>> +        if ( hi )
>>>> +            val = vbar->addr >> 32;
>>>> +        else
>>>> +            val = vbar->addr & 0xffffffff;
>>>> +        if ( val == ~0 )
>>> Strictly speaking I think you are not forced to write 1s to the
>>> reserved 4 bits in the low register (and in the 32bit case).
>> Ah, so Linux kernel, for instance, could have written 0xffffff0 while
>>
>> I expect 0xffffffff?
> I think real hardware would return the size when written 1s to all
> bits except the reserved ones.
>
>>>> +        {
>>>> +            /* Guests detects BAR's properties and sizes. */
>>>> +            if ( !hi )
>>>> +            {
>>>> +                val = 0xffffffff & ~(vbar->size - 1);
>>>> +                val |= vbar->type == VPCI_BAR_MEM32 ? 
>>>> PCI_BASE_ADDRESS_MEM_TYPE_32
>>>> +                                                    : 
>>>> PCI_BASE_ADDRESS_MEM_TYPE_64;
>>>> +                val |= vbar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH 
>>>> : 0;
>>>> +            }
>>>> +            else
>>>> +                val = vbar->size >> 32;
>>>> +            vbar->addr &= ~(0xffffffffull << (hi ? 32 : 0));
>>>> +            vbar->addr |= (uint64_t)val << (hi ? 32 : 0);
>>>> +        }
>>>> +    }
>>>> +    else if ( vbar->type == VPCI_BAR_MEM32 )
>>>> +    {
>>>> +        val = vbar->addr;
>>>> +        if ( val == ~0 )
>>>> +        {
>>>> +            if ( !hi )
>>> There's no way hi can be true at this point AFAICT.
>> Sure, thank you
>>>> +            {
>>>> +                val = 0xffffffff & ~(vbar->size - 1);
>>>> +                val |= vbar->type == VPCI_BAR_MEM32 ? 
>>>> PCI_BASE_ADDRESS_MEM_TYPE_32
>>>> +                                                    : 
>>>> PCI_BASE_ADDRESS_MEM_TYPE_64;
>>>> +                val |= vbar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH 
>>>> : 0;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +    else
>>>> +    {
>>>> +        val = vbar->addr;
>>>> +    }
>>>> +    return val;
>>>> +}
>>>> +
>>>>    static void rom_write(const struct pci_dev *pdev, unsigned int reg,
>>>>                          uint32_t val, void *data)
>>>>    {
>>>> -    struct vpci_header *header = &pdev->vpci->header;
>>>> +    struct vpci_header *header = get_hwdom_vpci_header(pdev);
>>>>        struct vpci_bar *rom = data;
>>>>        uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
>>>>        uint16_t cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>>>> @@ -452,15 +629,56 @@ static void rom_write(const struct pci_dev *pdev, 
>>>> unsigned int reg,
>>>>            rom->addr = val & PCI_ROM_ADDRESS_MASK;
>>>>    }
>>> Don't you need to also protect a domU from writing to the ROM BAR
>>> register?
>> ROM was not a target of this RFC as I have no HW to test that, but final 
>> code must
>>
>> also handle ROM as well, you are right
>>
>>>>    
>>>> +static uint32_t bar_read_dispatch(const struct pci_dev *pdev, unsigned 
>>>> int reg,
>>>> +                                  void *data)
>>>> +{
>>>> +    struct vpci_bar *vbar, *bar = data;
>>>> +
>>>> +    if ( is_hardware_domain(current->domain) )
>>>> +        return bar_read_hwdom(pdev, reg, data);
>>>> +
>>>> +    vbar = get_vpci_bar(current->domain, pdev, bar->index);
>>>> +    if ( !vbar )
>>>> +        return ~0;
>>>> +
>>>> +    return bar_read_guest(pdev, reg, vbar);
>>>> +}
>>>> +
>>>> +static void bar_write_dispatch(const struct pci_dev *pdev, unsigned int 
>>>> reg,
>>>> +                               uint32_t val, void *data)
>>>> +{
>>>> +    struct vpci_bar *bar = data;
>>>> +
>>>> +    if ( is_hardware_domain(current->domain) )
>>>> +        bar_write_hwdom(pdev, reg, val, data);
>>>> +    else
>>>> +    {
>>>> +        struct vpci_bar *vbar = get_vpci_bar(current->domain, pdev, 
>>>> bar->index);
>>>> +
>>>> +        if ( !vbar )
>>>> +            return;
>>>> +        bar_write_guest(pdev, reg, val, vbar);
>>>> +    }
>>>> +}
>>> You should assign different handlers based on whether the domain that
>>> has the device assigned is a domU or the hardware domain, rather than
>>> doing the selection here.
>> Hm, handlers are assigned once in init_bars and this function is only called
>>
>> for hwdom, so there is no way I can do that for the guests. Hence, the 
>> dispatcher
> I think we might want to reset the vPCI handlers when a devices gets
> assigned and deassigned.

In ARM case init_bars is called too early: PCI device assignment is currently

initiated by Domain-0' kernel and is done *before* PCI devices are given memory

ranges and BARs assigned:

[    0.429514] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.429532] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff]
[    0.429555] pci_bus 0000:00: root bus resource [mem 0xfe200000-0xfe3fffff]
[    0.429575] pci_bus 0000:00: root bus resource [mem 0x30000000-0x37ffffff]
[    0.429604] pci_bus 0000:00: root bus resource [mem 0x38000000-0x3fffffff 
pref]
[    0.429670] pci 0000:00:00.0: enabling Extended Tags
[    0.453764] pci 0000:00:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE

< init_bars >

[    0.453793] pci 0000:00:00.0: -- IRQ 0
[    0.458825] pci 0000:00:00.0: Failed to add - passthrough or MSI/MSI-X might 
fail!
[    0.471790] pci 0000:01:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE

< init_bars >

[    0.471821] pci 0000:01:00.0: -- IRQ 255
[    0.476809] pci 0000:01:00.0: Failed to add - passthrough or MSI/MSI-X might 
fail!

< BAR assignments below >

[    0.488233] pci 0000:00:00.0: BAR 14: assigned [mem 0xfe200000-0xfe2fffff]
[    0.488265] pci 0000:00:00.0: BAR 15: assigned [mem 0x38000000-0x380fffff 
pref]

In case of x86 this is pretty much ok as BARs are already in place, but for ARM 
we

need to take care and re-setup vPCI BARs for hwdom. Things are getting even more

complicated if the host PCI bridge is not ECAM like, so you cannot set 
mmio_handlers

and trap hwdom's access to the config space to update BARs etc. This is why I 
have that

ugly hack for rcar_gen3 to read actual BARs for hwdom.


If we go further and take a look at SR-IOV then when the kernel assigns the 
device

(BUS_NOTIFY_ADD_DEVICE) then it already has BARs assigned for virtual functions

(need to double-check that).

>   In order to do passthrough to domUs safely
> we will have to add more handlers than what's required for dom0,
Can you please tell what are thinking about? What are the missing handlers?
>   and
> having is_hardware_domain sprinkled in all of them is not a suitable
> solution.

I'll try to replace is_hardware_domain with something like:

+/*
+ * Detect whether physical PCI devices in this segment belong
+ * to the domain given, e.g. on x86 all PCI devices live in hwdom,
+ * but in case of ARM this might not be the case: those may also
+ * live in driver domains or even Xen itself.
+ */
+bool pci_is_hardware_domain(struct domain *d, u16 seg)
+{
+#ifdef CONFIG_X86
+    return is_hardware_domain(d);
+#elif CONFIG_ARM
+    return pci_is_owner_domain(d, seg);
+#else
+#error "Unsupported architecture"
+#endif
+}
+
+/*
+ * Get domain which owns this segment: for x86 this is always hardware
+ * domain and for ARM this can be different.
+ */
+struct domain *pci_get_hardware_domain(u16 seg)
+{
+#ifdef CONFIG_X86
+    return hardware_domain;
+#elif CONFIG_ARM
+    return pci_get_owner_domain(seg);
+#else
+#error "Unsupported architecture"
+#endif
+}

This is what I use to properly detect the domain that really owns physical host 
bridge

>
> Roger.

Thank you,

Oleksandr

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.