
Re: [PATCH 06/10] vpci: Make every domain handle its own BARs



On 13.11.2020 07:32, Oleksandr Andrushchenko wrote:
> On 11/12/20 4:46 PM, Roger Pau Monné wrote:
>> On Thu, Nov 12, 2020 at 01:16:10PM +0000, Oleksandr Andrushchenko wrote:
>>> On 11/12/20 11:40 AM, Roger Pau Monné wrote:
>>>> On Mon, Nov 09, 2020 at 02:50:27PM +0200, Oleksandr Andrushchenko wrote:
>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>> +static uint32_t bar_read_dispatch(const struct pci_dev *pdev, unsigned int reg,
>>>>> +                                  void *data)
>>>>> +{
>>>>> +    struct vpci_bar *vbar, *bar = data;
>>>>> +
>>>>> +    if ( is_hardware_domain(current->domain) )
>>>>> +        return bar_read_hwdom(pdev, reg, data);
>>>>> +
>>>>> +    vbar = get_vpci_bar(current->domain, pdev, bar->index);
>>>>> +    if ( !vbar )
>>>>> +        return ~0;
>>>>> +
>>>>> +    return bar_read_guest(pdev, reg, vbar);
>>>>> +}
>>>>> +
>>>>> +static void bar_write_dispatch(const struct pci_dev *pdev, unsigned int reg,
>>>>> +                               uint32_t val, void *data)
>>>>> +{
>>>>> +    struct vpci_bar *bar = data;
>>>>> +
>>>>> +    if ( is_hardware_domain(current->domain) )
>>>>> +        bar_write_hwdom(pdev, reg, val, data);
>>>>> +    else
>>>>> +    {
>>>>> +        struct vpci_bar *vbar = get_vpci_bar(current->domain, pdev, bar->index);
>>>>> +
>>>>> +        if ( !vbar )
>>>>> +            return;
>>>>> +        bar_write_guest(pdev, reg, val, vbar);
>>>>> +    }
>>>>> +}
>>>> You should assign different handlers based on whether the domain that
>>>> has the device assigned is a domU or the hardware domain, rather than
>>>> doing the selection here.
>>> Hm, handlers are assigned once in init_bars and this function is only
>>> called for hwdom, so there is no way I can do that for the guests. Hence,
>>> the dispatcher.
>> I think we might want to reset the vPCI handlers when a device gets
>> assigned and deassigned.
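
For illustration, a rough and untested sketch of doing the selection at
registration time and re-registering on (de)assignment (bar_read_hwdom() & co
are the handlers from this patch; vpci_add_register(), vpci_add_handlers() and
vpci_remove_device() are assumed to keep their current shape; everything else
is made up for the example):

/*
 * Sketch only: pick the BAR handlers once, when they get registered,
 * based on who owns the device, instead of dispatching on every access.
 */
static int register_bar_handlers(const struct pci_dev *pdev,
                                 struct vpci_bar *bar, unsigned int reg)
{
    bool hwdom_owned = is_hardware_domain(pdev->domain);

    return vpci_add_register(pdev->vpci,
                             hwdom_owned ? bar_read_hwdom : bar_read_guest,
                             hwdom_owned ? bar_write_hwdom : bar_write_guest,
                             reg, 4, bar);
}

/*
 * Sketch only: on (de)assignment tear the vPCI state down and register it
 * anew for the new owner, so init_bars() runs in the new context.  The real
 * code would of course also have to move pdev between the domains' device
 * lists and handle errors.
 */
static int reassign_vpci(struct pci_dev *pdev, struct domain *new_owner)
{
    vpci_remove_device(pdev);
    pdev->domain = new_owner;

    return vpci_add_handlers(pdev);
}
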
> 
> In the ARM case init_bars is called too early: PCI device assignment is
> currently initiated by Domain-0's kernel and is done *before* PCI devices
> are given memory ranges and have their BARs assigned:
> 
> [    0.429514] pci_bus 0000:00: root bus resource [bus 00-ff]
> [    0.429532] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff]
> [    0.429555] pci_bus 0000:00: root bus resource [mem 0xfe200000-0xfe3fffff]
> [    0.429575] pci_bus 0000:00: root bus resource [mem 0x30000000-0x37ffffff]
> [    0.429604] pci_bus 0000:00: root bus resource [mem 0x38000000-0x3fffffff pref]
> [    0.429670] pci 0000:00:00.0: enabling Extended Tags
> [    0.453764] pci 0000:00:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE
> 
> < init_bars >
> 
> [    0.453793] pci 0000:00:00.0: -- IRQ 0
> [    0.458825] pci 0000:00:00.0: Failed to add - passthrough or MSI/MSI-X might fail!
> [    0.471790] pci 0000:01:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE
> 
> < init_bars >
> 
> [    0.471821] pci 0000:01:00.0: -- IRQ 255
> [    0.476809] pci 0000:01:00.0: Failed to add - passthrough or MSI/MSI-X might fail!
> 
> < BAR assignments below >
> 
> [    0.488233] pci 0000:00:00.0: BAR 14: assigned [mem 0xfe200000-0xfe2fffff]
> [    0.488265] pci 0000:00:00.0: BAR 15: assigned [mem 0x38000000-0x380fffff pref]
> 
> In the case of x86 this is pretty much OK, as BARs are already in place, but
> for ARM we need to take care and re-setup vPCI BARs for hwdom.

Even on x86 there's no guarantee that all devices have their BARs set
up by firmware.

In a subsequent reply you've suggested moving init_bars from "add" to
"assign", but I'm having trouble seeing what this would change: it's not
Dom0 controlling assignment (to itself); Xen assigns the device towards
the end of pci_add_device().

> Things get even more complicated if the host PCI bridge is not ECAM-like,
> so you cannot set mmio_handlers and trap hwdom's accesses to the config
> space to update BARs etc. This is why I have that ugly hack for rcar_gen3
> to read the actual BARs for hwdom.

How do config space accesses work there? At the latest for MSI/MSI-X it'll
be imperative that Xen be able to intercept config space writes.

>>   In order to do passthrough to domUs safely
>> we will have to add more handlers than what's required for dom0,
> Can you please tell what you are thinking about? What are the missing handlers?
>>   and
>> having is_hardware_domain sprinkled in all of them is not a suitable
>> solution.
> 
> I'll try to replace is_hardware_domain with something like:
> 
> +/*
> + * Detect whether physical PCI devices in this segment belong
> + * to the domain given, e.g. on x86 all PCI devices live in hwdom,
> + * but in case of ARM this might not be the case: those may also
> + * live in driver domains or even Xen itself.
> + */
> +bool pci_is_hardware_domain(struct domain *d, u16 seg)
> +{
> +#ifdef CONFIG_X86
> +    return is_hardware_domain(d);
> +#elif CONFIG_ARM
> +    return pci_is_owner_domain(d, seg);
> +#else
> +#error "Unsupported architecture"
> +#endif
> +}
> +
> +/*
> + * Get domain which owns this segment: for x86 this is always hardware
> + * domain and for ARM this can be different.
> + */
> +struct domain *pci_get_hardware_domain(u16 seg)
> +{
> +#ifdef CONFIG_X86
> +    return hardware_domain;
> +#elif CONFIG_ARM
> +    return pci_get_owner_domain(seg);
> +#else
> +#error "Unsupported architecture"
> +#endif
> +}
> 
> This is what I use to properly detect the domain that really owns the
> physical host bridge.

I consider this problematic. We should try not to let Arm's and x86's
PCI implementations diverge too much, i.e. at least the underlying basic
model had better be similar. For example, if entire segments can be
assigned to a driver domain on Arm, why should the same not be possible
on x86?

Furthermore I'm suspicious about segments being the right granularity
here. On ia64 multiple host bridges could (and typically would) live
on segment 0. IIRC I once saw output from an x86 system which was
apparently laid out similarly. Therefore, just like for MCFG, I think
the granularity wants to be bus ranges within a segment.
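
To illustrate the granularity I mean, a purely hypothetical sketch (every name
below is invented for the example) tracking ownership per bus range within a
segment, much like MCFG regions are described:

struct pci_bridge_range {
    struct list_head node;
    uint16_t segment;
    uint8_t start_bus, end_bus;
    struct domain *owner;
};

static LIST_HEAD(bridge_ranges);

/* Sketch only: look up the owning domain by (segment, bus), not by segment. */
static struct domain *pci_get_owner_domain(uint16_t seg, uint8_t bus)
{
    struct pci_bridge_range *r;

    list_for_each_entry ( r, &bridge_ranges, node )
        if ( r->segment == seg &&
             bus >= r->start_bus && bus <= r->end_bus )
            return r->owner;

    /* Fall back to the hardware domain when no range claims the device. */
    return hardware_domain;
}
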

Jan
