[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 06/10] vpci: Make every domain handle its own BARs



On 13.11.2020 11:46, Oleksandr Andrushchenko wrote:
> On 11/13/20 12:25 PM, Jan Beulich wrote:
>> On 13.11.2020 07:32, Oleksandr Andrushchenko wrote:
>>> On 11/12/20 4:46 PM, Roger Pau Monné wrote:
>>>> On Thu, Nov 12, 2020 at 01:16:10PM +0000, Oleksandr Andrushchenko wrote:
>>>>> On 11/12/20 11:40 AM, Roger Pau Monné wrote:
>>>>>> On Mon, Nov 09, 2020 at 02:50:27PM +0200, Oleksandr Andrushchenko wrote:
>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>>>> +static uint32_t bar_read_dispatch(const struct pci_dev *pdev, unsigned 
>>>>>>> int reg,
>>>>>>> +                                  void *data)
>>>>>>> +{
>>>>>>> +    struct vpci_bar *vbar, *bar = data;
>>>>>>> +
>>>>>>> +    if ( is_hardware_domain(current->domain) )
>>>>>>> +        return bar_read_hwdom(pdev, reg, data);
>>>>>>> +
>>>>>>> +    vbar = get_vpci_bar(current->domain, pdev, bar->index);
>>>>>>> +    if ( !vbar )
>>>>>>> +        return ~0;
>>>>>>> +
>>>>>>> +    return bar_read_guest(pdev, reg, vbar);
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void bar_write_dispatch(const struct pci_dev *pdev, unsigned 
>>>>>>> int reg,
>>>>>>> +                               uint32_t val, void *data)
>>>>>>> +{
>>>>>>> +    struct vpci_bar *bar = data;
>>>>>>> +
>>>>>>> +    if ( is_hardware_domain(current->domain) )
>>>>>>> +        bar_write_hwdom(pdev, reg, val, data);
>>>>>>> +    else
>>>>>>> +    {
>>>>>>> +        struct vpci_bar *vbar = get_vpci_bar(current->domain, pdev, 
>>>>>>> bar->index);
>>>>>>> +
>>>>>>> +        if ( !vbar )
>>>>>>> +            return;
>>>>>>> +        bar_write_guest(pdev, reg, val, vbar);
>>>>>>> +    }
>>>>>>> +}
>>>>>> You should assign different handlers based on whether the domain that
>>>>>> has the device assigned is a domU or the hardware domain, rather than
>>>>>> doing the selection here.
>>>>> Hm, handlers are assigned once in init_bars and this function is only 
>>>>> called
>>>>>
>>>>> for hwdom, so there is no way I can do that for the guests. Hence, the 
>>>>> dispatcher
>>>> I think we might want to reset the vPCI handlers when a devices gets
>>>> assigned and deassigned.
>>> In ARM case init_bars is called too early: PCI device assignment is 
>>> currently
>>>
>>> initiated by Domain-0' kernel and is done *before* PCI devices are given 
>>> memory
>>>
>>> ranges and BARs assigned:
>>>
>>> [    0.429514] pci_bus 0000:00: root bus resource [bus 00-ff]
>>> [    0.429532] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff]
>>> [    0.429555] pci_bus 0000:00: root bus resource [mem 
>>> 0xfe200000-0xfe3fffff]
>>> [    0.429575] pci_bus 0000:00: root bus resource [mem 
>>> 0x30000000-0x37ffffff]
>>> [    0.429604] pci_bus 0000:00: root bus resource [mem 
>>> 0x38000000-0x3fffffff pref]
>>> [    0.429670] pci 0000:00:00.0: enabling Extended Tags
>>> [    0.453764] pci 0000:00:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE
>>>
>>> < init_bars >
>>>
>>> [    0.453793] pci 0000:00:00.0: -- IRQ 0
>>> [    0.458825] pci 0000:00:00.0: Failed to add - passthrough or MSI/MSI-X 
>>> might fail!
>>> [    0.471790] pci 0000:01:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE
>>>
>>> < init_bars >
>>>
>>> [    0.471821] pci 0000:01:00.0: -- IRQ 255
>>> [    0.476809] pci 0000:01:00.0: Failed to add - passthrough or MSI/MSI-X 
>>> might fail!
>>>
>>> < BAR assignments below >
>>>
>>> [    0.488233] pci 0000:00:00.0: BAR 14: assigned [mem 
>>> 0xfe200000-0xfe2fffff]
>>> [    0.488265] pci 0000:00:00.0: BAR 15: assigned [mem 
>>> 0x38000000-0x380fffff pref]
>>>
>>> In case of x86 this is pretty much ok as BARs are already in place, but for 
>>> ARM we
>>>
>>> need to take care and re-setup vPCI BARs for hwdom.
>> Even on x86 there's no guarantee that all devices have their BARs set
>> up by firmware.
> 
> This is true. But there you could have config space trapped in "x86 generic 
> way",
> 
> please correct me if I'm wrong here
> 
>>
>> In a subsequent reply you've suggested to move init_bars from "add" to
>> "assign", but I'm having trouble seeing what this would change: It's
>> not Dom0 controlling assignment (to itself), but Xen assigns the device
>> towards the end of pci_add_device().
> 
> PHYSDEVOP_pci_device_add vs XEN_DOMCTL_assign_device
> 
> Currently we initialize BARs during PHYSDEVOP_pci_device_add and
> 
> if we do that during XEN_DOMCTL_assign_device things seem to change

But there can't possibly be any XEN_DOMCTL_assign_device involved in
booting of Dom0. 

>>>>    In order to do passthrough to domUs safely
>>>> we will have to add more handlers than what's required for dom0,
>>> Can you please tell what are thinking about? What are the missing handlers?
>>>>    and
>>>> having is_hardware_domain sprinkled in all of them is not a suitable
>>>> solution.
>>> I'll try to replace is_hardware_domain with something like:
>>>
>>> +/*
>>> + * Detect whether physical PCI devices in this segment belong
>>> + * to the domain given, e.g. on x86 all PCI devices live in hwdom,
>>> + * but in case of ARM this might not be the case: those may also
>>> + * live in driver domains or even Xen itself.
>>> + */
>>> +bool pci_is_hardware_domain(struct domain *d, u16 seg)
>>> +{
>>> +#ifdef CONFIG_X86
>>> +    return is_hardware_domain(d);
>>> +#elif CONFIG_ARM
>>> +    return pci_is_owner_domain(d, seg);
>>> +#else
>>> +#error "Unsupported architecture"
>>> +#endif
>>> +}
>>> +
>>> +/*
>>> + * Get domain which owns this segment: for x86 this is always hardware
>>> + * domain and for ARM this can be different.
>>> + */
>>> +struct domain *pci_get_hardware_domain(u16 seg)
>>> +{
>>> +#ifdef CONFIG_X86
>>> +    return hardware_domain;
>>> +#elif CONFIG_ARM
>>> +    return pci_get_owner_domain(seg);
>>> +#else
>>> +#error "Unsupported architecture"
>>> +#endif
>>> +}
>>>
>>> This is what I use to properly detect the domain that really owns physical 
>>> host bridge
>> I consider this problematic. We should try to not let Arm's and x86'es
>> PCI implementations diverge too much, i.e. at least the underlying basic
>> model would better be similar. For example, if entire segments can be
>> assigned to a driver domain on Arm, why should the same not be possible
>> on x86?
> 
> Good question, probably in this case x86 == ARM and I can use
> 
> pci_is_owner_domain for both architectures instead of using 
> is_hardware_domain for x86
> 
>>
>> Furthermore I'm suspicious about segments being the right granularity
>> here. On ia64 multiple host bridges could (and typically would) live
>> on segment 0. Iirc I had once seen output from an x86 system which was
>> apparently laid out similarly. Therefore, just like for MCFG, I think
>> the granularity wants to be bus ranges within a segment.
> Can you please suggest something we can use as a hint for such a detection 
> logic?

The underlying information comes from ACPI tables, iirc. I don't
recall the details, though - sorry.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.