
Re: [PATCH 06/10] vpci: Make every domain handle its own BARs

On 13.11.2020 12:02, Oleksandr Andrushchenko wrote:
> 
> On 11/13/20 12:50 PM, Jan Beulich wrote:
>> On 13.11.2020 11:46, Oleksandr Andrushchenko wrote:
>>> On 11/13/20 12:25 PM, Jan Beulich wrote:
>>>> On 13.11.2020 07:32, Oleksandr Andrushchenko wrote:
>>>>> On 11/12/20 4:46 PM, Roger Pau Monné wrote:
>>>>>> On Thu, Nov 12, 2020 at 01:16:10PM +0000, Oleksandr Andrushchenko wrote:
>>>>>>> On 11/12/20 11:40 AM, Roger Pau Monné wrote:
>>>>>>>> On Mon, Nov 09, 2020 at 02:50:27PM +0200, Oleksandr Andrushchenko wrote:
>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>>>>>> +static uint32_t bar_read_dispatch(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>> +                                  void *data)
>>>>>>>>> +{
>>>>>>>>> +    struct vpci_bar *vbar, *bar = data;
>>>>>>>>> +
>>>>>>>>> +    if ( is_hardware_domain(current->domain) )
>>>>>>>>> +        return bar_read_hwdom(pdev, reg, data);
>>>>>>>>> +
>>>>>>>>> +    vbar = get_vpci_bar(current->domain, pdev, bar->index);
>>>>>>>>> +    if ( !vbar )
>>>>>>>>> +        return ~0;
>>>>>>>>> +
>>>>>>>>> +    return bar_read_guest(pdev, reg, vbar);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void bar_write_dispatch(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>> +                               uint32_t val, void *data)
>>>>>>>>> +{
>>>>>>>>> +    struct vpci_bar *bar = data;
>>>>>>>>> +
>>>>>>>>> +    if ( is_hardware_domain(current->domain) )
>>>>>>>>> +        bar_write_hwdom(pdev, reg, val, data);
>>>>>>>>> +    else
>>>>>>>>> +    {
>>>>>>>>> +        struct vpci_bar *vbar = get_vpci_bar(current->domain, pdev, bar->index);
>>>>>>>>> +
>>>>>>>>> +        if ( !vbar )
>>>>>>>>> +            return;
>>>>>>>>> +        bar_write_guest(pdev, reg, val, vbar);
>>>>>>>>> +    }
>>>>>>>>> +}
>>>>>>>> You should assign different handlers based on whether the domain that
>>>>>>>> has the device assigned is a domU or the hardware domain, rather than
>>>>>>>> doing the selection here.
>>>>>>> Hm, handlers are assigned once in init_bars and this function is only called
>>>>>>> for hwdom, so there is no way I can do that for the guests. Hence, the dispatcher.
>>>>>> I think we might want to reset the vPCI handlers when a device gets
>>>>>> assigned and deassigned.
>>>>> In the ARM case init_bars is called too early: PCI device assignment is
>>>>> currently initiated by Domain-0's kernel and is done *before* PCI devices
>>>>> are given memory ranges and BARs assigned:
>>>>>
>>>>> [    0.429514] pci_bus 0000:00: root bus resource [bus 00-ff]
>>>>> [    0.429532] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff]
>>>>> [    0.429555] pci_bus 0000:00: root bus resource [mem 0xfe200000-0xfe3fffff]
>>>>> [    0.429575] pci_bus 0000:00: root bus resource [mem 0x30000000-0x37ffffff]
>>>>> [    0.429604] pci_bus 0000:00: root bus resource [mem 0x38000000-0x3fffffff pref]
>>>>> [    0.429670] pci 0000:00:00.0: enabling Extended Tags
>>>>> [    0.453764] pci 0000:00:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE
>>>>>
>>>>> < init_bars >
>>>>>
>>>>> [    0.453793] pci 0000:00:00.0: -- IRQ 0
>>>>> [    0.458825] pci 0000:00:00.0: Failed to add - passthrough or MSI/MSI-X might fail!
>>>>> [    0.471790] pci 0000:01:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE
>>>>>
>>>>> < init_bars >
>>>>>
>>>>> [    0.471821] pci 0000:01:00.0: -- IRQ 255
>>>>> [    0.476809] pci 0000:01:00.0: Failed to add - passthrough or MSI/MSI-X might fail!
>>>>>
>>>>> < BAR assignments below >
>>>>>
>>>>> [    0.488233] pci 0000:00:00.0: BAR 14: assigned [mem 0xfe200000-0xfe2fffff]
>>>>> [    0.488265] pci 0000:00:00.0: BAR 15: assigned [mem 0x38000000-0x380fffff pref]
>>>>>
>>>>> In the case of x86 this is pretty much fine, as BARs are already in place,
>>>>> but for ARM we need to take care to re-set up vPCI BARs for hwdom.
>>>> Even on x86 there's no guarantee that all devices have their BARs set
>>>> up by firmware.
>>> This is true. But there you could have the config space trapped in the
>>> "x86 generic way"; please correct me if I'm wrong here.
>>>
>>>> In a subsequent reply you've suggested to move init_bars from "add" to
>>>> "assign", but I'm having trouble seeing what this would change: It's
>>>> not Dom0 controlling assignment (to itself), but Xen assigns the device
>>>> towards the end of pci_add_device().
>>> PHYSDEVOP_pci_device_add vs XEN_DOMCTL_assign_device
>>>
>>> Currently we initialize BARs during PHYSDEVOP_pci_device_add, and
>>> if we do that during XEN_DOMCTL_assign_device things seem to change.
>> But there can't possibly be any XEN_DOMCTL_assign_device involved in
>> booting of Dom0.
> 
> Indeed. So, do you have an idea of when we should call init_bars that is
> suitable for both ARM and x86?
>
> Another question is: what would go wrong if x86 and ARM didn't call
> init_bars until the moment we really assign a PCI device to the first guest?

I'd like to answer the latter question first: How would Dom0 use
the device prior to such an assignment? As an implication to the
presumed answer here, I guess init_bars could be deferred up until
the first time Dom0 (or more generally the possessing domain)
accesses any of them. Similarly, devices used by Xen itself could
have this done immediately before first use. This may require
tracking on a per-device basis whether initialization was done.

>>>>>>     In order to do passthrough to domUs safely
>>>>>> we will have to add more handlers than what's required for dom0,
>>>>> Can you please tell what you are thinking about? What are the missing
>>>>> handlers?
>>>>>>     and
>>>>>> having is_hardware_domain sprinkled in all of them is not a suitable
>>>>>> solution.
>>>>> I'll try to replace is_hardware_domain with something like:
>>>>>
>>>>> +/*
>>>>> + * Detect whether physical PCI devices in this segment belong
>>>>> + * to the domain given, e.g. on x86 all PCI devices live in hwdom,
>>>>> + * but in case of ARM this might not be the case: those may also
>>>>> + * live in driver domains or even Xen itself.
>>>>> + */
>>>>> +bool pci_is_hardware_domain(struct domain *d, u16 seg)
>>>>> +{
>>>>> +#ifdef CONFIG_X86
>>>>> +    return is_hardware_domain(d);
>>>>> +#elif defined(CONFIG_ARM)
>>>>> +    return pci_is_owner_domain(d, seg);
>>>>> +#else
>>>>> +#error "Unsupported architecture"
>>>>> +#endif
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Get domain which owns this segment: for x86 this is always hardware
>>>>> + * domain and for ARM this can be different.
>>>>> + */
>>>>> +struct domain *pci_get_hardware_domain(u16 seg)
>>>>> +{
>>>>> +#ifdef CONFIG_X86
>>>>> +    return hardware_domain;
>>>>> +#elif defined(CONFIG_ARM)
>>>>> +    return pci_get_owner_domain(seg);
>>>>> +#else
>>>>> +#error "Unsupported architecture"
>>>>> +#endif
>>>>> +}
>>>>>
>>>>> This is what I use to properly detect the domain that really owns the
>>>>> physical host bridge.
>>>> I consider this problematic. We should try to not let Arm's and x86'es
>>>> PCI implementations diverge too much, i.e. at least the underlying basic
>>>> model would better be similar. For example, if entire segments can be
>>>> assigned to a driver domain on Arm, why should the same not be possible
>>>> on x86?
>>> Good question; probably in this case x86 == ARM and I can use
>>> pci_is_owner_domain for both architectures instead of using
>>> is_hardware_domain for x86.
>>>
>>>> Furthermore I'm suspicious about segments being the right granularity
>>>> here. On ia64 multiple host bridges could (and typically would) live
>>>> on segment 0. Iirc I had once seen output from an x86 system which was
>>>> apparently laid out similarly. Therefore, just like for MCFG, I think
>>>> the granularity wants to be bus ranges within a segment.
>>> Can you please suggest something we can use as a hint for such detection
>>> logic?
>> The underlying information comes from ACPI tables, iirc. I don't
>> recall the details, though - sorry.
> 
> Ok, so seg + bus should be enough for both ARM and x86 then, right?
>
> pci_get_hardware_domain(u16 seg, u8 bus)

Whether an individual bus number can suitably express things I can't
tell; I did say bus range, but if you care about just individual
devices, then a single bus number will of course do.

Jan
