Re: [PATCH 06/10] vpci: Make every domain handle its own BARs


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Fri, 13 Nov 2020 12:41:43 +0000
  • Cc: Oleksandr Andrushchenko <andr2000@xxxxxxxxx>, "Rahul.Singh@xxxxxxx" <Rahul.Singh@xxxxxxx>, "Bertrand.Marquis@xxxxxxx" <Bertrand.Marquis@xxxxxxx>, "julien.grall@xxxxxxx" <julien.grall@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "iwj@xxxxxxxxxxxxxx" <iwj@xxxxxxxxxxxxxx>, "wl@xxxxxxx" <wl@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Fri, 13 Nov 2020 12:42:08 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH 06/10] vpci: Make every domain handle its own BARs

On 11/13/20 1:35 PM, Jan Beulich wrote:
> On 13.11.2020 12:02, Oleksandr Andrushchenko wrote:
>> On 11/13/20 12:50 PM, Jan Beulich wrote:
>>> On 13.11.2020 11:46, Oleksandr Andrushchenko wrote:
>>>> On 11/13/20 12:25 PM, Jan Beulich wrote:
>>>>> On 13.11.2020 07:32, Oleksandr Andrushchenko wrote:
>>>>>> On 11/12/20 4:46 PM, Roger Pau Monné wrote:
>>>>>>> On Thu, Nov 12, 2020 at 01:16:10PM +0000, Oleksandr Andrushchenko wrote:
>>>>>>>> On 11/12/20 11:40 AM, Roger Pau Monné wrote:
>>>>>>>>> On Mon, Nov 09, 2020 at 02:50:27PM +0200, Oleksandr Andrushchenko wrote:
>>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>>>>>>> +static uint32_t bar_read_dispatch(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>> +                                  void *data)
>>>>>>>>>> +{
>>>>>>>>>> +    struct vpci_bar *vbar, *bar = data;
>>>>>>>>>> +
>>>>>>>>>> +    if ( is_hardware_domain(current->domain) )
>>>>>>>>>> +        return bar_read_hwdom(pdev, reg, data);
>>>>>>>>>> +
>>>>>>>>>> +    vbar = get_vpci_bar(current->domain, pdev, bar->index);
>>>>>>>>>> +    if ( !vbar )
>>>>>>>>>> +        return ~0;
>>>>>>>>>> +
>>>>>>>>>> +    return bar_read_guest(pdev, reg, vbar);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static void bar_write_dispatch(const struct pci_dev *pdev, unsigned int reg,
>>>>>>>>>> +                               uint32_t val, void *data)
>>>>>>>>>> +{
>>>>>>>>>> +    struct vpci_bar *bar = data;
>>>>>>>>>> +
>>>>>>>>>> +    if ( is_hardware_domain(current->domain) )
>>>>>>>>>> +        bar_write_hwdom(pdev, reg, val, data);
>>>>>>>>>> +    else
>>>>>>>>>> +    {
>>>>>>>>>> +        struct vpci_bar *vbar = get_vpci_bar(current->domain, pdev, bar->index);
>>>>>>>>>> +
>>>>>>>>>> +        if ( !vbar )
>>>>>>>>>> +            return;
>>>>>>>>>> +        bar_write_guest(pdev, reg, val, vbar);
>>>>>>>>>> +    }
>>>>>>>>>> +}
>>>>>>>>> You should assign different handlers based on whether the domain that
>>>>>>>>> has the device assigned is a domU or the hardware domain, rather than
>>>>>>>>> doing the selection here.
>>>>>>>> Hm, handlers are assigned once in init_bars and this function is only
>>>>>>>> called for hwdom, so there is no way I can do that for the guests.
>>>>>>>> Hence the dispatcher.
>>>>>>> I think we might want to reset the vPCI handlers when a device gets
>>>>>>> assigned and deassigned.
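
(As an aside, a rough sketch of how I read this suggestion - pick the
handlers once at registration time, so no per-access dispatch is needed.
vpci_add_register() is the existing vPCI helper; the wrapper name below
is made up:

static int vpci_bar_add_handlers(const struct domain *d, struct pci_dev *pdev,
                                 unsigned int reg, struct vpci_bar *bar)
{
    /* Pick hwdom vs. guest BAR handlers based on who owns the device. */
    if ( is_hardware_domain(d) )
        return vpci_add_register(pdev->vpci, bar_read_hwdom, bar_write_hwdom,
                                 reg, 4, bar);

    return vpci_add_register(pdev->vpci, bar_read_guest, bar_write_guest,
                             reg, 4, bar);
}

This would then be called per BAR register from init_bars or from the
assign/deassign paths.)
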
>>>>>> In the ARM case init_bars is called too early: PCI device assignment is
>>>>>> currently initiated by Domain-0's kernel and is done *before* PCI devices
>>>>>> are given memory ranges and BARs assigned:
>>>>>>
>>>>>> [    0.429514] pci_bus 0000:00: root bus resource [bus 00-ff]
>>>>>> [    0.429532] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff]
>>>>>> [    0.429555] pci_bus 0000:00: root bus resource [mem 0xfe200000-0xfe3fffff]
>>>>>> [    0.429575] pci_bus 0000:00: root bus resource [mem 0x30000000-0x37ffffff]
>>>>>> [    0.429604] pci_bus 0000:00: root bus resource [mem 0x38000000-0x3fffffff pref]
>>>>>> [    0.429670] pci 0000:00:00.0: enabling Extended Tags
>>>>>> [    0.453764] pci 0000:00:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE
>>>>>>
>>>>>> < init_bars >
>>>>>>
>>>>>> [    0.453793] pci 0000:00:00.0: -- IRQ 0
>>>>>> [    0.458825] pci 0000:00:00.0: Failed to add - passthrough or MSI/MSI-X might fail!
>>>>>> [    0.471790] pci 0000:01:00.0: -------------------- BUS_NOTIFY_ADD_DEVICE
>>>>>>
>>>>>> < init_bars >
>>>>>>
>>>>>> [    0.471821] pci 0000:01:00.0: -- IRQ 255
>>>>>> [    0.476809] pci 0000:01:00.0: Failed to add - passthrough or MSI/MSI-X might fail!
>>>>>>
>>>>>> < BAR assignments below >
>>>>>>
>>>>>> [    0.488233] pci 0000:00:00.0: BAR 14: assigned [mem 0xfe200000-0xfe2fffff]
>>>>>> [    0.488265] pci 0000:00:00.0: BAR 15: assigned [mem 0x38000000-0x380fffff pref]
>>>>>>
>>>>>> In the case of x86 this is pretty much OK as BARs are already in place,
>>>>>> but for ARM we need to take care and re-set up vPCI BARs for hwdom.
>>>>> Even on x86 there's no guarantee that all devices have their BARs set
>>>>> up by firmware.
>>>> This is true. But there you could have the config space trapped in the
>>>> "x86 generic way" - please correct me if I'm wrong here.
>>>>
>>>>> In a subsequent reply you've suggested to move init_bars from "add" to
>>>>> "assign", but I'm having trouble seeing what this would change: It's
>>>>> not Dom0 controlling assignment (to itself), but Xen assigns the device
>>>>> towards the end of pci_add_device().
>>>> PHYSDEVOP_pci_device_add vs XEN_DOMCTL_assign_device
>>>>
>>>> Currently we initialize BARs during PHYSDEVOP_pci_device_add, and if we
>>>> do that during XEN_DOMCTL_assign_device instead, things seem to change.
>>> But there can't possibly be any XEN_DOMCTL_assign_device involved in
>>> booting of Dom0.
>> Indeed. So, do you have an idea of when we should call init_bars that is
>> suitable for both ARM and x86?
>>
>> Another question: what bad things happen if x86 and ARM don't call
>> init_bars until the moment we really assign a PCI device to the first
>> guest?
> I'd like to answer the latter question first: How would Dom0 use
> the device prior to such an assignment? As an implication of the
> presumed answer here, I guess init_bars could be deferred up until
> the first time Dom0 (or more generally the possessing domain)
> accesses any of them. Similarly, devices used by Xen itself could
> have this done immediately before first use. This may require
> tracking on a per-device basis whether initialization was done.
Ok, I'll try to look into it.
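
Something along these lines, perhaps (a rough sketch only: the
bars_initialized flag would be a new field in struct vpci, and the
helper name is made up):

static int vpci_ensure_bars_initialized(struct pci_dev *pdev)
{
    int rc;

    /* Defer BAR setup until the first access by the possessing domain. */
    if ( pdev->vpci->bars_initialized )
        return 0;

    rc = init_bars(pdev);
    if ( !rc )
        pdev->vpci->bars_initialized = true;

    return rc;
}

This could be called on the first config space access by the possessing
domain, and right before Xen's own first use of a device.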
>
>>>>>>> In order to do passthrough to domUs safely
>>>>>>> we will have to add more handlers than what's required for dom0,
>>>>>> Can you please tell what you are thinking about? What are the missing
>>>>>> handlers?
>>>>>>> and
>>>>>>> having is_hardware_domain sprinkled in all of them is not a suitable
>>>>>>> solution.
>>>>>> I'll try to replace is_hardware_domain with something like:
>>>>>>
>>>>>> +/*
>>>>>> + * Detect whether physical PCI devices in this segment belong
>>>>>> + * to the domain given, e.g. on x86 all PCI devices live in hwdom,
>>>>>> + * but in case of ARM this might not be the case: those may also
>>>>>> + * live in driver domains or even Xen itself.
>>>>>> + */
>>>>>> +bool pci_is_hardware_domain(struct domain *d, u16 seg)
>>>>>> +{
>>>>>> +#ifdef CONFIG_X86
>>>>>> +    return is_hardware_domain(d);
>>>>>> +#elif defined(CONFIG_ARM)
>>>>>> +    return pci_is_owner_domain(d, seg);
>>>>>> +#else
>>>>>> +#error "Unsupported architecture"
>>>>>> +#endif
>>>>>> +}
>>>>>> +
>>>>>> +/*
>>>>>> + * Get domain which owns this segment: for x86 this is always hardware
>>>>>> + * domain and for ARM this can be different.
>>>>>> + */
>>>>>> +struct domain *pci_get_hardware_domain(u16 seg)
>>>>>> +{
>>>>>> +#ifdef CONFIG_X86
>>>>>> +    return hardware_domain;
>>>>>> +#elif defined(CONFIG_ARM)
>>>>>> +    return pci_get_owner_domain(seg);
>>>>>> +#else
>>>>>> +#error "Unsupported architecture"
>>>>>> +#endif
>>>>>> +}
>>>>>>
>>>>>> This is what I use to properly detect the domain that really owns the
>>>>>> physical host bridge.
>>>>> I consider this problematic. We should try to not let Arm's and x86'es
>>>>> PCI implementations diverge too much, i.e. at least the underlying basic
>>>>> model would better be similar. For example, if entire segments can be
>>>>> assigned to a driver domain on Arm, why should the same not be possible
>>>>> on x86?
>>>> Good question; probably in this case x86 == ARM, and I can use
>>>> pci_is_owner_domain for both architectures instead of using
>>>> is_hardware_domain for x86.
>>>>
>>>>> Furthermore I'm suspicious about segments being the right granularity
>>>>> here. On ia64 multiple host bridges could (and typically would) live
>>>>> on segment 0. Iirc I had once seen output from an x86 system which was
>>>>> apparently laid out similarly. Therefore, just like for MCFG, I think
>>>>> the granularity wants to be bus ranges within a segment.
>>>> Can you please suggest something we can use as a hint for such detection
>>>> logic?
>>> The underlying information comes from ACPI tables, iirc. I don't
>>> recall the details, though - sorry.
>> Ok, so seg + bus should be enough for both ARM and x86 then, right?
>>
>> pci_get_hardware_domain(u16 seg, u8 bus)
> Whether an individual bus number can suitably express things I can't
> tell; I did say bus range, but if you care about just individual
> devices, then a single bus number will of course do.

I can implement the lookup of whether a PCI host bridge is owned by a
particular domain with something like:

struct pci_host_bridge *bridge = pci_find_host_bridge(seg, bus);

return bridge && bridge->dt_node->used_by == d->domain_id;
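
Or, following the bus range idea (similar to how MCFG regions are
described), ownership could be tracked per bus range within a segment.
A rough sketch - all structure and function names below are made up:

struct pci_bus_owner {
    struct list_head node;
    u16 seg;
    u8 bus_start, bus_end;      /* inclusive bus range within the segment */
    domid_t owner;
};

static LIST_HEAD(bus_owner_list);

/* Return the owner of the bus range containing (seg, bus), if any. */
static domid_t pci_bus_range_owner(u16 seg, u8 bus)
{
    const struct pci_bus_owner *o;

    list_for_each_entry ( o, &bus_owner_list, node )
        if ( o->seg == seg && bus >= o->bus_start && bus <= o->bus_end )
            return o->owner;

    return DOMID_INVALID;
}

On ARM the list could be populated from the host device tree.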

Could you please give me a hint how this can be done on x86?

>
> Jan

Thank you,

Oleksandr

 

