[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 5/9] vpci/header: Implement guest BAR register handlers


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Oleksandr Andrushchenko <andr2000@xxxxxxxxx>
  • From: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>
  • Date: Tue, 7 Sep 2021 17:39:53 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=epam.com; dmarc=pass action=none header.from=epam.com; dkim=pass header.d=epam.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=Q5ycq2nXR5FZX0BWbi7FjueKCXrfpTy2STsvy440gaU=; b=m3PQuJB9rfpJ7tobfCoYCGotDjtY0ZSRCuIPwWb0H3YUZlpqPW7PQHb1UrEsz38j/tFd5SGHKEmioCbAEU2gIy/JeCQBwupJ9JPxo7XknQXuNZoqS7fKruOtFnreDaEiLS16Ifte/GYOjudqFcfxWEn2wl8DMZsgOliS3O5uC1T5Es1vLC81IejbkBG8AXDRMeAzBjEwfVOET3aMMI4GmC6r5AYlNsf8T783PVSWr8Y15HCVQSXmVtohnDUQ2Lr53xTAmagjEOMo1cu9peX3/1ZGifMBqRfAo/urD5v+I77lN/M2kHAnaf8irF78vYsUXjnOpQST4JC+xN2hLEErWw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=L/MK8hwg9nGOd8gb5Ws2edbdyv778uQwgZPVMD8b+MKRAhCje0TQO+6XghwEA/CMt4CL7T8OP25/0yBU1JJK9LwJbX42boTbze8fAs1LBuuIC9ve+VPx+fBlLcFEtKiuuE3Wq55kgOD/Fcsdf9l9nAAsCiin/fXvmWjutQzWdtmWdJLZZygWeE2OMAIK0rV50LNDhLv6lgQz4M19/OeAJr3QXoNCpx/xGtCJsY5779MElfqOAamLM1p0YpcSA3TwXy+0xRBPNJt1pVYKvqThanG+4wHPdSuCGvjlNs1U1v4fxQChYwZt9k1foSd6jjt6Y8IJ1y8NU2FijK/zDJ9yLg==
  • Authentication-results: suse.com; dkim=none (message not signed) header.d=none;suse.com; dmarc=none action=none header.from=epam.com;
  • Cc: "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "roger.pau@xxxxxxxxxx" <roger.pau@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 07 Sep 2021 17:40:04 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHXoKxmxnIdZosa7ESbvTmPlG6KHauXFn+AgAGCLICAADFtgIAAE0qA
  • Thread-topic: [PATCH 5/9] vpci/header: Implement guest BAR register handlers

On 07.09.21 19:30, Jan Beulich wrote:
> On 07.09.2021 15:33, Oleksandr Andrushchenko wrote:
>> On 06.09.21 17:31, Jan Beulich wrote:
>>> On 03.09.2021 12:08, Oleksandr Andrushchenko wrote:
>>>> --- a/xen/drivers/vpci/header.c
>>>> +++ b/xen/drivers/vpci/header.c
>>>> @@ -400,12 +400,72 @@ static void bar_write(const struct pci_dev *pdev, 
>>>> unsigned int reg,
>>>>    static void guest_bar_write(const struct pci_dev *pdev, unsigned int 
>>>> reg,
>>>>                                uint32_t val, void *data)
>>>>    {
>>>> +    struct vpci_bar *bar = data;
>>>> +    bool hi = false;
>>>> +
>>>> +    if ( bar->type == VPCI_BAR_MEM64_HI )
>>>> +    {
>>>> +        ASSERT(reg > PCI_BASE_ADDRESS_0);
>>>> +        bar--;
>>>> +        hi = true;
>>>> +    }
>>>> +    else
>>>> +        val &= PCI_BASE_ADDRESS_MEM_MASK;
>>>> +    bar->guest_addr &= ~(0xffffffffull << (hi ? 32 : 0));
>>>> +    bar->guest_addr |= (uint64_t)val << (hi ? 32 : 0);
>>> What you store here is not the address that's going to be used,
>> bar->guest_addr is never used directly to be reported to a guest.
> And this is what I question, as an approach. Your model _may_ work,
> but its needlessly deviating from how people like me would expect
> this to work. And if it's intended to be this way, how would I
> have known?
Well, I just tried to follow the model we already have in the existing code...
>
>> It is always used as an initial value which is then modified to reflect
>> lower bits, e.g. BAR type and if prefetchable, so I think this is perfectly
>> fine to have it this way.
> And it is also perfectly fine to store the value to be handed
> back to guests on the next read. Keeps the read path simple,
> which I think can be assumed to be taken more frequently than
> the write one. Plus stored values reflect reality.
>
> Plus - if what you say was really the case, why do you mask off
> PCI_BASE_ADDRESS_MEM_MASK here? You should then simply store
> the written value and do _all_ the processing in the read path.
> No point having partial logic in two places.

I will move the logic to the write handler, indeed we can save some

CPU cycles here

>
>>>    as
>>> you don't mask off the low bits (to account for the BAR's size).
>>> When a BAR gets written with all ones, all writable bits get these
>>> ones stored. The address of the BAR, aiui, really changes to
>>> (typically) close below 4Gb (in the case of a 32-bit BAR), which
>>> is why memory / I/O decoding should be off while sizing BARs.
>>> Therefore you shouldn't look for the specific "all writable bits
>>> are ones" pattern (or worse, as you presently do, the "all bits
>>> outside of the type specifier are ones" one) on the read path.
>>> Instead mask the value appropriately here, and simply return back
>>> the stored value from the read path.
>> "PCI LOCAL BUS SPECIFICATION, REV. 3.0", "IMPLEMENTATION NOTE
>>
>> Sizing a 32-bit Base Address Register Example" says, that
>>
>> "Software saves the original value of the Base Address register, writes
>> 0 FFFF FFFFh to the register, then reads it back."
>>
>> The same applies for 64-bit BARs. So what's wrong if I try to catch such
>> a write when a guest tries to size the BAR? The only difference is that
>> I compare as
>>           if ( (val & PCI_BASE_ADDRESS_MEM_MASK_32) == 
>> PCI_BASE_ADDRESS_MEM_MASK_32 )
>> which is because val in the question has lower bits cleared.
> Because while this matches what the spec says, it's not enough to
> match how hardware behaves.
But we should implement as the spec says, not like buggy hardware behaves
>   Yet you want to mimic hardware behavior
> as closely as possible here. There is (iirc) at least one other
> place in the source tree were we had to adjust a similar initial
> implementation to be closer to one expected by guests,

Could you please point me to that code so I can consult and possibly

re-use the approach?

>   no matter
> that they may not be following the spec to the letter. Don't
> forget that there may be bugs in kernels which don't surface until
> the kernel gets run on an overly simplistic emulation.
This is sad. And the kernel needs to be fixed then, not Xen
>
>>>> @@ -522,6 +582,13 @@ static int add_bar_handlers(struct pci_dev *pdev, 
>>>> bool is_hwdom)
>>>>                if ( rc )
>>>>                    return rc;
>>>>            }
>>>> +        /*
>>>> +         * It is neither safe nor secure to initialize guest's view of 
>>>> the BARs
>>>> +         * with real values which are used by the hardware domain, so 
>>>> assign
>>>> +         * all zeros to guest's view of the BARs, so the guest can perform
>>>> +         * proper PCI device enumeration and assign BARs on its own.
>>>> +         */
>>>> +        bars[i].guest_addr = 0;
>>> I'm afraid I don't understand the comment: Without memory decoding
>>> enabled, the BARs are simple registers (with a few r/o bits).
>> My first implementation was that bar->guest_addr was initialized with
>> the value of bar->addr (physical BAR value), but talking on IRC with
>> Roger he suggested that this might be a security issue to let guest
>> a hint about physical values, so then I changed the assignment to be 0.
> Well, yes, that's certainly correct. It's perhaps too unobvious to me
> why one may want to use the host value here in the first place. It
> simply has no meaning here.
Do you want me to remove the comment?
>
>>>> --- a/xen/include/xen/pci_regs.h
>>>> +++ b/xen/include/xen/pci_regs.h
>>>> @@ -103,6 +103,7 @@
>>>>    #define  PCI_BASE_ADDRESS_MEM_TYPE_64   0x04    /* 64 bit address */
>>>>    #define  PCI_BASE_ADDRESS_MEM_PREFETCH  0x08    /* prefetchable? */
>>>>    #define  PCI_BASE_ADDRESS_MEM_MASK      (~0x0fUL)
>>>> +#define  PCI_BASE_ADDRESS_MEM_MASK_32     (~0x0fU)
>>> Please don't introduce an identical constant that's merely of
>>> different type. (uint32_t)PCI_BASE_ADDRESS_MEM_MASK at the use
>>> site (if actually still needed as per the comment above) would
>>> seem more clear to me.
>> Ok, I thought type casting is a bigger evil here
> Often it is, but imo here it is not. I hope you realize that ~0x0fU
> if not necessarily 0xfffffff0? We make certain assumptions on type
> widths. For unsigned int we assume it to be at least 32 bits wide.
> We should avoid assumptions of it being exactly 32 bits wide. Just
> like we cannot (more obviously) assume the width of unsigned long.
> (Which tells us that for 32-bit arches PCI_BASE_ADDRESS_MEM_MASK is
> actually of the wrong type. This constant should be the same no
> matter the bitness.)
Ok, I will not introduce a new define and use (uint32_t)
>
> Jan
>
Thank you,

Oleksandr

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.