Re: [PATCH v6 1/3] xen/vpci: Move ecam access functions to common code


  • To: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 15 Oct 2021 08:29:41 +0200
  • Cc: Ian Jackson <iwj@xxxxxxxxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Fri, 15 Oct 2021 06:29:59 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.10.2021 19:09, Bertrand Marquis wrote:
>> On 14 Oct 2021, at 17:06, Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 14.10.2021 16:49, Bertrand Marquis wrote:
>>> @@ -305,7 +291,7 @@ static int vpci_portio_read(const struct hvm_io_handler *handler,
>>>
>>>     reg = hvm_pci_decode_addr(cf8, addr, &sbdf);
>>>
>>> -    if ( !vpci_access_allowed(reg, size) )
>>> +    if ( !vpci_ecam_access_allowed(reg, size) )
>>>         return X86EMUL_OKAY;
>>>
>>>     *data = vpci_read(sbdf, reg, size);
>>> @@ -335,7 +321,7 @@ static int vpci_portio_write(const struct hvm_io_handler *handler,
>>>
>>>     reg = hvm_pci_decode_addr(cf8, addr, &sbdf);
>>>
>>> -    if ( !vpci_access_allowed(reg, size) )
>>> +    if ( !vpci_ecam_access_allowed(reg, size) )
>>>         return X86EMUL_OKAY;
>>>
>>>     vpci_write(sbdf, reg, size, data);
>>
>> Why would port I/O functions call an ECAM helper? And to what extent is
>> that helper actually ECAM-specific?
> 
> The function was global before.

I'm not objecting to the function being global, but to the "ecam" in
its name.

>>> @@ -434,25 +420,8 @@ static int vpci_mmcfg_read(struct vcpu *v, unsigned long addr,
>>>     reg = vpci_mmcfg_decode_addr(mmcfg, addr, &sbdf);
>>>     read_unlock(&d->arch.hvm.mmcfg_lock);
>>>
>>> -    if ( !vpci_access_allowed(reg, len) ||
>>> -         (reg + len) > PCI_CFG_SPACE_EXP_SIZE )
>>> -        return X86EMUL_OKAY;
>>
>> While I assume this earlier behavior is the reason for ...
> 
> Yes :-)
> 
>>
>>> -    /*
>>> -     * According to the PCIe 3.1A specification:
>>> -     *  - Configuration Reads and Writes must usually be DWORD or smaller
>>> -     *    in size.
>>> -     *  - Because Root Complex implementations are not required to support
>>> -     *    accesses to a RCRB that cross DW boundaries [...] software
>>> -     *    should take care not to cause the generation of such accesses
>>> -     *    when accessing a RCRB unless the Root Complex will support the
>>> -     *    access.
>>> -     *  Xen however supports 8byte accesses by splitting them into two
>>> -     *  4byte accesses.
>>> -     */
>>> -    *data = vpci_read(sbdf, reg, min(4u, len));
>>> -    if ( len == 8 )
>>> -        *data |= (uint64_t)vpci_read(sbdf, reg + 4, 4) << 32;
>>> +    /* Ignore return code */
>>> +    vpci_ecam_mmio_read(sbdf, reg, len, data);
>>
>> ... the commented-upon ignoring of the return value, I don't think
>> that's a good way to deal with things anymore. Instead I think
>> *data should be written to ~0 upon failure, unless it is intended
>> for vpci_ecam_mmio_read() to take care of that case (in which case
>> I'm not sure I would see why it needs to return an error indicator
>> in the first place).
> 
> I am not sure in the first place why this is actually ignored and just
> returning a -1 value.
> If an access is not right, an exception should be generated to the
> Guest instead.

No. That's also not what happens on bare metal, at least not on x86.
Faults cannot be raised for reasons outside of the CPU; such errors
(if these are errors in the first place) need to be dealt with
differently. Signaling an error on the PCI bus would be possible,
but would leave open how that's actually to be dealt with. Instead
bad reads return all ones, while bad writes simply get dropped.
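
To illustrate the intended behavior, here is a minimal standalone sketch
(not Xen code; cfg_space, access_ok(), cfg_read() and cfg_write() are
hypothetical names, and the validity rule is an assumption made for the
example). A rejected read yields all ones, a rejected write is dropped,
and no fault is raised in either case:

```c
#include <stdbool.h>
#include <stdint.h>

#define CFG_SPACE_SIZE 4096u /* assumed size of the extended config space */

/* Hypothetical backing store standing in for a device's config space. */
static uint8_t cfg_space[CFG_SPACE_SIZE];

/* Assumed rule: 1/2/4-byte accesses, naturally aligned, within range. */
static bool access_ok(unsigned int reg, unsigned int len)
{
    if ( len != 1 && len != 2 && len != 4 )
        return false;

    if ( reg & (len - 1) )
        return false;

    return reg <= CFG_SPACE_SIZE - len;
}

/* A bad read returns all ones; nothing is signaled to the caller. */
static uint32_t cfg_read(unsigned int reg, unsigned int len)
{
    uint32_t val = 0;
    unsigned int i;

    if ( !access_ok(reg, len) )
        return ~0u;

    /* Assemble the value in little-endian byte order. */
    for ( i = 0; i < len; i++ )
        val |= (uint32_t)cfg_space[reg + i] << (8 * i);

    return val;
}

/* A bad write is silently dropped. */
static void cfg_write(unsigned int reg, unsigned int len, uint32_t data)
{
    unsigned int i;

    if ( !access_ok(reg, len) )
        return;

    for ( i = 0; i < len; i++ )
        cfg_space[reg + i] = (uint8_t)(data >> (8 * i));
}
```

With this, a misaligned 2-byte read at offset 0x11 returns 0xffffffff and
a misaligned write to the same offset leaves the backing store untouched.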

> When we do that on arm the function is returning an error to the upper
> layer in that case, that’s why I did keep a generic function informing the
> caller.

While you're the Arm expert, with the above in mind I wonder what
the actual action in that case ought to be there. Would you explain
to me how, say, a misaligned 2-byte read that the CPU permits but
the PCI subsystem doesn't like would be dealt with by bare metal?

>>> @@ -476,13 +445,8 @@ static int vpci_mmcfg_write(struct vcpu *v, unsigned long addr,
>>>     reg = vpci_mmcfg_decode_addr(mmcfg, addr, &sbdf);
>>>     read_unlock(&d->arch.hvm.mmcfg_lock);
>>>
>>> -    if ( !vpci_access_allowed(reg, len) ||
>>> -         (reg + len) > PCI_CFG_SPACE_EXP_SIZE )
>>> -        return X86EMUL_OKAY;
>>> -
>>> -    vpci_write(sbdf, reg, min(4u, len), data);
>>> -    if ( len == 8 )
>>> -        vpci_write(sbdf, reg + 4, 4, data >> 32);
>>> +    /* Ignore return code */
>>> +    vpci_ecam_mmio_write(sbdf, reg, len, data);
>>
>> Here ignoring is fine imo, but if you feel it is important to
>> comment on this, then I think you need to prefer "why" over "what".
> 
> Agree I would just need some help on the why.
> Now there was no comment before to explain why so I could also
> remove the comment altogether.

The latter would be my preference.

>>> --- a/xen/drivers/vpci/vpci.c
>>> +++ b/xen/drivers/vpci/vpci.c
>>> @@ -478,6 +478,66 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>     spin_unlock(&pdev->vpci->lock);
>>> }
>>>
>>> +/* Helper function to check an access size and alignment on vpci space. */
>>> +bool vpci_ecam_access_allowed(unsigned int reg, unsigned int len)
>>> +{
>>> +    /*
>>> +     * Check access size.
>>> +     *
>>> +     * On arm32 or for 32bit guests on arm, 64bit accesses should be forbidden
>>> +     * but as for those platform ISV register, which gives the access size,
>>> +     * cannot have a value 3, checking this would just harden the code.
>>> +     */
>>> +    if ( len != 1 && len != 2 && len != 4 && len != 8 )
>>> +        return false;
>>
>> I'm not convinced talking about Arm specifically here is
>> warranted, unless there's something there that's clearly
>> different from all other architectures. Otherwise the comment
>> should imo be written in more general terms.
> 
> Other architectures might allow this case. So this is specific to Arm.

If it really is, I consider it wrong for it to live in common code. If
per-arch tweaking is necessary, and if earlier handling of the
intercepted access doesn't already exclude "bad" cases, then a
per-arch hook would imo be the way to go here. Given the size
of the function I would then wonder why it doesn't remain
per-arch in the first place.
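
Purely as a sketch of what I mean by a per-arch hook (the name
arch_cfg_max_access_size() is made up for illustration and is not an
existing Xen interface), the common check could stay generic and leave
only the size limit to the architecture:

```c
#include <stdbool.h>

/*
 * Hypothetical per-arch hook: an architecture wanting a tighter limit
 * would define this before the common code gets compiled.
 */
#ifndef arch_cfg_max_access_size
/* Assumed default: allow accesses of up to 8 bytes. */
#define arch_cfg_max_access_size() 8u
#endif

/* Generic check: power-of-two size within the arch limit, naturally aligned. */
static bool cfg_access_allowed(unsigned int reg, unsigned int len)
{
    if ( !len || (len & (len - 1)) || len > arch_cfg_max_access_size() )
        return false;

    return !(reg & (len - 1));
}
```

An architecture that cannot generate 8-byte accesses in the first place
would then simply define the hook to yield 4, and the common comment
would not need to mention any particular architecture.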

>>> +int vpci_ecam_mmio_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int len,
>>> +                         unsigned long data)
>>> +{
>>> +    if ( !vpci_ecam_access_allowed(reg, len) ||
>>> +         (reg + len) > PCI_CFG_SPACE_EXP_SIZE )
>>> +        return 0;
>>> +
>>> +    vpci_write(sbdf, reg, min(4u, len), data);
>>> +    if ( len == 8 )
>>> +        vpci_write(sbdf, reg + 4, 4, data >> 32);
>>> +
>>> +    return 1;
>>> +}
>>> +
>>> +int vpci_ecam_mmio_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int len,
>>> +                        unsigned long *data)
>>> +{
>>> +    if ( !vpci_ecam_access_allowed(reg, len) ||
>>> +         (reg + len) > PCI_CFG_SPACE_EXP_SIZE )
>>> +        return 0;
>>> +
>>> +    /*
>>> +     * According to the PCIe 3.1A specification:
>>> +     *  - Configuration Reads and Writes must usually be DWORD or smaller
>>> +     *    in size.
>>> +     *  - Because Root Complex implementations are not required to support
>>> +     *    accesses to a RCRB that cross DW boundaries [...] software
>>> +     *    should take care not to cause the generation of such accesses
>>> +     *    when accessing a RCRB unless the Root Complex will support the
>>> +     *    access.
>>> +     *  Xen however supports 8byte accesses by splitting them into two
>>> +     *  4byte accesses.
>>> +     */
>>> +    *data = vpci_read(sbdf, reg, min(4u, len));
>>> +    if ( len == 8 )
>>> +        *data |= (uint64_t)vpci_read(sbdf, reg + 4, 4) << 32;
>>> +
>>> +    return 1;
>>> +}
>>
>> Why do these two functions return int/0/1 instead of
>> bool/false/true (assuming, as per above, that them returning non-
>> void is warranted at all)?
> 
> This is what the mmio handlers should return to say that an access
> was ok or not, so the functions stick to this standard.

Sticking to this would be okay if the functions here needed their
address taken, such that they can be installed as hooks for a
more general framework to invoke. The functions, however, only get
called directly. Hence there's no reason to mirror what is in need
of cleaning up elsewhere. I'm sure you're aware that we're in the
(slow-going) process of improving which types get used where.
While the functions you refer to may not have undergone such
cleanup yet, we generally expect new code to conform to the new
model.
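
For illustration only, a standalone sketch of the shape I have in mind
(stub_read() is a hypothetical stand-in for vpci_read(); the other names
and the 4096 limit are likewise assumptions for the example, not a
proposal for the actual patch). The helper returns bool, and the caller
acts on a rejected read by producing all ones instead of ignoring the
result:

```c
#include <stdbool.h>
#include <stdint.h>

#define CFG_SPACE_EXP_SIZE 4096u /* assumed value of the config space limit */

/* Hypothetical stand-in for vpci_read(): a value derived from the offset. */
static uint32_t stub_read(unsigned int reg, unsigned int size)
{
    uint32_t mask = size == 4 ? ~0u : (1u << (size * 8)) - 1;

    return (0xa0000000u | reg) & mask;
}

/* Size must be 1, 2, 4 or 8 bytes and the offset naturally aligned. */
static bool access_allowed(unsigned int reg, unsigned int len)
{
    if ( len != 1 && len != 2 && len != 4 && len != 8 )
        return false;

    return !(reg & (len - 1));
}

/* Returns true if the access was carried out, false if it was rejected. */
static bool ecam_read(unsigned int reg, unsigned int len, uint64_t *data)
{
    if ( !access_allowed(reg, len) || (reg + len) > CFG_SPACE_EXP_SIZE )
        return false;

    /* 8-byte accesses get split into two 4-byte ones. */
    *data = stub_read(reg, len < 4 ? len : 4);
    if ( len == 8 )
        *data |= (uint64_t)stub_read(reg + 4, 4) << 32;

    return true;
}

/* Caller: a rejected read yields all ones rather than stale data. */
static uint64_t guest_read(unsigned int reg, unsigned int len)
{
    uint64_t data;

    if ( !ecam_read(reg, len, &data) )
        data = ~(uint64_t)0;

    return data;
}
```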

Jan




 

