
Re: [PATCH] vpci: Add resizable bar support


  • To: Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: "Chen, Jiqian" <Jiqian.Chen@xxxxxxx>
  • Date: Wed, 27 Nov 2024 09:07:46 +0000
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "Chen, Jiqian" <Jiqian.Chen@xxxxxxx>
  • Delivery-date: Wed, 27 Nov 2024 09:08:03 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH] vpci: Add resizable bar support

On 2024/11/26 17:47, Jan Beulich wrote:
> On 26.11.2024 07:02, Chen, Jiqian wrote:
>> On 2024/11/25 20:47, Roger Pau Monné wrote:
>>> On Mon, Nov 25, 2024 at 03:44:52AM +0000, Chen, Jiqian wrote:
>>>> On 2024/11/21 17:52, Roger Pau Monné wrote:
>>>>> On Thu, Nov 21, 2024 at 03:05:14AM +0000, Chen, Jiqian wrote:
>>>>>> On 2024/11/20 17:01, Roger Pau Monné wrote:
>>>>>>> On Wed, Nov 20, 2024 at 03:01:57AM +0000, Chen, Jiqian wrote:
>>>>>>>> The only difference between our methods is when the size is updated.
>>>>>>>> Yours updates it later than mine: you update the size when the driver 
>>>>>>>> re-enables memory decoding, while I update it as soon as the driver 
>>>>>>>> resizes the BAR.
>>>>>>>
>>>>>>> Indeed, my last guess is the stale cached size is somehow used in my
>>>>>>> approach, and that leads to the failures.  One last (possibly dummy?)
>>>>>>> thing to try might be to use your patch to detect writes to the resize
>>>>>>> control register, but update the BAR sizes in modify_bars(), while
>>>>>>> keeping the traces of when the operations happen.
>>>>>>>
>>>>>> This works: combine our methods, using my patch to detect the write and 
>>>>>> write the size into the hardware register, and your patch to update 
>>>>>> bar[i].size in modify_bars().
>>>>>> I have attached the combined patch and the xl dmesg output.
>>>>>
>>>>> This is even weirder, so the attached patch works fine?  The only
>>>>> difference with my proposal is that you trap the CTRL registers, but
>>>>> the sizing is still done in modify_bars().
>>>>>
>>>>> What happens if (based on the attached patch) you change
>>>>> rebar_ctrl_write() to:
>>>>>
>>>>> static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
>>>>>                                       unsigned int reg,
>>>>>                                       uint32_t val,
>>>>>                                       void *data)
>>>>> {
>>>>>     pci_conf_write32(pdev->sbdf, reg, val);
>>>>> }
>>>>>
>>>> If I change rebar_ctrl_write() to:
>>>> static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
>>>>                                       unsigned int reg,
>>>>                                       uint32_t val,
>>>>                                       void *data)
>>>> {
>>>>     printk("cjq_debug %pp: bar ctrl write reg %u, val %x\n", &pdev->sbdf,
>>>>            reg, val);
>>>>     pci_conf_write32(pdev->sbdf, reg, val);
>>>> }
>>>>
>>>> I see three prints, and it does not work:
>>>> (XEN) cjq_debug 0000:03:00.0: bar ctrl write reg 520, val d40
>>>> (XEN) cjq_debug 0000:03:00.0: bar ctrl write reg 520, val d40
>>>> (XEN) cjq_debug 0000:03:00.0: bar ctrl write reg 528, val 102
>>>>
>>>> If I change rebar_ctrl_write() to:
>>>> static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
>>>>                                       unsigned int reg,
>>>>                                       uint32_t val,
>>>>                                       void *data)
>>>> {
>>>>     if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
>>>>         return;
>>>>     printk("cjq_debug %pp: bar ctrl write reg %u, val %x\n", &pdev->sbdf,
>>>>            reg, val);
>>>>     pci_conf_write32(pdev->sbdf, reg, val);
>>>> }
>>>>
>>>> I see only one print:
>>>> (XEN) cjq_debug 0000:03:00.0: bar ctrl write reg 520, val d40
>>>>
>>>> The check prevented the two incorrect writes:
>>>>     if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
>>>>         return;
>>>>
>>>> As for why my original patch also works, the check:
>>>> +    ctrl = pci_conf_read32(pdev->sbdf, reg);
>>>> +    if ( ctrl == val )
>>>> +        return;
>>>> happened to play the same role as the PCI_COMMAND_MEMORY check.
>>>
>>> Thank you very much for figuring this out.  So in the end it's a bug
>>> in the driver that plays with PCI_REBAR_CTRL with memory decoding
>>> enabled.
>> Yes, I think so.
>> During driver initialization, it calls pci_rebar_set_size to resize the BARs.
>> After that, it calls pci_restore_state->pci_restore_rebar_state to restore 
>> the BARs.
>> The problem is that memory decoding is enabled when 
>> pci_restore_rebar_state is called.
>> I will discuss with my colleagues internally whether this needs to be 
>> changed in the amdgpu driver.
> 
> Why would memory decoding be enabled at that time? pci_restore_config_space()
> specifically takes care of restoring CMD only after restoring BARs. And
> pci_restore_config_space() is invoked by pci_restore_state() quite a bit
> later than pci_restore_rebar_state(). So the driver must (wrongly?) be
> enabling decoding earlier on?
I got some information from my colleague: to fix a driver bug, the driver saves 
and restores the device's state back to back, without disabling memory decoding, 
since the saved and restored state are the same.
So it is the driver's problem, not Xen's or Roger's approach. But as Roger said, 
it is better to trap the ReBAR control register to prevent similar problems.
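
As a rough illustration of why the trap helps (a standalone sketch, not the
actual vPCI code): the mock below replays the three traced writes against a
handler that drops ReBAR control writes while memory decoding is enabled.
struct mock_dev, mock_rebar_ctrl_write() and replay_traced_writes() are
hypothetical names made up for this example, and it assumes the first write
arrives with decoding disabled and the two restore writes with it enabled,
as the traces above suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_MEMORY 0x2   /* memory space enable bit of PCI_COMMAND */
#define REBAR_CTRL_BASE    520   /* first control register seen in the traces */
#define REBAR_CTRL_STRIDE  8     /* reg 528 is the next control register */
#define REBAR_NR_CTRL      2

/* Hypothetical stand-in for a device's config space, not a Xen structure. */
struct mock_dev {
    uint16_t command;
    uint32_t rebar_ctrl[REBAR_NR_CTRL];
};

/* Returns true if the write reached the mock register. */
static bool mock_rebar_ctrl_write(struct mock_dev *dev, unsigned int reg,
                                  uint32_t val)
{
    unsigned int idx;

    assert(reg >= REBAR_CTRL_BASE);
    idx = (reg - REBAR_CTRL_BASE) / REBAR_CTRL_STRIDE;
    assert(idx < REBAR_NR_CTRL);

    /* Same guard as in the experiment: ignore writes while decoding is on. */
    if ( dev->command & PCI_COMMAND_MEMORY )
        return false;

    dev->rebar_ctrl[idx] = val;
    return true;
}

/* Replays the three traced writes and returns how many were accepted. */
static unsigned int replay_traced_writes(struct mock_dev *dev)
{
    unsigned int accepted = 0;

    /* Resize request, assumed to arrive with decoding disabled. */
    dev->command &= (uint16_t)~PCI_COMMAND_MEMORY;
    accepted += mock_rebar_ctrl_write(dev, 520, 0xd40);

    /* Restore path, assumed to run with decoding already enabled. */
    dev->command |= PCI_COMMAND_MEMORY;
    accepted += mock_rebar_ctrl_write(dev, 520, 0xd40);
    accepted += mock_rebar_ctrl_write(dev, 528, 0x102);

    return accepted;
}
```

Under those assumptions only the first write is accepted, which matches the
single print seen with the PCI_COMMAND_MEMORY check in place.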

> 
> Jan

-- 
Best regards,
Jiqian Chen.



 

