
Re: [PATCH] vpci: Add resizable bar support


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: "Chen, Jiqian" <Jiqian.Chen@xxxxxxx>
  • Date: Tue, 26 Nov 2024 06:02:14 +0000
  • Accept-language: en-US
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "Chen, Jiqian" <Jiqian.Chen@xxxxxxx>
  • Delivery-date: Tue, 26 Nov 2024 06:03:03 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [PATCH] vpci: Add resizable bar support

On 2024/11/25 20:47, Roger Pau Monné wrote:
> On Mon, Nov 25, 2024 at 03:44:52AM +0000, Chen, Jiqian wrote:
>> On 2024/11/21 17:52, Roger Pau Monné wrote:
>>> On Thu, Nov 21, 2024 at 03:05:14AM +0000, Chen, Jiqian wrote:
>>>> On 2024/11/20 17:01, Roger Pau Monné wrote:
>>>>> On Wed, Nov 20, 2024 at 03:01:57AM +0000, Chen, Jiqian wrote:
>>>>>> The only difference between our methods is when the size is updated.
>>>>>> Yours happens later than mine: you update the size when the driver
>>>>>> re-enables memory decoding, while I update it as soon as the driver
>>>>>> resizes the BAR.
>>>>>
>>>>> Indeed, my last guess is the stale cached size is somehow used in my
>>>>> approach, and that leads to the failures.  One last (possibly dummy?)
>>>>> thing to try might be to use your patch to detect writes to the resize
>>>>> control register, but update the BAR sizes in modify_bars(), while
>>>>> keeping the traces of when the operations happen.
>>>>>
>>>> This works. The combined method uses my patch to detect the write and
>>>> program the size into the hardware register, and your patch to update
>>>> bar[i].size in modify_bars().
>>>> Attached are the combined patch and the xl dmesg output.
>>>
>>> This is even weirder, so the attached patch works fine?  The only
>>> difference with my proposal is that you trap the CTRL registers, but
>>> the sizing is still done in modify_bars().
>>>
>>> What happens if (based on the attached patch) you change
>>> rebar_ctrl_write() to:
>>>
>>> static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
>>>                                       unsigned int reg,
>>>                                       uint32_t val,
>>>                                       void *data)
>>> {
>>>     pci_conf_write32(pdev->sbdf, reg, val);
>>> }
>>>
>> If I change rebar_ctrl_write() to:
>> static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
>>                                       unsigned int reg,
>>                                       uint32_t val,
>>                                       void *data)
>> {
>>     printk("cjq_debug %pp: bar ctrl write reg %u, val %x\n", &pdev->sbdf,
>>            reg, val);
>>     pci_conf_write32(pdev->sbdf, reg, val);
>> }
>>
>> I see three prints, and it doesn't work:
>> (XEN) cjq_debug 0000:03:00.0: bar ctrl write reg 520, val d40
>> (XEN) cjq_debug 0000:03:00.0: bar ctrl write reg 520, val d40
>> (XEN) cjq_debug 0000:03:00.0: bar ctrl write reg 528, val 102
>>
>> If I change rebar_ctrl_write() to:
>> static void cf_check rebar_ctrl_write(const struct pci_dev *pdev,
>>                                       unsigned int reg,
>>                                       uint32_t val,
>>                                       void *data)
>> {
>>     if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
>>         return;
>>     printk("cjq_debug %pp: bar ctrl write reg %u, val %x\n", &pdev->sbdf,
>>            reg, val);
>>     pci_conf_write32(pdev->sbdf, reg, val);
>> } 
>>
>> I see only one print:
>> (XEN) cjq_debug 0000:03:00.0: bar ctrl write reg 520, val d40
>>
>> The check prevented the two incorrect writes:
>>     if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY )
>>         return;
>>
>> As for why my original patch also works, this check:
>> +    ctrl = pci_conf_read32(pdev->sbdf, reg);
>> +    if ( ctrl == val )
>> +        return;
>> happened to play the same role as the PCI_COMMAND_MEMORY check.
> 
> Thank you very much for figuring this out.  So in the end it's a bug
> in the driver that plays with PCI_REBAR_CTRL with memory decoding
> enabled.
Yes, I think so.
During driver initialization, it calls pci_rebar_set_size() to resize the
BARs. After that, it calls pci_restore_state() -> pci_restore_rebar_state()
to restore the BARs. The problem is that memory decoding is already enabled
when pci_restore_rebar_state() is called.
I will discuss internally with my colleagues whether this needs to be changed
in the amdgpu driver.

> 
> Won't this also cause issues when running natively without Xen?
Native Linux works fine; I don't know why yet.

> 
> I think we have no other option but to trap accesses to the capability
> registers themselves in order to ensure a minimum amount of sanity
> (iow: no writes to the ReBAR control registers while decoding is enabled).
Got it, I will send a V2 that keeps using my method.
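
To make the intended behaviour concrete, here is a minimal standalone sketch of the guard, written against a mock device rather than Xen's real struct pci_dev or pci_conf_* helpers. The names mock_dev, mock_rebar_ctrl_write() and replay_logged_writes() are made up for this illustration, and the decoding state assumed for each of the three logged writes is inferred from the traces above, not measured. It is not the V2 patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PCI_COMMAND_MEMORY 0x2   /* memory decoding enable bit */
#define CFG_DWORDS         1024  /* 4 KiB of extended config space */

/* Mock device standing in for real config space (not Xen's struct pci_dev). */
struct mock_dev {
    uint16_t command;             /* mock PCI_COMMAND register */
    uint32_t cfg[CFG_DWORDS];     /* indexed by reg / 4 */
    unsigned int forwarded;       /* writes that reached the "hardware" */
};

/* Guarded handler: drop ReBAR control writes while decoding is enabled. */
static void mock_rebar_ctrl_write(struct mock_dev *dev, unsigned int reg,
                                  uint32_t val)
{
    assert(reg / 4 < CFG_DWORDS);

    if ( dev->command & PCI_COMMAND_MEMORY )
        return;

    printf("forwarded: reg %u, val %x\n", reg, (unsigned int)val);
    dev->cfg[reg / 4] = val;
    dev->forwarded++;
}

/* Replay the three logged writes; the decoding state per write is assumed. */
static unsigned int replay_logged_writes(struct mock_dev *dev)
{
    /* Assumed: the resize happens with memory decoding disabled. */
    dev->command &= (uint16_t)~PCI_COMMAND_MEMORY;
    mock_rebar_ctrl_write(dev, 520, 0xd40);

    /* Assumed: the restore path runs with memory decoding enabled. */
    dev->command |= PCI_COMMAND_MEMORY;
    mock_rebar_ctrl_write(dev, 520, 0xd40);
    mock_rebar_ctrl_write(dev, 528, 0x102);

    return dev->forwarded;
}
```

Calling replay_logged_writes() on a zero-initialised struct mock_dev forwards only the first write, which matches the single print seen with the PCI_COMMAND_MEMORY check in place.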

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

 

