[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Rahul Singh <Rahul.Singh@xxxxxxx>
  • Date: Fri, 30 Oct 2020 08:46:11 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=LDV+57uX0TZe2aFVIlALMBZ/xjsan6sdHjyjibqPVQE=; b=KFId2AoQTI5Sq6VlltSu9LfofQPjHkOLSI+wV1z/1pYrwW2LcYNek0qzTOtTvdJMT/Kr/44/GkLoOYPXEuZSZ/R3GmkV/+7+xp2crpsc3C181OqhghCHmSLKFjqYNqBvZxskhzWwx6IEuPyupt8Lw3/q1+u/EbHj7kG86+6BjiDzuHyo7+6BqvxpXF97o4FXqfDppYicWysNddTrjmXRu+S52/ntnpHepJuzlaPu4XocXToOP/zIYwmgJU8IpaAYmZGNxodO1Sr7vKI2jJgmKaa9n6HAJFVjwWOSdFgLYAyOd1kjrVuHnOBl/4Vj7BjYVv0ylaZII67LiW1xewNBqw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ex5gMdECafIVmcYAvDpWp1pL11gBlOAsDWpGObhHLiuDGXfRcsx1cm1X1CcJh7iUgbvN8tRr2nYckQHcuORLOKWbPl4tNU5t+hvO6Wm/4Xs5725Q86JBsCjCudqPkX6K+BOfjXejABpXTIqxIa9guTIDI73EWg7mtIyy5ix1Znnj58TIyyTDw40g+SoqLQuEpSbOo+/b/rpc7JyUZVMSMxSO9FRBMMPSlpIEfyp/V1UbqLzN2fFDXf7KMenP6qfScs1YTIBE26I90sdQwYR9wbad6tk92mclHOdpYW1y5jHUXRISbPoHZ/WfjXvKLDUpPVqZYHMsV6awgNXUj5vO0w==
  • Authentication-results-original: kernel.org; dkim=none (message not signed) header.d=none;kernel.org; dmarc=none action=none header.from=arm.com;
  • Cc: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Julien Grall <julien@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Paul Durrant <paul@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • Delivery-date: Fri, 30 Oct 2020 08:47:06 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Nodisclaimer: true
  • Original-authentication-results: kernel.org; dkim=none (message not signed) header.d=none;kernel.org; dmarc=none action=none header.from=arm.com;
  • Thread-index: AQHWoVvrkmJwOYERdUOadvid1OghFamgw0AAgAEzsQCAAWIugIABA+CAgADBXICAABfBAIAAGI6AgAAOSACABG94AIADrVCAgAFBfQCAAGM1gIAA0RiA
  • Thread-topic: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver

Hello Stefano,

> On 29 Oct 2020, at 8:17 pm, Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:
> 
> On Thu, 29 Oct 2020, Bertrand Marquis wrote:
>>> On 28 Oct 2020, at 19:12, Julien Grall <julien@xxxxxxx> wrote:
>>> On 26/10/2020 11:03, Rahul Singh wrote:
>>>> Hello Julien,
>>>>> On 23 Oct 2020, at 4:19 pm, Julien Grall <julien@xxxxxxx> wrote:
>>>>> On 23/10/2020 15:27, Rahul Singh wrote:
>>>>>> Hello Julien,
>>>>>>> On 23 Oct 2020, at 2:00 pm, Julien Grall <julien@xxxxxxx> wrote:
>>>>>>> On 23/10/2020 12:35, Rahul Singh wrote:
>>>>>>>> Hello,
>>>>>>>>> On 23 Oct 2020, at 1:02 am, Stefano Stabellini 
>>>>>>>>> <sstabellini@xxxxxxxxxx> wrote:
>>>>>>>>> 
>>>>>>>>> On Thu, 22 Oct 2020, Julien Grall wrote:
>>>>>>>>>>>> On 20/10/2020 16:25, Rahul Singh wrote:
>>>>>>>>>>>>> Add support for ARM architected SMMUv3 implementations. It is 
>>>>>>>>>>>>> based on
>>>>>>>>>>>>> the Linux SMMUv3 driver.
>>>>>>>>>>>>> Major differences between the Linux driver are as follows:
>>>>>>>>>>>>> 1. Only Stage-2 translation is supported as compared to the Linux 
>>>>>>>>>>>>> driver
>>>>>>>>>>>>>   that supports both Stage-1 and Stage-2 translations.
>>>>>>>>>>>>> 2. Use P2M  page table instead of creating one as SMMUv3 has the
>>>>>>>>>>>>>   capability to share the page tables with the CPU.
>>>>>>>>>>>>> 3. Tasklets is used in place of threaded IRQ's in Linux for event 
>>>>>>>>>>>>> queue
>>>>>>>>>>>>>   and priority queue IRQ handling.
>>>>>>>>>>>> 
>>>>>>>>>>>> Tasklets are not a replacement for threaded IRQ. In particular, 
>>>>>>>>>>>> they will
>>>>>>>>>>>> have priority over anything else (IOW nothing will run on the pCPU 
>>>>>>>>>>>> until
>>>>>>>>>>>> they are done).
>>>>>>>>>>>> 
>>>>>>>>>>>> Do you know why Linux is using thread. Is it because of long 
>>>>>>>>>>>> running
>>>>>>>>>>>> operations?
>>>>>>>>>>> 
>>>>>>>>>>> Yes you are right because of long running operations Linux is using 
>>>>>>>>>>> the
>>>>>>>>>>> threaded IRQs.
>>>>>>>>>>> 
>>>>>>>>>>> SMMUv3 reports fault/events bases on memory-based circular buffer 
>>>>>>>>>>> queues not
>>>>>>>>>>> based on the register. As per my understanding, it is 
>>>>>>>>>>> time-consuming to
>>>>>>>>>>> process the memory based queues in interrupt context because of 
>>>>>>>>>>> that Linux
>>>>>>>>>>> is using threaded IRQ to process the faults/events from SMMU.
>>>>>>>>>>> 
>>>>>>>>>>> I didn’t find any other solution in XEN in place of tasklet to 
>>>>>>>>>>> defer the
>>>>>>>>>>> work, that’s why I used tasklet in XEN in replacement of threaded 
>>>>>>>>>>> IRQs. If
>>>>>>>>>>> we do all work in interrupt context we will make XEN less 
>>>>>>>>>>> responsive.
>>>>>>>>>> 
>>>>>>>>>> So we need to make sure that Xen continue to receives interrupts, 
>>>>>>>>>> but we also
>>>>>>>>>> need to make sure that a vCPU bound to the pCPU is also responsive.
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> If you know another solution in XEN that will be used to defer the 
>>>>>>>>>>> work in
>>>>>>>>>>> the interrupt please let me know I will try to use that.
>>>>>>>>>> 
>>>>>>>>>> One of my work colleague encountered a similar problem recently. He 
>>>>>>>>>> had a long
>>>>>>>>>> running tasklet and wanted to be broken down in smaller chunk.
>>>>>>>>>> 
>>>>>>>>>> We decided to use a timer to reschedule the taslket in the future. 
>>>>>>>>>> This allows
>>>>>>>>>> the scheduler to run other loads (e.g. vCPU) for some time.
>>>>>>>>>> 
>>>>>>>>>> This is pretty hackish but I couldn't find a better solution as 
>>>>>>>>>> tasklet have
>>>>>>>>>> high priority.
>>>>>>>>>> 
>>>>>>>>>> Maybe the other will have a better idea.
>>>>>>>>> 
>>>>>>>>> Julien's suggestion is a good one.
>>>>>>>>> 
>>>>>>>>> But I think tasklets can be configured to be called from the 
>>>>>>>>> idle_loop,
>>>>>>>>> in which case they are not run in interrupt context?
>>>>>>>>> 
>>>>>>>> Yes you are right tasklet will be scheduled from the idle_loop that is 
>>>>>>>> not interrupt conext.
>>>>>>> 
>>>>>>> This depends on your tasklet. Some will run from the softirq context 
>>>>>>> which is usually (for Arm) on the return of an exception.
>>>>>>> 
>>>>>> Thanks for the info. I will check and will get better understanding of 
>>>>>> the tasklet how it will run in XEN.
>>>>>>>>> 
>>>>>>>>>>>>> 4. Latest version of the Linux SMMUv3 code implements the 
>>>>>>>>>>>>> commands queue
>>>>>>>>>>>>>   access functions based on atomic operations implemented in 
>>>>>>>>>>>>> Linux.
>>>>>>>>>>>> 
>>>>>>>>>>>> Can you provide more details?
>>>>>>>>>>> 
>>>>>>>>>>> I tried to port the latest version of the SMMUv3 code than I 
>>>>>>>>>>> observed that
>>>>>>>>>>> in order to port that code I have to also port atomic operation 
>>>>>>>>>>> implemented
>>>>>>>>>>> in Linux to XEN. As latest Linux code uses atomic operation to 
>>>>>>>>>>> process the
>>>>>>>>>>> command queues 
>>>>>>>>>>> (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() ,
>>>>>>>>>>> atomic_fetch_andnot_relaxed()) .
>>>>>>>>>> 
>>>>>>>>>> Thank you for the explanation. I think it would be best to import 
>>>>>>>>>> the atomic
>>>>>>>>>> helpers and use the latest code.
>>>>>>>>>> 
>>>>>>>>>> This will ensure that we don't re-introduce bugs and also buy us 
>>>>>>>>>> some time
>>>>>>>>>> before the Linux and Xen driver diverge again too much.
>>>>>>>>>> 
>>>>>>>>>> Stefano, what do you think?
>>>>>>>>> 
>>>>>>>>> I think you are right.
>>>>>>>> Yes, I agree with you to have XEN code in sync with Linux code that's 
>>>>>>>> why I started with to port the Linux atomic operations to XEN  then I 
>>>>>>>> realised that it is not straightforward to port atomic operations and 
>>>>>>>> it requires lots of effort and testing. Therefore I decided to port 
>>>>>>>> the code before the atomic operation is introduced in Linux.
>>>>>>> 
>>>>>>> Hmmm... I would not have expected a lot of effort required to add the 3 
>>>>>>> atomics operations above. Are you trying to also port the LSE support 
>>>>>>> at the same time?
>>>>>> There are other atomic operations used in the SMMUv3 code apart from the 
>>>>>> 3 atomic operation I mention. I just mention 3 operation as an example.
>>>>> 
>>>>> Ok. Do you have a list you could share?
>>>>> 
>>>> Yes. Please find the list that we have to port to the XEN in order to 
>>>> merge the latest SMMUv3 code.
>>> 
>>> Thanks!
>>> 
>>>> If we start to port the below list we might have to port another atomic 
>>>> operation based on which below atomic operations are implemented. I have 
>>>> not spent time on how these atomic operations are implemented in detail 
>>>> but as per my understanding, it required an effort to port them to XEN and 
>>>> required a lot of testing.
>>> 
>>> For the beginning, I think it is fine to implement them with a stronger 
>>> memory barrier than necessary or in a less efficient. This can be refined 
>>> afterwards.
>>> 
>>> Those helpers can directly be defined in the SMMUv3 drivers so we know they 
>>> are not for general purpose :).
>>> 
>>>> 1. atomic_set_release
>>> 
>>> This could be implemented as:
>>> 
>>> smp_mb();
>>> atomic_set();
>>> 
>>>> 2. atomic_fetch_andnot_relaxed
>>> 
>>> This would need to be imported.
>>> 
>>>> 3. atomic_cond_read_relaxed
>>> 
>>> This would need to be imported. The simplest version seems to be the 
>>> generic version provided by include/asm-generic/barrier.h (see 
>>> smp_cond_load_relaxed).
>>> 
>>>> 4. atomic_long_cond_read_relaxed
>>>> 5. atomic_long_xor
>>> 
>>> The two would require the implementation of atomic64. Volodymyr also 
>>> required a version. I offered my help, however I didn't find enough time to 
>>> do it yet :(.
>>> 
>>> For Arm64, it would be possible to do a copy/paste of the existing helpers 
>>> and replace anything related to a 32-bit register with a 64-bit one.
>>> 
>>> For Arm32, they are a bit more complex because you now need to work with 2 
>>> registers.
>>> 
>>> However, for your purpose, you would be using atomic_long_t. So the the 
>>> Arm64 implementation should be sufficient.
>>> 
>>>> 6. atomic_set_release
>>> 
>>> Same as 1.
>>> 
>>>> 7. atomic_cmpxchg_relaxed might be we can use atomic_cmpxchg that is 
>>>> implemented in XEN need to check.
>>> atomic_cmpxchg() is strongly ordered. So it would be fine to use it for 
>>> implementing the helper. Although, more inefficient :).
>>> 
>>>> 8. atomic_dec_return_release
>>> 
>>> Xen implements a stronger version atomic_dec_return(). You can re-use it 
>>> here.
>>> 
>>>> 9. atomic_fetch_inc_relaxed
>>> 
>>> This would need to be imported.
>> 
>> We do agree that this would be the work required but some of the things to 
>> be imported have dependencies and this is not
>> a simple change on the patch done by Rahul and it would require to almost 
>> restart the implementation and testing from the
>> very beginning.
>> 
>> In the meantime could we process with the review of this SMMUv3 driver and 
>> include it in Xen as is ?
>> 
>> We can set me and Rahul as maintainers and flag the driver as experimental 
>> in Support.md (it is already
>> protected by the EXPERT configuration in Kconfig).
> 
> I think that is OK as long as you make sure to go through the changelog
> of the Linux driver to make sure we are not missing any bug fixes and
> errata fixes.

Ok Yes when I ported the driver I port the command queue operation from the 
previous commit where atomic operations is not used and rest all the code is 
from the latest code. I will again make sure that any bug that is fixed in 
Linux should be fixed in XEN also.


Regards,
Rahul


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.