[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver
Hello Stefano, > On 29 Oct 2020, at 8:17 pm, Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote: > > On Thu, 29 Oct 2020, Bertrand Marquis wrote: >>> On 28 Oct 2020, at 19:12, Julien Grall <julien@xxxxxxx> wrote: >>> On 26/10/2020 11:03, Rahul Singh wrote: >>>> Hello Julien, >>>>> On 23 Oct 2020, at 4:19 pm, Julien Grall <julien@xxxxxxx> wrote: >>>>> On 23/10/2020 15:27, Rahul Singh wrote: >>>>>> Hello Julien, >>>>>>> On 23 Oct 2020, at 2:00 pm, Julien Grall <julien@xxxxxxx> wrote: >>>>>>> On 23/10/2020 12:35, Rahul Singh wrote: >>>>>>>> Hello, >>>>>>>>> On 23 Oct 2020, at 1:02 am, Stefano Stabellini >>>>>>>>> <sstabellini@xxxxxxxxxx> wrote: >>>>>>>>> >>>>>>>>> On Thu, 22 Oct 2020, Julien Grall wrote: >>>>>>>>>>>> On 20/10/2020 16:25, Rahul Singh wrote: >>>>>>>>>>>>> Add support for ARM architected SMMUv3 implementations. It is >>>>>>>>>>>>> based on >>>>>>>>>>>>> the Linux SMMUv3 driver. >>>>>>>>>>>>> Major differences between the Linux driver are as follows: >>>>>>>>>>>>> 1. Only Stage-2 translation is supported as compared to the Linux >>>>>>>>>>>>> driver >>>>>>>>>>>>> that supports both Stage-1 and Stage-2 translations. >>>>>>>>>>>>> 2. Use P2M page table instead of creating one as SMMUv3 has the >>>>>>>>>>>>> capability to share the page tables with the CPU. >>>>>>>>>>>>> 3. Tasklets is used in place of threaded IRQ's in Linux for event >>>>>>>>>>>>> queue >>>>>>>>>>>>> and priority queue IRQ handling. >>>>>>>>>>>> >>>>>>>>>>>> Tasklets are not a replacement for threaded IRQ. In particular, >>>>>>>>>>>> they will >>>>>>>>>>>> have priority over anything else (IOW nothing will run on the pCPU >>>>>>>>>>>> until >>>>>>>>>>>> they are done). >>>>>>>>>>>> >>>>>>>>>>>> Do you know why Linux is using thread. Is it because of long >>>>>>>>>>>> running >>>>>>>>>>>> operations? >>>>>>>>>>> >>>>>>>>>>> Yes you are right because of long running operations Linux is using >>>>>>>>>>> the >>>>>>>>>>> threaded IRQs. >>>>>>>>>>> >>>>>>>>>>> SMMUv3 reports fault/events bases on memory-based circular buffer >>>>>>>>>>> queues not >>>>>>>>>>> based on the register. As per my understanding, it is >>>>>>>>>>> time-consuming to >>>>>>>>>>> process the memory based queues in interrupt context because of >>>>>>>>>>> that Linux >>>>>>>>>>> is using threaded IRQ to process the faults/events from SMMU. >>>>>>>>>>> >>>>>>>>>>> I didn’t find any other solution in XEN in place of tasklet to >>>>>>>>>>> defer the >>>>>>>>>>> work, that’s why I used tasklet in XEN in replacement of threaded >>>>>>>>>>> IRQs. If >>>>>>>>>>> we do all work in interrupt context we will make XEN less >>>>>>>>>>> responsive. >>>>>>>>>> >>>>>>>>>> So we need to make sure that Xen continue to receives interrupts, >>>>>>>>>> but we also >>>>>>>>>> need to make sure that a vCPU bound to the pCPU is also responsive. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> If you know another solution in XEN that will be used to defer the >>>>>>>>>>> work in >>>>>>>>>>> the interrupt please let me know I will try to use that. >>>>>>>>>> >>>>>>>>>> One of my work colleague encountered a similar problem recently. He >>>>>>>>>> had a long >>>>>>>>>> running tasklet and wanted to be broken down in smaller chunk. >>>>>>>>>> >>>>>>>>>> We decided to use a timer to reschedule the taslket in the future. >>>>>>>>>> This allows >>>>>>>>>> the scheduler to run other loads (e.g. vCPU) for some time. >>>>>>>>>> >>>>>>>>>> This is pretty hackish but I couldn't find a better solution as >>>>>>>>>> tasklet have >>>>>>>>>> high priority. >>>>>>>>>> >>>>>>>>>> Maybe the other will have a better idea. >>>>>>>>> >>>>>>>>> Julien's suggestion is a good one. >>>>>>>>> >>>>>>>>> But I think tasklets can be configured to be called from the >>>>>>>>> idle_loop, >>>>>>>>> in which case they are not run in interrupt context? >>>>>>>>> >>>>>>>> Yes you are right tasklet will be scheduled from the idle_loop that is >>>>>>>> not interrupt conext. >>>>>>> >>>>>>> This depends on your tasklet. Some will run from the softirq context >>>>>>> which is usually (for Arm) on the return of an exception. >>>>>>> >>>>>> Thanks for the info. I will check and will get better understanding of >>>>>> the tasklet how it will run in XEN. >>>>>>>>> >>>>>>>>>>>>> 4. Latest version of the Linux SMMUv3 code implements the >>>>>>>>>>>>> commands queue >>>>>>>>>>>>> access functions based on atomic operations implemented in >>>>>>>>>>>>> Linux. >>>>>>>>>>>> >>>>>>>>>>>> Can you provide more details? >>>>>>>>>>> >>>>>>>>>>> I tried to port the latest version of the SMMUv3 code than I >>>>>>>>>>> observed that >>>>>>>>>>> in order to port that code I have to also port atomic operation >>>>>>>>>>> implemented >>>>>>>>>>> in Linux to XEN. As latest Linux code uses atomic operation to >>>>>>>>>>> process the >>>>>>>>>>> command queues >>>>>>>>>>> (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() , >>>>>>>>>>> atomic_fetch_andnot_relaxed()) . >>>>>>>>>> >>>>>>>>>> Thank you for the explanation. I think it would be best to import >>>>>>>>>> the atomic >>>>>>>>>> helpers and use the latest code. >>>>>>>>>> >>>>>>>>>> This will ensure that we don't re-introduce bugs and also buy us >>>>>>>>>> some time >>>>>>>>>> before the Linux and Xen driver diverge again too much. >>>>>>>>>> >>>>>>>>>> Stefano, what do you think? >>>>>>>>> >>>>>>>>> I think you are right. >>>>>>>> Yes, I agree with you to have XEN code in sync with Linux code that's >>>>>>>> why I started with to port the Linux atomic operations to XEN then I >>>>>>>> realised that it is not straightforward to port atomic operations and >>>>>>>> it requires lots of effort and testing. Therefore I decided to port >>>>>>>> the code before the atomic operation is introduced in Linux. >>>>>>> >>>>>>> Hmmm... I would not have expected a lot of effort required to add the 3 >>>>>>> atomics operations above. Are you trying to also port the LSE support >>>>>>> at the same time? >>>>>> There are other atomic operations used in the SMMUv3 code apart from the >>>>>> 3 atomic operation I mention. I just mention 3 operation as an example. >>>>> >>>>> Ok. Do you have a list you could share? >>>>> >>>> Yes. Please find the list that we have to port to the XEN in order to >>>> merge the latest SMMUv3 code. >>> >>> Thanks! >>> >>>> If we start to port the below list we might have to port another atomic >>>> operation based on which below atomic operations are implemented. I have >>>> not spent time on how these atomic operations are implemented in detail >>>> but as per my understanding, it required an effort to port them to XEN and >>>> required a lot of testing. >>> >>> For the beginning, I think it is fine to implement them with a stronger >>> memory barrier than necessary or in a less efficient. This can be refined >>> afterwards. >>> >>> Those helpers can directly be defined in the SMMUv3 drivers so we know they >>> are not for general purpose :). >>> >>>> 1. atomic_set_release >>> >>> This could be implemented as: >>> >>> smp_mb(); >>> atomic_set(); >>> >>>> 2. atomic_fetch_andnot_relaxed >>> >>> This would need to be imported. >>> >>>> 3. atomic_cond_read_relaxed >>> >>> This would need to be imported. The simplest version seems to be the >>> generic version provided by include/asm-generic/barrier.h (see >>> smp_cond_load_relaxed). >>> >>>> 4. atomic_long_cond_read_relaxed >>>> 5. atomic_long_xor >>> >>> The two would require the implementation of atomic64. Volodymyr also >>> required a version. I offered my help, however I didn't find enough time to >>> do it yet :(. >>> >>> For Arm64, it would be possible to do a copy/paste of the existing helpers >>> and replace anything related to a 32-bit register with a 64-bit one. >>> >>> For Arm32, they are a bit more complex because you now need to work with 2 >>> registers. >>> >>> However, for your purpose, you would be using atomic_long_t. So the the >>> Arm64 implementation should be sufficient. >>> >>>> 6. atomic_set_release >>> >>> Same as 1. >>> >>>> 7. atomic_cmpxchg_relaxed might be we can use atomic_cmpxchg that is >>>> implemented in XEN need to check. >>> atomic_cmpxchg() is strongly ordered. So it would be fine to use it for >>> implementing the helper. Although, more inefficient :). >>> >>>> 8. atomic_dec_return_release >>> >>> Xen implements a stronger version atomic_dec_return(). You can re-use it >>> here. >>> >>>> 9. atomic_fetch_inc_relaxed >>> >>> This would need to be imported. >> >> We do agree that this would be the work required but some of the things to >> be imported have dependencies and this is not >> a simple change on the patch done by Rahul and it would require to almost >> restart the implementation and testing from the >> very beginning. >> >> In the meantime could we process with the review of this SMMUv3 driver and >> include it in Xen as is ? >> >> We can set me and Rahul as maintainers and flag the driver as experimental >> in Support.md (it is already >> protected by the EXPERT configuration in Kconfig). > > I think that is OK as long as you make sure to go through the changelog > of the Linux driver to make sure we are not missing any bug fixes and > errata fixes. Ok Yes when I ported the driver I port the command queue operation from the previous commit where atomic operations is not used and rest all the code is from the latest code. I will again make sure that any bug that is fixed in Linux should be fixed in XEN also. Regards, Rahul
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |