
Re: [RFC PATCH v4 6/8] xen/arm: scmi: introduce SCI SCMI SMC multi-agent driver



On Thu, 19 Jun 2025, Oleksii Moisieiev wrote:
> On 18/06/2025 10:22, Julien Grall wrote:
> > Hi,
> >
> > On 18/06/2025 00:38, Stefano Stabellini wrote:
> >> On Thu, 12 Jun 2025, Grygorii Strashko wrote:
> >>> On 02.06.25 10:17, Bertrand Marquis wrote:
> >>>>> On the other hand, if we also want to handle the case where the SCMI
> >>>>> server could be on a separate co-processor, then what this code is
> >>>>> doing is not sufficient because we also need a dcache flush, in
> >>>>> addition to the DSB.
> >>>>>
> >>>>> Bertrand, can you double-check?
> >>>>
> >>>> If we want to handle a case where the memory is accessible to a
> >>>> coprocessor but there is no cache coherency, we need to flush the
> >>>> dcache definitely.
> >>>>
> >>>> Seeing the amount of data here, I do agree with Stefano that it would
> >>>> be a good idea to make the provision to flush the data cache in all
> >>>> cases. Even if the data is accessed by a secure partition or the
> >>>> firmware coherently, flushing in all cases would have very limited
> >>>> performance impact here.
> >>>>
> >>>> There is the other solution to have some kind of parameter to say if
> >>>> the accessor has coherent cache access but I do not think the
> >>>> performance impact here would justify such a complexity.
> >>>>
> >>> The SCMI shmem is expected to be mapped as MT_NON_CACHEABLE in all cases.
> >
> > I can't find MT_NON_CACHEABLE anywhere in Xen or Linux. My 
> > interpretation is that the memory attribute would be normal memory non 
> > cacheable. However, this doesn't add up with ...
> >
> Sorry for the confusion. This define was taken from TF-A and it is the 
> same as Xen MT_NORMAL_NC.
> 
> The main idea was to mention that the memory is non-cacheable.
> 
> >>> Linux does devm_ioremap() -> ioremap() ->
> >>> (ARM64)  __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE))
> >
> > ... this line. This is device nGnRE which is a lot more restrictive 
> > (for instance it doesn't allow unaligned access).
> >
> In Xen, the SCMI memory is mapped using ioremap_nocache(), which maps it
> as MT_DEVICE_nGnRE (same as Linux).
> 
> In TF-A SCMI shared memory is mapped as MT_DEVICE (which is 
> nGnRE: MAIR_DEV_nGnRE).
> 
> Again, sorry for the confusion.
> 
> >>>
> >>> There is also note in docs:
> >>> "+- shmem: shared memory for messages transfer, **Xen page aligned** 
> >>> with
> >>> mapping``p2m_mmio_direct_nc``."
> >>>
> >>> In the case of SCP - the SCMI shmem can be actually be in SRAM.
> >>>
> >>> So, are you sure cache manipulations are required here?
> >>
> >> No, if the memory is mapped as uncacheable everywhere then the cache
> >> manipulations are not needed. However, we probably still need a dsb.
> >>
> >> I understand now why they decided to use __memcpy_fromio in Linux: it is
> >> not MMIO but they needed a memcpy followed by DSB, so they decided to
> >> reuse the existing MMIO functions although the buffer is not MMIO.
> >
> > From my understanding, memcpy_fromio() is not just a memcpy() + dsb.
> > It also guarantees the access will be aligned (this is not guaranteed
> > by our memcpy()).
> >
> From my understanding, Linux uses memcpy_fromio() because the Linux
> memcpy is highly optimized and would produce an exception (unaligned
> accesses are not allowed on device memory), and looking into the memcpy
> implementation in Xen (xen/arch/arm/arm64/lib/memcpy.S) I'm not sure
> whether it can be used instead of memcpy_fromio. Could you please advise
> how to proceed?

If we map the memory in Xen as normal memory non-cacheable, then for
sure we should be able to use the regular memcpy plus a DSB at the end.
That's because unaligned accesses are allowed.
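
As a purely illustrative sketch (not the actual patch; the helper name
scmi_shmem_write() is made up), that could look like:

/*
 * Minimal sketch, assuming the shmem region has already been mapped
 * normal non-cacheable: plain copy plus a barrier before notifying
 * the SCMI server.
 */
#include <xen/string.h>     /* memcpy() */
#include <asm/system.h>     /* dsb() */

static void scmi_shmem_write(void *shmem, const void *msg, size_t len)
{
    /* Unaligned accesses are fine on normal non-cacheable memory. */
    memcpy(shmem, msg, len);

    /* Make the writes observable by the SCMI server before the SMC. */
    dsb(sy);
}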

On the other hand, if we map the memory in Xen as device memory, then we
need to be careful about alignment.
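
In that case one possible fallback (purely illustrative, the helper name
is made up, and it assumes SCMI payload lengths are multiples of 4 since
the protocol is 32-bit word based) would be a word-at-a-time copy:

/*
 * Word-at-a-time copy so every access to the device-mapped shmem is a
 * naturally aligned 32-bit access.
 */
static void scmi_shmem_write_device(volatile uint32_t *shmem,
                                    const void *msg, size_t len)
{
    const uint8_t *src = msg;
    size_t i;

    for ( i = 0; i < len / 4; i++ )
    {
        uint32_t w;

        /* Read from normal memory; memcpy avoids strict-aliasing issues. */
        memcpy(&w, src + i * 4, sizeof(w));

        /* Aligned 32-bit write to the device mapping. */
        shmem[i] = w;
    }

    dsb(sy);
}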

Looking at xen/arch/arm/arm64/lib/memcpy.S, it seems to me that:
- it uses aligned accesses for size >= 16 bytes
- for size < 16 bytes, accesses might be unaligned depending on the
  alignment of the start address
- the start address, assuming it is the shared memory start address, is
  4K aligned, so we should be fine?

So it seems to me that we should be OK with using the regular memcpy
(plus a DSB at the end). It would be good for someone else to confirm.
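
And for the earlier point about a non-coherent co-processor: if the shmem
ever ended up on a cacheable mapping shared with an SCP that is not cache
coherent, the flush Bertrand mentioned would sit right after the copy.
Rough sketch only (assuming Xen's clean_dcache_va_range(); the helper name
is made up):

#include <asm/page.h>       /* clean_dcache_va_range() */

static void scmi_shmem_write_noncoherent(void *shmem, const void *msg,
                                         size_t len)
{
    memcpy(shmem, msg, len);

    /* Push the data out of our caches so the non-coherent SCP sees it. */
    clean_dcache_va_range(shmem, len);

    /* Redundant if the clean helper already issues a barrier, but cheap. */
    dsb(sy);
}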

 

