
Re: [BUG]SMMU-V3 queue need no-cache memory



Hi,

On 2022/12/8 21:32, Julien Grall wrote:
Hi,

On 08/12/2022 13:27, sisyphean wrote:

On 2022/12/8 21:21, Julien Grall wrote:
Hi,

On 08/12/2022 03:22, sisyphean wrote:
On 2022/12/8 06:22, Stefano Stabellini wrote:

On Wed, 7 Dec 2022, Julien Grall wrote:
Hi,

I only noticed this e-mail because I was skimming xen-devel. If you want to get our attention, then I would suggest CCing both of us, because I (and I guess Stefano) have filter rules so that those e-mails land directly in my inbox.

On 07/12/2022 10:24, Rahul Singh wrote:
On 7 Dec 2022, at 2:04 am, sisyphean <sisyphean@zlw.email> wrote:

Hi,

      I am trying to run Xen on my ARM board (sorry, for commercial reasons I can't tell you which platform) with SMMUv3 enabled, but all commands in the cmdq fail when Xen starts.

      After tracing with a debugger, I found that the cause is that the queue memory in the SMMUv3 driver is not mapped non-cacheable, so after the function arm_smmu_cmdq_build_cmd has executed, the cmd is still in the cache. The SMMUv3 hardware therefore cannot fetch the correct cmd from memory for execution.
Yes, you are right. As of now we allocate the memory for the command queue via _xzalloc(), which returns cached memory, and that is why you are observing the issue. We have tested the Xen SMMUv3 driver on SoCs where the SMMUv3 HW is in the coherency domain, so we have not encountered this issue.

I think in your case the SMMUv3 HW is not in the coherency domain. Please confirm from your side whether the "dma-coherent" property is not set in the DT.

I think there is no function available as of now to request Xen to allocate memory that is not cached.
You are correct.

@Julien and @Stefano, do you have any suggestion on how we can request memory from Xen that is not cached, something like dma_alloc_coherent() in Linux?
At the moment all the RAM is mapped cacheable in Xen. So it will require some
work to have some memory uncacheable.

There are two options:
  1) Allocate a pool of memory at boot time that will be mapped with a different memory attribute. This means we would need a separate pool and the user will have to size it.
  2) Modify the caching attribute of the memory after the allocation and then revert it after freeing. The con is that we would end up shattering superpages. We also can't re-create superpages (yet), but that might be fine if the memory is never freed.
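
To make option 1) more concrete, here is a minimal, self-contained sketch of a fixed-size pool with a bump allocator. It is not Xen code: `struct sketch_pool`, `sketch_pool_init` and `sketch_pool_alloc` are hypothetical names, and the step that would actually matter in the hypervisor (mapping the backing region with a non-cacheable attribute at boot) is assumed to have happened elsewhere and is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool descriptor; not a Xen structure. */
struct sketch_pool {
    uint8_t *base;  /* start of the region, assumed already mapped non-cacheable */
    size_t size;    /* total size, chosen by the user at boot */
    size_t used;    /* bytes handed out so far (bump pointer) */
};

static void sketch_pool_init(struct sketch_pool *p, void *base, size_t size)
{
    assert(p != NULL && base != NULL);
    p->base = base;
    p->size = size;
    p->used = 0;
}

/*
 * Hand out 'size' bytes at an offset that is a multiple of 'align'
 * (a power of two). The offset is relative to the pool base, so the base
 * itself is assumed to be suitably aligned. Returns NULL once the pool is
 * exhausted, which is why the pool has to be sized up front.
 */
static void *sketch_pool_alloc(struct sketch_pool *p, size_t size, size_t align)
{
    size_t start;

    assert(p != NULL);
    assert(align != 0 && (align & (align - 1)) == 0);

    start = (p->used + align - 1) & ~(align - 1);
    if (start > p->size || size > p->size - start)
        return NULL;

    p->used = start + size;
    return p->base + start;
}
```

There is deliberately no free function in this sketch: a pool that only serves long-lived allocations such as the SMMU queues never needs to give memory back, which keeps it simple but also makes the sizing decision final.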

Option two would probably be the best. But before going that route I have one question...

The temporary solution I use is to call clean_dcache every time a cmd is copied to the cmdq in the function queue_write. But it is obvious that this will seriously affect efficiency.
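
The workaround described above can be sketched roughly as follows. This is a self-contained illustration, not the driver code: `sketch_queue_write` and `sketch_clean_dcache_range` are hypothetical names, the cache clean is a stub that only counts calls (real code would use the hypervisor's cache maintenance helper), and the `coherent` flag is an assumed parameter standing in for whatever the driver knows about the SMMU's coherency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An SMMUv3 command is 16 bytes, i.e. two 64-bit words. */
#define CMDQ_ENT_DWORDS 2

/* Bookkeeping for the stub below, so the sketch can be exercised on a host. */
static unsigned int clean_calls;
static size_t cleaned_bytes;

/*
 * Hypothetical stand-in for a data cache clean by virtual address range.
 * It performs no cache maintenance; it only records that it was called.
 */
static void sketch_clean_dcache_range(const void *p, size_t size)
{
    (void)p;
    clean_calls++;
    cleaned_bytes += size;
}

/*
 * Copy a command into its queue slot. If the SMMU is not cache coherent,
 * clean the slot afterwards so that the hardware reads the new command
 * from memory instead of stale data.
 */
static void sketch_queue_write(uint64_t *dst, const uint64_t *src,
                               size_t n_dwords, int coherent)
{
    size_t i;

    assert(dst != NULL && src != NULL);

    for (i = 0; i < n_dwords; i++)
        dst[i] = src[i];

    if (!coherent)
        sketch_clean_dcache_range(dst, n_dwords * sizeof(*dst));
}
```

Making the clean conditional keeps the extra cost off platforms where the SMMU is in the coherency domain; on a non-coherent platform the cost is one clean of a 16-byte slot per command.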
I agree you will see some performance impact in a micro-benchmark. But I am not sure about normal use-cases. How often do you expect the command queue to be used?
That is a good question. But even for the micro-benchmark, is the
difference significant?

My gut feeling (to be discussed and confirmed) is that for this use-case it might not be worth doing option 1) or option 2) above. Calling clean_dcache as needed might be good enough?


Also, I am a bit surprised you are seeing issues with the command queue but not with the stage-2 page-tables. Does your SMMU support coherent walk but cannot snoop for the command queue?

Hi,

I'm sorry that my statement caused a misunderstanding. I haven't run any micro-benchmarks yet.

I found this problem because "CMD_SYNC timeout" was frequently reported when initializing SMMUv3 during Xen startup.

As for how often the command queue is used: I'm trying to pass PCIe devices through to a DomU. As I understand it, all operations on the device go through SMMUv3 once the device is passed through? If so, the queues will be used frequently.
"all operations on the device" is a bit vague. From what Rahul just wrote, the command queue is for controlling the SMMU (e.g. assigning the device, flushing the TLBs...). Anything related to device access (e.g. accessing the BAR, configuration space...) does not go through it.

Cheers,

So does this mean that operations on the SMMU queues are not frequent? There are still some problems with PCIe device passthrough. I will run some benchmarks once PCIe device passthrough is working. Are there any test cases I could use as a reference?

See my reply to Rahul. I have provided some ideas on how to benchmark it.

Cheers,

Thanks for your suggestion. I will write some test cases and run some benchmarks after completing the PCIe passthrough.

Cheers,




 

