[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [XEN v2] xen/arm64: io: Decode 32-bit ldr/str post-indexing instructions


  • To: Andre Przywara <andre.przywara@xxxxxxx>, Ayan Kumar Halder <ayan.kumar.halder@xxxxxxxxxx>
  • From: Ayan Kumar Halder <ayan.kumar.halder@xxxxxxxxxx>
  • Date: Mon, 10 Jan 2022 14:33:11 +0000
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 149.199.80.198) smtp.rcpttodomain=arm.com smtp.mailfrom=xilinx.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=xilinx.com; dkim=none (message not signed); arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=6CumRueFBVoaazG2t53D450FpQPhJXZ9h/MAdjpmxH0=; b=UD/a7HxdfHHDzJn33mC5UG8OzmAiVKo6NnJ4NTMNjQyv8yax2KQCU6/kNwBC5px+GjVyFl6HGSnay+yz8PrdFCHrBX08e1BDAyq7LcNhgF9ZjC20FYirYnAXzl5ibyDmwmDaro0HFzMnRHbc5PwwwwEEjHQZwXiHRuGXSPjIhJreTon8zZYobYWrem3LTGuww6XLhiG0UmB6fPXpmD4vVZR/n9sWIRi85pklc56bEvUe0QVb12ORcFa/p/zx9FlYJAgWb+OPHApUO3j2eseMZCMFv+2TLIoVd1R+9It8lFOlzwmYIi6PkXuZEKPdGyPayX0rYRot9SoVFZPYF+GDFA==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=GHEFESl4F+f8qyYFQXMuBbZxrIdDepEok/nkeahSZfFEYzyCmCW4rCI/vZ79JlnxI/RKQii8Ke57qdO1G95ZDd6BJthIXRp499JtwrFdWgFfNHmS9nZOJ4EDufn6OKaDAP5wLV79xukYPBh9wMUbspq23KNHCwe+EIu5yKPK0+INUOdO8uywvK0XzBTBg+3iEbNnewNAsDRepQC8VeabcMT4OLXBPesKIrWFswUvTHzya1W1Uc0gDA4ZEPBNAJ80VEHX24ymjVKHbqkovqPmMXJS7NQbZLNTJTm7Fr93qEWUhzmp58rESpGo+fUGSSkyKyYwTF2G1th77SpaWSaP5w==
  • Cc: Julien Grall <julien@xxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, "stefanos@xxxxxxxxxx" <stefanos@xxxxxxxxxx>, "Volodymyr_Babchuk@xxxxxxxx" <Volodymyr_Babchuk@xxxxxxxx>, "jbeulich@xxxxxxxx" <jbeulich@xxxxxxxx>, <beata.michalska@xxxxxxxxxx>, <richard.henderson@xxxxxxxxxx>
  • Delivery-date: Mon, 10 Jan 2022 14:33:54 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Andre,

Many thanks for your inputs. It is making better sense now. Much appreciated.

A few questions/clarifications :-

On 06/01/2022 15:33, Andre Przywara wrote:
On Wed, 5 Jan 2022 16:55:11 +0000
Ayan Kumar Halder <ayan.kumar.halder@xxxxxxxxxx> wrote:

Hi,

Thank you so much for your feedback.

I need a couple of clarifications before I can start with the v3 patch.

On 08/12/2021 12:00, Andre Przywara wrote:
On Mon, 6 Dec 2021 19:31:06 +0000
Julien Grall <julien@xxxxxxx> wrote:

Hi,
 
On 01/12/2021 08:41, Bertrand Marquis wrote:  
Hi Ayan,
     
On 30 Nov 2021, at 19:13, Ayan Kumar Halder <ayan.kumar.halder@xxxxxxxxxx> wrote:

Hi Andre,

Thanks for your comments. They are useful.

On 30/11/2021 09:49, Andre Przywara wrote:  
On Mon, 29 Nov 2021 19:16:38 +0000
Ayan Kumar Halder <ayan.kumar.halder@xxxxxxxxxx> wrote:
Hi,  
At the moment, Xen is only handling data abort with valid syndrome (i.e.
ISV=0). Unfortunately, this doesn't cover all the instructions a domain
could use to access MMIO regions.

For instance, Xilinx baremetal OS will use:

          volatile u32 *LocalAddr = (volatile u32 *)Addr;
          *LocalAddr = Value;

This leave the compiler to decide which store instructions to use.  
As mentioned in the other email, this is wrong, if this points to MMIO:
don't let the compiler do MMIO accesses. If a stage 2 fault isn't in
an MMIO area, you should not see traps that you cannot handle already.
So I don't think it's a good idea to use that as an example. And since
this patch only seems to address this use case, I would doubt its
usefulness in general.  
Yes, I should have fixed the comment.

Currently, I am testing with baremetal app which uses inline assembly code with post indexing instructions, to access the MMIO.

ATM, I am testing with 32 bit MMIO only.

On the usefulness, I am kind of torn as it is legitimate for post indexing instructions to be used in an inline-assembly code for accessing MMIO. However, that may not be something commonly seen.

@Stefano/Bertrand/Julien/Volodymyr :- As you are the Arm mantainers, can you comment if we should have decoding logic or not ?  
Andre gave you the official statement from Arm and there is nothing more I can say.  
I think this would be handy for other hypervisor and OS developper to
know what they can expect when running in a virtualized environment. So
would it be possible to update the Arm Arm reflecting this statement?  
I don't think it's within the scope of the ARM ARM to say that. It just
says that "there is no syndrome information", and your mileage may vary in
working around that.

Personally I would say that if you expect your software to work nicely
under a hypervisor, then just avoid those instructions. The Linux kernel
certainly did so.

You can try to do instruction emulation, but doing this right can get
tricky quickly: think about I$/D$ coherency, MMU on or off, etc.  
I am trying to get all the restrictions that need to be checked. I have 
referred 
https://developer.arm.com/documentation/dui0802/a/A64-General-Instructions/Register-restrictions-for-A64-instructions?lang=en 
and "Arm A64 Instruction Set Architecture - DDI 0596" - LDR (immediate).

So far I only see the following restrictions:-

Rn -ne Rt

Rt -ne SP

You had mentioned the following cases :-

1. XZR vs SP - I see that both these refer to register no 31. Xen gets 
the register number (for Rn/Rt) only, so I am not sure what is to be 
done here.
But the emulation code in Xen needs to know whether number 31 refers to the
stack pointer or to the zero register. This depends on the context of the
instruction. It's nothing magical about it, you just need to figure out
which it is in your case and make sure that the code uses the right
value. In the LDR case n==31 means SP, but t==31 means XZR. The emulation
would need to consider this.

I see what you mean. I had a look at xen/tools/qemu-xen-dir-remote/target/arm/cpu.h, CPUARMState {}. I don't see anywhere we need to refer register by its name. IIUC, the instruction decoding logic needs to know the register number only.

In fact, Xen would only read CPUARMState->xregs[31] for SP/XZR in Aarch64 context. Let me know what I might be missing here.

Beata/Richard - I see you have committed most recently on the file. Please feel free to add your inputs.


2. MMU on or off - As I see in try_handle_mmio(), one gets the physical 
address in gpa. So I am not sure what is to be done here.
You *might* be fine, if Xen takes care of that, I don't know.
But it needs to explicitly consider those two cases - and also make
sure that caching effects are cared for.

3. I/D coherency - I don't understand how this affects instruction decoding.
In the ARM architecture the I cache and D cache are not necessarily
coherent. So whatever the CPU reads via the data bus does not need to be
the same data that the instruction fetcher read a while ago. And while
there is a well-known sequence to make the current data side visible to
the instruction side, there is nothing architectural for the other way
around.
For example the CPU fetches a bunch of instructions into the I cache, then
consumes it from there. Now some code changes those instruction words in
"memory". Without explicit D-cache-clean-to-the-PoU and
I-cache-invalidate+ISB the CPU might still execute the old instructions.

This makes sense. Referring https://developer.arm.com/documentation/den0024/a/Caches/Cache-maintenance, I assume that the following instructions need to be executed before the instruction is decoded.

Xn <--- contains the address of the instruction to be decoded.

STR Wt, [Xn] DC CVAU, Xn // Clean data cache by VA to point of unification (PoU) DSB ISH // Ensure visibility of the data cleaned from cache IC IVAU, Xn // Invalidate instruction cache by VA to PoU DSB ISH // Ensure completion of the invalidations ISB // Synchronize the fetched instruction stream

But if your emulation code tries to read the instruction from memory, it
fetches the data-visible side of it - which is not what the CPU executed,
so end up emulating the wrong thing.

I am guessing that this should not be a very unusual case. Isn't this similar to a host processor loading and modifying a firmware in the memory. The firmware can then be executed by a different micro-controller or some programmable engine. I mean the data pipeline or instruction pipeline should read the same contents from memory.


Please also keep in mind that the architecture does not *guarantee*
coherency, but in reality it might actually be coherent - either because
of the particular silicon implementation, or because of side effects in
the system (data cache line being evicted, ISB executed). So seeing the
effects of coherency in your testing does not mean you are safe, it might
just happen by chance.

Is there some good way to test this to ensure the best possible reliability ?

- Ayan


Cheers,
Andre

Please help me to understand further.

- Ayan

 
I will leave this decision to Stefano and Julien.  
I have had a chat on IRC with Stefano about this. I think the main
sticking point is the Arm Arm doesn't clearly state those instructions
should not be used by a virtualized OS on MMIO regions.  
I don't understand why the ARM ARM would need to say that. Certainly you
realise that immediately when trying to use them, and apparently it was not
a problem in the last 8ish years of Xen/ARM's existence.

So it's your decision on having the emulation, I personally would only do
it when there is a *good* use case.
And please apply the demanded scrutiny on the review - including all the
corner cases like Rn=Rt, XZR vs. SP (as Jan said) and possibly MMU status.

Cheers,
Andre
 
To me, this topic looks similar to the set/way instruction dilemma. They
are a pain to virtualize (and the Arm Arm clearly hint it) but we had to
do it because some OSes relied on them.

I think the main difference is the Arm Arm doesn't hint they should not
be used (it only says a valid syndrome is not provided) and the
implementation should hopefully be smaller and self-contained.

So I would be inclined to allow Xen to decode post-indexing instructions
(pending the review).

Cheers,
 

    

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.