[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common




On 12.08.20 01:47, Stefano Stabellini wrote:

Hi Stefano

On Tue, 11 Aug 2020, Oleksandr wrote:
On 11.08.20 12:19, Julien Grall wrote:
On 11/08/2020 00:34, Stefano Stabellini wrote:
On Mon, 10 Aug 2020, Oleksandr wrote:
On 08.08.20 01:19, Oleksandr wrote:
On 08.08.20 00:50, Stefano Stabellini wrote:
On Fri, 7 Aug 2020, Oleksandr wrote:
On 06.08.20 03:37, Stefano Stabellini wrote:

Hi Stefano

Trying to simulate IO_RETRY handling mechanism (according to model
below) I
continuously get IO_RETRY from try_fwd_ioserv() ...

OK, thanks for the details. My interpretation seems to be
correct.

In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv
should
return IO_RETRY. Then,
xen/arch/arm/traps.c:do_trap_stage2_abort_guest
also needs to handle try_handle_mmio returning IO_RETRY the
first
around, and IO_HANDLED when after QEMU does its job.

What should do_trap_stage2_abort_guest do on IO_RETRY? Simply
return
early and let the scheduler do its job? Something like:

                enum io_state state = try_handle_mmio(regs, hsr,
gpa);

                switch ( state )
                {
                case IO_ABORT:
                    goto inject_abt;
                case IO_HANDLED:
                    advance_pc(regs, hsr);
                    return;
                case IO_RETRY:
                    /* finish later */
                    return;
                case IO_UNHANDLED:
                    /* IO unhandled, try another way to handle
it. */
                    break;
                default:
                    ASSERT_UNREACHABLE();
                }

Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
handle_hvm_io_completion() after QEMU completes the emulation.
Today,
handle_mmio just sets the user register with the read value.

But it would be better if it called again the original function
do_trap_stage2_abort_guest to actually retry the original
operation.
This time do_trap_stage2_abort_guest calls try_handle_mmio() and
gets
IO_HANDLED instead of IO_RETRY,
I may miss some important point, but I failed to see why
try_handle_mmio
(try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at
this
stage.
Or current try_fwd_ioserv() logic needs rework?
I think you should check the ioreq->state in try_fwd_ioserv(), if
the
result is ready, then ioreq->state should be STATE_IORESP_READY, and
you
can return IO_HANDLED.
I optimized test patch a bit (now it looks much simpler). I didn't face
any
issues during a quick test.
Both patches get much closer to following the proper state machine,
great! I think this patch is certainly a good improvement. I think the
other patch you sent earlier, slightly larger, is even better. It makes
the following additional changes that would be good to have:

- try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
- handle_mmio simply calls do_trap_stage2_abort_guest
I don't think we should call do_trap_stage2_abort_guest() as part of the
completion because:
     * The function do_trap_stage2_abort_guest() is using registers that are
not context switched (such as FAR_EL2). I/O handling is split in two with
likely a context switch in the middle. The second part is the completion
(i.e call to handle_mmio()). So the system registers will be incorrect.
     * A big chunk of do_trap_stage2_abort_guest() is not necessary for the
completion. For instance, there is no need to try to translate the guest
virtual address to a guest physical address.

Therefore the version below is probably the best approach.

Indeed, the first version (with calling do_trap_stage2_abort_guest() for a
completion) is a racy. When testing it more heavily I faced an issue
(sometimes) which resulted in DomU got stuck completely.

(XEN) d2v1: vGICD: bad read width 0 r11 offset 0x000f00

I didn't investigate an issue in detail, but I assumed that code in
do_trap_stage2_abort_guest() caused that. This was the main reason why I
decided to optimize an initial patch (and took only advance_pc).
Reading Julien's answer I understand now what could happen.
 From your and Julien's feedback it is clear that calling
do_trap_stage2_abort_guest() is not possible and not a good idea.


The reason for my suggestion was to complete the implementation of the
state machine so that "RETRY" actually means "let's try again the
emulation" but the second time it will return "HANDLED".

Looking at this again, we could achieve the same goal in a better way by
moving the register setting from "handle_mmio" to "try_handle_mmio" and
also calling "try_handle_mmio" from "handle_mmio". Note that handle_mmio
would become almost empty like on x86.

1) do_trap_stage2_abort_guest ->
        try_handle_mmio ->
             try_fwd_ioserv ->
                 IO_RETRY

2) handle_hvm_io_completion ->
        handle_mmio ->
            try_handle_mmio ->
                try_fwd_ioserv ->
                    IO_HANDLED


It is very similar to your second patch with a small change on calling
try_handle_mmio from handle_mmio and setting the register there. Do you
think that would work?
If I understood correctly what you had suggested and properly implemented then it works, at least I didn't face any issues during testing. I think this variant adds some extra operations comparing to the previous one (for example an attempt to find a mmio handler at try_handle_mmio). But, if you think new variant is cleaner and better represents how the state machine should look like, I would be absolutely OK to take this variant for non-RFC series. Please note, there was a request to move try_fwd_ioserv() to arm/ioreq.c (I am going to move new handle_ioserv() as well).


---
 xen/arch/arm/io.c    | 47 +++++++++++++++++++++++++++++++++++++++++++----
 xen/arch/arm/ioreq.c | 38 +++++++-------------------------------
 xen/arch/arm/traps.c |  4 +++-
 3 files changed, 53 insertions(+), 36 deletions(-)

diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index 436f669..4db7c55 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -109,6 +109,43 @@ static const struct mmio_handler *find_mmio_handler(struct domain *d,
 }

 #ifdef CONFIG_IOREQ_SERVER
+static enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
+{
+    const union hsr hsr = { .bits = regs->hsr };
+    const struct hsr_dabt dabt = hsr.dabt;
+    /* Code is similar to handle_read */
+    uint8_t size = (1 << dabt.size) * 8;
+    register_t r = v->arch.hvm.hvm_io.io_req.data;
+
+    /* We are done with the IO */
+    /* XXX: Is it the right place? */
+    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
+
+    /* XXX: Do we need to take care of write here ? */
+    if ( dabt.write )
+        return IO_HANDLED;
+
+    /*
+     * Sign extend if required.
+     * Note that we expect the read handler to have zeroed the bits
+     * outside the requested access size.
+     */
+    if ( dabt.sign && (r & (1UL << (size - 1))) )
+    {
+        /*
+         * We are relying on register_t using the same as
+         * an unsigned long in order to keep the 32-bit assembly
+         * code smaller.
+         */
+        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
+        r |= (~0UL) << size;
+    }
+
+    set_user_reg(regs, dabt.reg, r);
+
+    return IO_HANDLED;
+}
+
 static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
                                     struct vcpu *v, mmio_info_t *info)
 {
@@ -130,6 +167,10 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
     {
     case STATE_IOREQ_NONE:
         break;
+
+    case STATE_IORESP_READY:
+        return IO_HANDLED;
+
     default:
         printk("d%u wrong state %u\n", v->domain->domain_id,
                vio->io_req.state);
@@ -156,10 +197,6 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
     else
         vio->io_completion = HVMIO_mmio_completion;

-    /* XXX: Decide what to do */
-    if ( rc == IO_RETRY )
-        rc = IO_HANDLED;
-
     return rc;
 }
 #endif
@@ -185,6 +222,8 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,

 #ifdef CONFIG_IOREQ_SERVER
         rc = try_fwd_ioserv(regs, v, &info);
+        if ( rc == IO_HANDLED )
+            return handle_ioserv(regs, v);
 #endif

         return rc;
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index 8f60c41..9068b8d 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -33,46 +33,22 @@
 #include <public/hvm/dm_op.h>
 #include <public/hvm/ioreq.h>

+#include <asm/traps.h>
+
 bool handle_mmio(void)
 {
     struct vcpu *v = current;
     struct cpu_user_regs *regs = guest_cpu_user_regs();
     const union hsr hsr = { .bits = regs->hsr };
-    const struct hsr_dabt dabt = hsr.dabt;
-    /* Code is similar to handle_read */
-    uint8_t size = (1 << dabt.size) * 8;
-    register_t r = v->arch.hvm.hvm_io.io_req.data;
-
-    /* We should only be here on Guest Data Abort */
-    ASSERT(dabt.ec == HSR_EC_DATA_ABORT_LOWER_EL);
+    paddr_t addr = v->arch.hvm.hvm_io.io_req.addr;

-    /* We are done with the IO */
-    /* XXX: Is it the right place? */
-    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
-
-    /* XXX: Do we need to take care of write here ? */
-    if ( dabt.write )
-        return true;
-
-    /*
-     * Sign extend if required.
-     * Note that we expect the read handler to have zeroed the bits
-     * outside the requested access size.
-     */
-    if ( dabt.sign && (r & (1UL << (size - 1))) )
+    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
     {
-        /*
-         * We are relying on register_t using the same as
-         * an unsigned long in order to keep the 32-bit assembly
-         * code smaller.
-         */
-        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
-        r |= (~0UL) << size;
+        advance_pc(regs, hsr);
+        return true;
     }

-    set_user_reg(regs, dabt.reg, r);
-
-    return true;
+    return false;
 }

 /* Ask ioemu mapcache to invalidate mappings. */
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index ea472d1..974c744 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1965,11 +1965,13 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
             case IO_HANDLED:
                 advance_pc(regs, hsr);
                 return;
+            case IO_RETRY:
+                /* finish later */
+                return;
             case IO_UNHANDLED:
                 /* IO unhandled, try another way to handle it. */
                 break;
             default:
-                /* XXX: Handle IO_RETRY */
                 ASSERT_UNREACHABLE();
             }
         }
--
2.7.4



--
Regards,

Oleksandr Tyshchenko




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.