Xen project Mailing List

Re: IOREQ completions for MMIO writes

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jason Andryuk <jason.andryuk@xxxxxxx>

Date: Mon, 2 Sep 2024 11:28:49 +0200

Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Mon, 02 Sep 2024 09:28:52 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 29.08.2024 19:31, Andrew Cooper wrote:
> On 29/08/2024 5:08 pm, Jason Andryuk wrote:
>> Hi Everyone,
>>
>> I've been looking at ioreq latency and pausing of vCPUs. Specifically
>> for MMIO (IOREQ_TYPE_COPY) writes, they still need completions:
>>
>> static inline bool ioreq_needs_completion(const ioreq_t *ioreq)
>> {
>>     return ioreq->state == STATE_IOREQ_READY &&
>>            !ioreq->data_is_ptr &&
>>            (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
>> }
>>
>> state == STATE_IOREQ_READY
>> data_is_ptr == 0
>> type == IOREQ_TYPE_COPY
>> dir == IOREQ_WRITE
>>
>> So a completion is needed. The vCPU remains paused with
>> _VPF_blocked_in_xen set in pause_flags until the ioreq server
>> notifies of the completion.
>>
>> At least for the case I'm looking at, a single write to an MMIO
>> register, it doesn't seem like the vCPU needs to be blocked. The write
>> has been sent and subsequent emulation should not depend on it.
>>
>> I feel like I am missing something, but I can't think of a specific
>> example where a write needs to be blocking. Maybe it simplifies the
>> implementation, so a subsequent instruction will always have an ioreq
>> slot available?
>>
>> Any insights are appreciated.
>
>
> This is a thorny issue.
>
> In x86, MMIO writes are typically posted, but that doesn't mean that the
> underlying layers can stop tracking the write completely.
>
> In your scenario, consider what happens when the same vCPU hits a second
> MMIO write a few instructions later. You've now got two IOREQs worth of
> pending state, only one slot in the "ring", and a wait of an unknown
> period of time for qemu to process the first.
>
>
> More generally, by not blocking you're violating memory ordering.
>
> Consider vCPU0 doing an MMIO write, and vCPU1 doing an MMIO read, and
> qemu happening to process vCPU1 first.
>
> You now have a case where the VM can observe vCPU0 "completing" before
> vCPU1 starts, yet vCPU1 observing the old value.
>
> Other scenarios which exist would be e.g. a subsequent IO hitting STDVGA
> buffering and being put into the bufioreq ring. Or the vCPU being able
> to continue when the "please unplug my emulated disk/network" request is
> still pending.

Or, in generalized terms, any writes having side effects.

> In terms of what to do about latency, this is one area where Xen does
> suffer vs KVM.
>
> With KVM, this type of emulation is handled synchronously by an entity
> on the same logical processor. With Xen, one LP says "I'm now blocked,
> schedule something else" without any idea when the IO will even be
> processed.
>
>
> One crazy idea I had was to look into not de-scheduling the HVM vCPU,
> and instead going idle by MONITOR-ing the IOREQ slot.
>
> This way, Qemu can "resume" the HVM vCPU by simply writing the
> completion status (and observing some kind of new "I don't need an
> evtchn" signal). For a sufficiently quick turnaround, you're also not
> thrashing the cache by scheduling another vCPU in the meantime.
>
> It's definitely more complicated. For one, you'd need to double the
> size of an IOREQ slot (currently 32 bytes) to avoid sharing a cacheline
> with an adjacent vCPU.

Iirc we talked about moving to a full page per vCPU anyway, back in
Prague.

As to more complicated - I'd be curious how you would mean to avoid
abuse. Even without considering abuse attempts, qemu becoming
de-scheduled would already look to be problematic as to holding up an
MWAITing pCPU for too long. Some sensible heuristic towards some form of
timeout would likely be difficult to come up with (both helping
performance and avoiding issues).

Jan
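
For reference, the predicate quoted at the top of the thread can be exercised on its own to see which request kinds wait for a completion. The following is only a sketch: the struct is a hypothetical cut-down stand-in holding just the fields the predicate reads, the numeric constants are assumed placeholder values rather than a copy of Xen's public header, and the `sketch_` names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in values for this sketch; the authoritative definitions
 * are in Xen's public ioreq header and are not reproduced here. */
enum { STATE_IOREQ_NONE = 0, STATE_IOREQ_READY = 1 };
enum { IOREQ_TYPE_PIO = 0, IOREQ_TYPE_COPY = 1 };
enum { IOREQ_WRITE = 0, IOREQ_READ = 1 };

/* Hypothetical cut-down request: only the fields the predicate inspects. */
typedef struct {
    uint8_t state;
    uint8_t data_is_ptr;
    uint8_t dir;
    uint8_t type;
} sketch_ioreq_t;

/* Same boolean logic as the quoted ioreq_needs_completion(). */
static inline bool sketch_needs_completion(const sketch_ioreq_t *ioreq)
{
    return ioreq->state == STATE_IOREQ_READY &&
           !ioreq->data_is_ptr &&
           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
}

/* Build a READY request with inline data (data_is_ptr == 0) of the given
 * type and direction, and report whether it needs a completion. */
static inline bool sketch_case(uint8_t type, uint8_t dir)
{
    sketch_ioreq_t req;

    req.state = STATE_IOREQ_READY;
    req.data_is_ptr = 0;
    req.dir = dir;
    req.type = type;

    return sketch_needs_completion(&req);
}
```

Under these assumptions, `sketch_case(IOREQ_TYPE_COPY, IOREQ_WRITE)` is true while `sketch_case(IOREQ_TYPE_PIO, IOREQ_WRITE)` is false: of the four type/direction combinations, only a port I/O write is exempt from waiting, which is the asymmetry the original question is about.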
