Re: IOREQ completions for MMIO writes
On 02/09/2024 10:28 am, Jan Beulich wrote:
> On 29.08.2024 19:31, Andrew Cooper wrote:
>> On 29/08/2024 5:08 pm, Jason Andryuk wrote:
>>> Hi Everyone,
>>>
>>> I've been looking at ioreq latency and pausing of vCPUs. Specifically
>>> for MMIO (IOREQ_TYPE_COPY) writes, they still need completions:
>>>
>>> static inline bool ioreq_needs_completion(const ioreq_t *ioreq)
>>> {
>>>     return ioreq->state == STATE_IOREQ_READY &&
>>>            !ioreq->data_is_ptr &&
>>>            (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
>>> }
>>>
>>> state == STATE_IOREQ_READY
>>> data_is_ptr == 0
>>> type == IOREQ_TYPE_COPY
>>> dir == IOREQ_WRITE
>>>
>>> So a completion is needed. The vCPU remains paused with
>>> _VPF_blocked_in_xen set in pause_flags until the ioreq server
>>> notifies of the completion.
>>>
>>> At least for the case I'm looking at, a single write to an MMIO register,
>>> it doesn't seem like the vCPU needs to be blocked. The write has been
>>> sent and subsequent emulation should not depend on it.
>>>
>>> I feel like I am missing something, but I can't think of a specific
>>> example where a write needs to be blocking. Maybe it simplifies the
>>> implementation, so a subsequent instruction will always have an ioreq
>>> slot available?
>>>
>>> Any insights are appreciated.
>>
>> This is a thorny issue.
>>
>> In x86, MMIO writes are typically posted, but that doesn't mean that the
>> underlying layers can stop tracking the write completely.
>>
>> In your scenario, consider what happens when the same vCPU hits a second
>> MMIO write a few instructions later. You've now got two IOREQs' worth of
>> pending state, only one slot in the "ring", and a wait of an unknown
>> period of time for qemu to process the first.
>>
>>
>> More generally, by not blocking you're violating memory ordering.
>>
>> Consider vCPU0 doing an MMIO write, and vCPU1 doing an MMIO read, and
>> qemu happening to process vCPU1 first.
>>
>> You now have a case where the VM can observe vCPU0 "completing" before
>> vCPU1 starts, yet vCPU1 observing the old value.
>>
>> Other scenarios which exist would be e.g. a subsequent IO hitting STDVGA
>> buffering and being put into the bufioreq ring. Or the vCPU being able
>> to continue when the "please unplug my emulated disk/network" request is
>> still pending.
> Or, in generalized terms, any writes having side effects.
>
>> In terms of what to do about latency, this is one area where Xen does
>> suffer vs KVM.
>>
>> With KVM, this type of emulation is handled synchronously by an entity
>> on the same logical processor. With Xen, one LP says "I'm now blocked,
>> schedule something else" without any idea when the IO will even be
>> processed.
>>
>>
>> One crazy idea I had was to look into not de-scheduling the HVM vCPU,
>> and instead going idle by MONITOR-ing the IOREQ slot.
>>
>> This way, Qemu can "resume" the HVM vCPU by simply writing the
>> completion status (and observing some kind of new "I don't need an
>> evtchn" signal). For a sufficiently quick turnaround, you're also not
>> thrashing the cache by scheduling another vCPU in the meantime.
>>
>> It's definitely more complicated. For one, you'd need to double the
>> size of an IOREQ slot (currently 32 bytes) to avoid sharing a cacheline
>> with an adjacent vCPU.
> Iirc we talked about moving to a full page per vCPU anyway, back in Prague.
>
> As to more complicated - I'd be curious how you would mean to avoid abuse.
> Even without considering abuse attempts, qemu becoming de-scheduled would
> already look to be problematic as to holding up an MWAITing pCPU for too
> long. Some sensible heuristic towards some form of timeout would likely be
> difficult to come up with (both helping performance and avoiding issues).

Well - the scheduler tick will force the CPU out of MWAIT. It's not
conceptually different to a vCPU which used its entire timeslice.
But yes, it is not necessarily ideal behaviour, and it does depend on
circumstances. E.g. if you're not oversubscribing vCPUs to pCPUs to begin
with, then going to sleep for any length of time is OK.

~Andrew