
Re: [PATCH v2] xen/xenbus: make xs_talkv() interruptible


  • To: Jürgen Groß <jgross@xxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>, <linux-kernel@xxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Thu, 17 Dec 2020 18:25:52 +0000
  • Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • Delivery-date: Thu, 17 Dec 2020 18:26:10 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 16/12/2020 08:21, Jürgen Groß wrote:
> On 15.12.20 21:59, Andrew Cooper wrote:
>> On 15/12/2020 11:10, Juergen Gross wrote:
>>> In case a process waits for any Xenstore action in the xenbus driver
>>> it should be interruptible by signals.
>>>
>>> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
>>> ---
>>> V2:
>>> - don't special case SIGKILL as libxenstore is handling -EINTR fine
>>> ---
>>>   drivers/xen/xenbus/xenbus_xs.c | 9 ++++++++-
>>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
>>> index 3a06eb699f33..17c8f8a155fd 100644
>>> --- a/drivers/xen/xenbus/xenbus_xs.c
>>> +++ b/drivers/xen/xenbus/xenbus_xs.c
>>> @@ -205,8 +205,15 @@ static bool test_reply(struct xb_req_data *req)
>>>     static void *read_reply(struct xb_req_data *req)
>>>   {
>>> +    int ret;
>>> +
>>>       do {
>>> -        wait_event(req->wq, test_reply(req));
>>> +        ret = wait_event_interruptible(req->wq, test_reply(req));
>>> +
>>> +        if (ret == -ERESTARTSYS && signal_pending(current)) {
>>> +            req->msg.type = XS_ERROR;
>>> +            return ERR_PTR(-EINTR);
>>> +        }
>>
>> Now that I can talk fully about the situations which led to this, I
>> think there is a bit more complexity.
>>
>> It turns out there are a number of issues related to running a Xen
>> system with no xenstored.
>>
>> 1) If a xenstore-write occurs during startup before init-xenstore-domain
>> runs, the former blocks on /dev/xen/xenbus waiting for xenstored to
>> reply, while the latter blocks on /dev/xen/xenbus_backend when trying to
>> tell the dom0 kernel that xenstored is in dom1.  This effectively
>> deadlocks the system.
>
> This should be easy to solve: any request to /dev/xen/xenbus should
> block upfront in case xenstored isn't up yet (could e.g. wait
> interruptible until xenstored_ready is non-zero).

I'm not sure that that would fix the problem.  The problem is that
setting the ring details via /dev/xen/xenbus_backend blocks, which
prevents us launching the xenstored stubdomain, which prevents the
earlier xenbus write being completed.

So long as /dev/xen/xenbus_backend doesn't block, there's no problem
with other /dev/xen/xenbus activity being pending briefly.
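
As a strawman, the gating you suggest might look like this (a sketch
only; xenstored_ready is the existing flag in xenbus_probe.c, but the
waitqueue here is hypothetical and would need waking from wherever
xenstored_ready gets set):

    /* Hypothetical: woken once xenstored_ready becomes non-zero. */
    static DECLARE_WAIT_QUEUE_HEAD(xs_ready_wq);

    static int wait_for_xenstored(void)
    {
        /* Hold new /dev/xen/xenbus requests until xenstored is known
         * to be up, but let a signal abort the wait. */
        return wait_event_interruptible(xs_ready_wq, xenstored_ready);
    }

That keeps frontend requests pending interruptibly, without committing
anything to a ring which nobody is servicing.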


Looking at the current logic, I'm not completely convinced.  Even
finding a filled-in evtchn/gfn doesn't mean that xenstored is actually
ready.

There are 3 possible cases.

1) PV guest, and details in start_info
2) HVM guest, and details in HVM_PARAMs
3) No details (expected for dom0).  Something in userspace must provide
details at a later point.

So the setup phases go from nothing, to having ring details, to finding
the ring working.

I think it would be prudent to try reading a key between having the
details and declaring xenstored_ready.  Any activity, even an XS_ERROR
reply, indicates that the other end of the ring is listening.
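
As a rough sketch of that probe (a hypothetical helper; xenbus_read()
is the existing API, and a real version would have to run somewhere
that can tolerate blocking, and distinguish a genuine reply from a
locally failed request):

    static void xenstored_probe(void)
    {
        /* Any answer at all, even an XS_ERROR reply, proves the other
         * end of the ring is listening. */
        char *v = xenbus_read(XBT_NIL, "domid", "", NULL);

        if (!IS_ERR(v))
            kfree(v);
        xenstored_ready = 1;
    }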

>
>> 2) If xenstore-watch is running when xenstored dies, it spins at 100%
>> cpu usage making no system calls at all.  This is caused by bad error
>> handling from xs_watch(), and attempting to debug found:
>
> Can you expand on "bad error handling from xs_watch()", please?

do_watch() has, in essence:

    for (;;) {                          /* infinite loop */
        vec = xs_read_watch(xsh, &num); /* blocks for the next event */
        if (vec == NULL)
            continue;                   /* error => spin at 100% cpu */
        ...
    }


My next plan was to experiment with break instead of continue, which
I'll get to at some point.
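
i.e. something along these lines (a sketch; xs_read_watch() sets errno
on failure, so an interrupted read can still be retried while a dead
connection terminates the loop):

    for (;;) {
        vec = xs_read_watch(xsh, &num);
        if (vec == NULL) {
            if (errno == EINTR)
                continue;   /* interrupted - retry */
            break;          /* connection gone - stop spinning */
        }
        /* ... process vec[XS_WATCH_PATH] / vec[XS_WATCH_TOKEN] ... */
        free(vec);
    }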

>
>>
>> 3) (this issue).  If anyone starts xenstore-watch with no xenstored
>> running at all, it blocks in the D (uninterruptible sleep) state in
>> the kernel.
>
> Should be handled with solution for 1).
>
>>
>> The cause is the special handling for watch/unwatch commands which,
>> instead of just queuing up the data for xenstore, explicitly waits for
>> an OK for registering the watch.  This causes a write() system call to
>> block waiting for a non-existent entity to reply.
>>
>> So while this patch does resolve the major usability issue I found (I
>> can't even SIGINT and get my terminal back), I think there are issues.
>>
>> The reason why XS_WATCH/XS_UNWATCH are special cased is because they do
>> require special handling.  The main kernel thread for processing
>> incoming data from xenstored does need to know how to associate each
>> async XS_WATCH_EVENT to the caller who watched the path.
>>
>> Therefore, depending on when this cancellation hits, we might be in any
>> of the following states:
>>
>> 1) the watch is queued in the kernel, but not even sent to xenstored yet
>> 2) the watch is queued in the xenstored ring, but not acted upon
>> 3) the watch is queued in the xenstored ring, and the xenstored has seen
>> it but not replied yet
>> 4) the watch has been processed, but the XS_WATCH reply hasn't been
>> received yet
>> 5) the watch has been processed, and the XS_WATCH reply received
>>
>> State 5 (and, transiently, state 4) is the normal success path when
>> xenstored has acted upon the request, and the internal kernel
>> infrastructure is set up appropriately to handle XS_WATCH_EVENTs.
>>
>> States 1 and 2 can be very common if there is no xenstored (or at least,
>> it hasn't started up yet).  In reality, there is either no xenstored, or
>> it is up and running (and for a period of time during system startup,
>> these cases occur in sequence).
>
> Yes, this is the reason we can't just reject a user request if xenstored
> hasn't been detected yet: it could be just starting.

Right, and I'm not suggesting that we'd want to reject accesses while
xenstored is starting up.

>
>>
>> As soon as the XS_WATCH event has been written into the xenstored ring,
>> it is not safe to cancel.  You've committed to xenstored processing the
>> request (if it is up).
>
> I'm not sure this is true. Cancelling it might result in a stale watch
> in xenstored, but there shouldn't be a problem related to that. In case
> that watch fires, the event will normally be discarded by the kernel, as
> no matching watch is found in the kernel's data. In case a new watch
> has been set up with the same struct xenbus_watch address (which is used
> as the token), then this new watch might fire without the node of the
> new watch having changed, but spurious watch events are defined to be
> okay (OTOH the path in the event might look strange to the handler).

Watches are a quota'd resource in (at least some) xenstored
configurations.  Losing track of the registration is a resource leak,
even if the kernel can filter and discard the unexpected watch events.
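
One (untested) mitigation for the committed-to-the-ring states would
be a best-effort cleanup on cancellation, reusing the existing
xs_unwatch():

    /* Sketch: if the XS_WATCH had already reached the ring when the
     * caller was interrupted, queue a matching XS_UNWATCH rather than
     * just forgetting the registration.  Best effort - the result is
     * ignored, as the caller is already bailing out. */
    if (committed_to_ring)              /* hypothetical flag */
        xs_unwatch(watch->node, token);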

>> If xenstored is actually up and running, it's fine and necessary to
>> block.  The request will be processed in due course (timing subject to
>> the client and server load).  If xenstored isn't up, blocking isn't ok.
>>
>> Therefore, I think we need to distinguish "not yet on the ring" from
>> "on the ring" as the criterion for whether cancelling is safe, and
>> ensure we don't queue anything on the ring before we're sure xenstored
>> has started up.
>>
>> Does this make sense?
>
> Basically, yes.

Great.  If I get any time, I'll try to look into some fixes along the
above lines.
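
For reference, the request lifecycle in drivers/xen/xenbus/xenbus.h
already tracks some of this via enum xb_req_state, so the cancellation
rule might be as simple as (hypothetical helper, existing state
values):

    /* A signal may only cancel a request outright while it is still
     * queued inside the kernel and has never been copied to the ring;
     * after that, we've committed to xenstored processing it. */
    static bool xs_cancel_safe(struct xb_req_data *req)
    {
        return req->state == xb_req_state_queued;
    }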

~Andrew
