[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: dom0less vs xenstored setup race Was: xen | Failed pipeline for staging | 6a47ba2f
On Wed, 3 May 2023, Julien Grall wrote: > On 03/05/2023 15:38, andrew.cooper3@xxxxxxxxxx wrote: > > Hello, > > > > After what seems like an unreasonable amount of debugging, we've tracked > > down exactly what is going wrong here. > > > > https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/4219721944 > > > > Of note is the smoke.serial log around: > > > > io: IN 0xffff90fec250 d0 20230503 14:20:42 INTRODUCE (1 233473 1 ) > > obj: CREATE connection 0xffff90fff1f0 > > *** d1 CONN RESET req_cons 00000000, req_prod 0000003a rsp_cons > > 00000000, rsp_prod 00000000 > > io: OUT 0xffff9105cef0 d0 20230503 14:20:42 WATCH_EVENT > > (@introduceDomain domlist ) > > > > XS_INTRODUCE (in C xenstored at least, not checked O yet) always > > clobbers the ring pointers. The added pressure on dom0 that the > > xensconsoled adds with it's 4M hypercall bounce buffer occasionally > > defers xenstored long enough that the XS_INTRODUCE clobbers the first > > message that dom1 wrote into the ring. > > > > The other behaviour seen was xenstored observing a header looking like this: > > > > *** d1 HDR { ty 0x746e6f63, rqid 0x2f6c6f72, txid 0x74616c70, len > > 0x6d726f66 } > > > > which was rejected as being too long. That's "control/platform" in > > ASCII, so the XS_INTRODUCE intersected dom1 between writing the header > > and writing the payload. > > > > > > Anyway, it is buggy for XS_INTRODUCE to be called on a live an > > unsuspecting connection. It is ultimately init-dom0less's fault for > > telling dom1 it's good to go before having waited for XS_INTRODUCE to > > complete. > > So the problem is xenstored will set interface->connection to > XENSTORE_CONNECTED before finalizing the connection. Caqn you try the > following, for now, very hackish patch: > > diff --git a/tools/xenstore/xenstored_domain.c > b/tools/xenstore/xenstored_domain.c > index f62be2245c42..bbf85bbbea3b 100644 > --- a/tools/xenstore/xenstored_domain.c > +++ b/tools/xenstore/xenstored_domain.c > @@ -688,6 +688,7 @@ static struct domain *introduce_domain(const void *ctx, > talloc_steal(domain->conn, domain); > > if (!restore) { > + domain_conn_reset(domain); > /* Notify the domain that xenstore is available */ > interface->connection = XENSTORE_CONNECTED; > xenevtchn_notify(xce_handle, domain->port); > @@ -730,8 +731,6 @@ int do_introduce(const void *ctx, struct connection *conn, > if (!domain) > return errno; > > - domain_conn_reset(domain); > - > send_ack(conn, XS_INTRODUCE); Following Jurgen's suggestion, I made this slightly modified version of the patch. With it, the problem is solved: https://gitlab.com/xen-project/people/sstabellini/xen/-/pipelines/856450703 diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c index f62be2245c..5ca160d9f2 100644 --- a/tools/xenstore/xenstored_domain.c +++ b/tools/xenstore/xenstored_domain.c @@ -688,6 +688,8 @@ static struct domain *introduce_domain(const void *ctx, talloc_steal(domain->conn, domain); if (!restore) { + if (!is_master_domain) + domain_conn_reset(domain); /* Notify the domain that xenstore is available */ interface->connection = XENSTORE_CONNECTED; xenevtchn_notify(xce_handle, domain->port); @@ -730,8 +732,6 @@ int do_introduce(const void *ctx, struct connection *conn, if (!domain) return errno; - domain_conn_reset(domain); - send_ack(conn, XS_INTRODUCE); return 0;
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |