[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] bug in hvmloader xenbus_shutdown logic
Hi, Sorry for the delay replying! [added Paul Durrant to cc:] On 6 Dec 2013, at 03:53, Liuqiming (John) <john.liuqiming@xxxxxxxxxx> wrote: > How about we add a XS_RESET operation to xenstore and the semantic should be: > reset the connection to initial state > > The work flow asï > 1, Hvmloader send this request to oxenstored and then poll the event > 2,Oxenstored clear up IO ring and do some other work(if any) to make sure > this connect reset to initial state > 3,Oxenstored send a event to notify the reset work has finished > > I think oxenstored should be the "owner" of this IO ring, so all the > complicated operation should be done by oxenstord and other component should > just send the request. I agree that (o)xenstored should be the âownerâ of this ring and other components should just send requests. Perhaps instead of sending a new type of request inside the ring, we could use some of the free space in the shared page to signal a reconnect? The current definition is: /* * `incontents 150 xenstore_struct XenStore wire protocol. * * Inter-domain shared memory communications. */ #define XENSTORE_RING_SIZE 1024 typedef uint32_t XENSTORE_RING_IDX; #define MASK_XENSTORE_IDX(idx) ((idx) & (XENSTORE_RING_SIZE-1)) struct xenstore_domain_interface { char req[XENSTORE_RING_SIZE]; /* Requests to xenstore daemon. */ char rsp[XENSTORE_RING_SIZE]; /* Replies and async watch events. */ XENSTORE_RING_IDX req_cons, req_prod; XENSTORE_RING_IDX rsp_cons, rsp_prod; }; which means that 1024 + 1024 + 4 * 4 = 2064 bytes are used, and 2032 bytes are still free. I believe the 4k page is allocated at domain build time and the unused bytes are guaranteed to be 0 (â is this correct?) Perhaps we could allocate 12 bytes for a version number and another pair of counters: uint32_t ring_protocol_version; uint32_t connection_request; uint32_t connection_actual; A new xenstored could set âring_protocol_versionâ to 1, old xenstoreds would leave it at 0. If a new xenstore client saw the ring_protocol_version = 1, it would be able to request a reconnect by incrementing âconnection_requestâ and waiting xenstored to zero the ring and increment âconnection_actualâ. We could then modify hvmloader to safely reset the ring before the guest boots. If a guest wants to jump into a crash kernel (with undefined xenstore ring state) it would also be able to request a reconnect and load PV drivers. What do you think? Cheers, Dave > >> -----Original Message----- >> From: Ian Campbell [mailto:Ian.Campbell@xxxxxxxxxx] >> Sent: Thursday, December 05, 2013 7:11 PM >> To: David Scott >> Cc: Liuqiming (John); Yanqiangjun; xen-devel@xxxxxxxxxxxxx >> Subject: Re: [Xen-devel] bug in hvmloader xenbus_shutdown logic >> >> On Thu, 2013-12-05 at 09:36 +0000, David Scott wrote: >>> On 05/12/13 02:08, Liuqiming (John) wrote: >>>> According to oxenstored source code, oxenstored will read every >> domain's IO ring no matter what events happened. >>>> >>>> Here is the main loop of oxenstored: >>>> >>>> let main_loop () = >>> ... >>>> process_domains store cons domains <- no matter what event >> income, this will handle IO ring request of all domains >>>> in >>>> >>>> so when one domain's hvmloader is clearing its IO ring, oxenstored may >> access this IO ring because of another domain's event happened. >>> >>> Yes, this version of the code checks all the interdomain rings every >>> time it goes around the main loop. FWIW my experimental updated >>> oxenstore uses one (user-level) thread per connection. I had thought >>> this was just an efficiency issue... I didn't realise it had correctness >>> implications. >> >> TBH the protocol probably doesn't say anything about not looking unless >> you've got a event, so I guess it is kind of implicit and I suppose it >> could be argued that what you do is fine (except it's broken in >> practice ;-)) >> >> Is there some well defined order in which we could clear the various >> pointer fields safely? I'd need to get my ring macro head on to figure >> that out I think. >> >>>>> On Wed, 2013-12-04 at 04:25 +0000, Liuqiming (John) wrote: >>>>> >>>>>> memset can not set all the page to zero in an atomic way, and during >>>>>> the clear up process oxenstored may access this ring. >>>>> >>>>> Why is oxenstored poking at the ring? Surely it should only do so when >>>>> the guest (hvmloader) sends it a request. If hvmloader is clearing the >>>>> page while there is a request/event outstanding then this is an >>>>> hvmloader bug. >>> >>> Ok, that makes sense. >>> >>> My only hesitation is that violations of this rule (like with current >>> oxenstored) show up rarely. Presumably the xenstore ring desynchronises >>> and the guest will never be able to boot properly with PV drivers. I >>> don't think I've ever seen this myself. >>> >>> Do frontends expect the xenstore ring to be in a zeroed state when they >>> startup? Assuming hvmloader has received all the outstanding >>> request/events, would it be safe to leave the ring contents alone? >> >> Most frontends also have private versions of various pointers which >> would need to be synchronised, and I suspect all of the init to zero. >> >> Ian. > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |