[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2 of 3] linux-2.6.18: xen/pv-on-hvm kexec: prevent crash in xenwatch_thread() when stale watch events arrive



>>> On 06.10.11 at 11:01, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Thu, 2011-10-06 at 09:54 +0100, Jan Beulich wrote:
>> >>> On 05.10.11 at 16:10, Olaf Hering <olaf@xxxxxxxxx> wrote:
>> > linux-2.6.18: xen/pv-on-hvm kexec: prevent crash in xenwatch_thread() when 
>> > stale watch events arrive
>> > 
>> > commit c4c303c7c5679b4b368e12f41124aee29c325b76
>> > 
>> > During repeated kexec boots xenwatch_thread() can crash because
>> > xenbus_watch->callback is cleared by xenbus_watch_path() if a node/token
>> > combo for a new watch happens to match an already registered watch from
>> > an old kernel.  In this case xs_watch returns -EEXISTS, then
>> > register_xenbus_watch() does not remove the to-be-registered watch from
>> > the list of active watches but returns the -EEXISTS to the caller
>> > anyway.
>> > 
>> > Because the watch is still active in xenstored it will cause an event
>> > which will arrive in the new kernel. process_msg() will find the
>> > encapsulated struct xenbus_watch in its list of registered watches and
>> > puts the "empty" watch handle in the queue for xenwatch_thread().
>> > xenwatch_thread() then calls ->callback which was cleared earlier by
>> > xenbus_watch_path().
>> > 
>> > To prevent that crash in a guest running on an old xen toolstack remove
>> > the special -EEXIST handling.
>> > 
>> > v2:
>> >  - remove the EEXIST handing in register_xenbus_watch() instead of
>> >    checking for ->callback in process_msg()
>> > 
>> > Signed-off-by: Olaf Hering <olaf@xxxxxxxxx>
>> > 
>> > diff -r 86e85596d64b -r 94943cf14303 drivers/xen/xenbus/xenbus_xs.c
>> > --- a/drivers/xen/xenbus/xenbus_xs.c
>> > +++ b/drivers/xen/xenbus/xenbus_xs.c
>> > @@ -656,8 +656,7 @@ int register_xenbus_watch(struct xenbus_
>> >  
>> >    err = xs_watch(watch->node, token);
>> >  
>> > -  /* Ignore errors due to multiple registration. */
>> > -  if ((err != 0) && (err != -EEXIST)) {
>> > +  if (err) {
>> 
>> While I committed the other two patches in this series, this one seems
>> to have the potential for regressions (the comment and the checking for
>> -EEXIST can be assumed to have been there for a reason - whether
>> they became stale by now is not obvious),
> 
> Keir said earlier it wasn't correct:
> http://marc.info/?l=xen-devel&m=131358786516831&w=2 

Quoting him: "Either remove the EEXIST check, or convert EEXIST to
return code 0 in register_xenbus_watch(). You could do either, since
I'm sure I added the EEXIST check only as an attempt to theoretically
robustify that function, and looks like I got it wrong."

Removing the check (as Olaf did) doesn't deal with the xenbus device
case mentioned above. Converting it to zero would make sense (as
the watch asked for is actually registered after the function returns),
but would get Olaf's problem addressed afaict.

But wait - the xenbus device case is different because the watch
structure gets allocated, so if duplicate detection is really based on
the address of the watch structure (i.e. the token produced from
the address), this wouldn't be an issue then.

Further I only now notice that there's a list_add() of the watch
structure prior to the call to xs_watch() - if a watch was registered
twice, this would lead to a corrupted list.

So with that it looks like it's indeed pointless to handle the -EEXIST
case separately.

Thanks, Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.