Re: [Xen-devel] [PATCH] evtchn: clean last_vcpu_id on EVTCHNOP_reset to avoid crash
On 08/08/14 16:17, Vitaly Kuznetsov wrote:
> David Vrabel <david.vrabel@xxxxxxxxxx> writes:
>
>> On 08/08/14 15:22, Vitaly Kuznetsov wrote:
>>> When EVTCHNOP_reset is being performed last_vcpu_id attribute is not being
>>> cleaned by __evtchn_close(). In case last_vcpu_id != 0 for a particular
>>> event channel and this event channel is going to be used for event delivery
>>> (for another vcpu) before EVTCHNOP_init_control for vcpu == last_vcpu_id
>>> was done the following crash is observed:
>>>
>>> ...
>>> (XEN) Xen call trace:
>>> (XEN) [<ffff82d080127785>] _spin_lock_irqsave+0x5/0x70
>>> (XEN) [<ffff82d0801097db>] evtchn_fifo_set_pending+0xdb/0x370
>>> (XEN) [<ffff82d080107146>] evtchn_send+0xd6/0x160
>>> (XEN) [<ffff82d080107df9>] do_event_channel_op+0x6a9/0x16c0
>>> (XEN) [<ffff82d0801ce800>] vmx_intr_assist+0x30/0x480
>>> (XEN) [<ffff82d080219e99>] syscall_enter+0xa9/0xae
>>>
>>> This happens because lock_old_queue() does not check VCPU's control
>>> block existence and after EVTCHNOP_reset they are all cleaned.
>>>
>>> I suggest we fix the issue twice: reset last_vcpu_id to 0 in
>>> __evtchn_close()
>>> and add appropriate check to lock_old_queue() as lost event is much better
>>> than hypervisor crash.
>>>
>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
>>> ---
>>> xen/common/event_channel.c | 3 +++
>>> xen/common/event_fifo.c | 9 +++++++++
>>> 2 files changed, 12 insertions(+)
>>>
>>> diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
>>> index a7becae..67b9d53 100644
>>> --- a/xen/common/event_channel.c
>>> +++ b/xen/common/event_channel.c
>>> @@ -578,6 +578,9 @@ static long __evtchn_close(struct domain *d1, int port1)
>>>      chn1->state = ECS_FREE;
>>>      chn1->notify_vcpu_id = 0;
>>>
>>> +    /* Reset last_vcpu_id to vcpu0 as control block can be freed */
>>> +    chn1->last_vcpu_id = 0;
>>
>> This is broken if the event channel is closed and rebound while the
>> event is linked.
>>
>> You can only safely clear chn->last_vcpu_id during evtchn_fifo_destroy().
>>
>> You also need to clear last_priority.
>>
>
> Thanks, alternatively I can do that in evtchn_reset() after
> evtchn_fifo_destroy() as it is the only path leading to the issue. I
> wanted to avoid that to exclude additional loop for all event channels.
>
>>> +
>>>      xsm_evtchn_close_post(chn1);
>>>
>>>  out:
>>> diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
>>> index 51b4ff6..e4bef80 100644
>>> --- a/xen/common/event_fifo.c
>>> +++ b/xen/common/event_fifo.c
>>> @@ -61,6 +61,15 @@ static struct evtchn_fifo_queue *lock_old_queue(const
>>> struct domain *d,
>>>      for ( try = 0; try < 3; try++ )
>>>      {
>>>          v = d->vcpu[evtchn->last_vcpu_id];
>>> +
>>> +        if ( !v->evtchn_fifo )
>>> +        {
>>> +            gdprintk(XENLOG_ERR,
>>> +                     "domain %d vcpu %d has no control block!\n",
>>> +                     d->domain_id, v->vcpu_id);
>>> +            return NULL;
>>> +        }
>>
>> I think this check needs to be in evtchn_fifo_init() to prevent the
>> event from being bound to VCPU that does not have a control block.
>>
>
> I *think* it is not the issue here - the event is being bound to VCPU
> with this block initialized. But last_vcpu_id for this particular event
> channel points to some other VCPU which has not initialized its control
> block yet (so d->vcpu[evtchn->last_vcpu_id]->evtchn_fifo is NULL). There
> is no path to get in such situation (after we clear last_vcpu_id), I
> just wanted to put reasonable message here in case something will change
> in future.

Then evtchn_fifo_init() needs to check both the new VCPU and
last_vcpu_id have control blocks.

I much prefer failing the bind up front than detecting the problem later.

David

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
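
For readers following the thread, the two ideas discussed above (refusing the bind unless both the new VCPU and last_vcpu_id have control blocks, and clearing last_vcpu_id together with last_priority) can be illustrated with a small standalone sketch. This is not Xen code: the sk_* structures and functions are hypothetical, heavily simplified stand-ins invented for illustration, and the real evtchn_fifo_init() may look quite different.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the hypervisor structures. */
struct sk_vcpu {
    void *evtchn_fifo;            /* NULL until a control block is set up */
};

struct sk_domain {
    struct sk_vcpu *vcpu;         /* array of VCPUs */
    unsigned int nr_vcpus;
};

struct sk_evtchn {
    unsigned int notify_vcpu_id;  /* VCPU the event is bound to now */
    unsigned int last_vcpu_id;    /* VCPU whose queue last held the event */
    unsigned int last_priority;
};

/* Nonzero if the VCPU exists and has a control block. */
static int sk_has_control_block(const struct sk_domain *d, unsigned int id)
{
    assert(d != NULL);
    return id < d->nr_vcpus && d->vcpu[id].evtchn_fifo != NULL;
}

/* Fail the bind up front unless both VCPUs have control blocks. */
static int sk_bind_check(const struct sk_domain *d,
                         const struct sk_evtchn *chn)
{
    if ( !sk_has_control_block(d, chn->notify_vcpu_id) ||
         !sk_has_control_block(d, chn->last_vcpu_id) )
        return -EINVAL;

    return 0;
}

/*
 * Clear the stale queue state. In the thread this is argued to be safe
 * only once the FIFO state has been destroyed; the sketch does not model
 * that ordering.
 */
static void sk_clear_last(struct sk_evtchn *chn)
{
    chn->last_vcpu_id = 0;
    chn->last_priority = 0;
}
```

With a channel bound to a VCPU that has a control block but whose last_vcpu_id refers to one that does not, sk_bind_check() returns -EINVAL instead of leaving the NULL dereference for the send path to hit later.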