Xen project Mailing List

RE: [PATCH] Re: [Xen-devel] Xen crash on dom0 shutdown

To: 'Keir Fraser' <keir.fraser@xxxxxxxxxxxxx>, 'Jan Beulich' <jbeulich@xxxxxxxxxx>

From: "Shan, Haitao" <haitao.shan@xxxxxxxxx>

Date: Thu, 25 Sep 2008 20:42:46 +0800

Accept-language: en-US

Acceptlanguage: en-US

Cc: "Jiang, Yunhong" <yunhong.jiang@xxxxxxxxx>, "'xen-devel@xxxxxxxxxxxxxxxxxxx'" <xen-devel@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Thu, 25 Sep 2008 05:43:19 -0700

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: AckeJb2U/BQsZIoYEd2egwAX8io7RQAFVrwGADNJP7A=

Thread-topic: [PATCH] Re: [Xen-devel] Xen crash on dom0 shutdown

Hi, Keir, As I tested my patch today, I find cs18539 breaks my system. I have to revert this changeset to test MSI. Here is the output from serial port. (XEN) ----[ Xen-3.4-unstable x86_64 debug=n Not tainted ]---- (XEN) CPU: 2 (XEN) RIP: e008:[<ffff828c8013c2f9>] pirq_guest_unbind+0xb9/0x2f0 (XEN) RFLAGS: 0000000000010082 CONTEXT: hypervisor (XEN) rax: ffff828c80274d80 rbx: 0000000000000000 rcx: 00000000000000d0 (XEN) rdx: ffff83007c60fe38 rsi: 00000000000000ff rdi: ffff83007c80e080 (XEN) rbp: ffff83007c80e080 rsp: ffff83007c60fe28 r8: 0000000000000282 (XEN) r9: ffff828c80274da4 r10: ffff828c8026e580 r11: 0000000000000206 (XEN) r12: ffff828c80274d80 r13: 00000000000000d0 r14: 00000000000000ff (XEN) r15: 00000000000000ff cr0: 000000008005003b cr4: 00000000000026b0 (XEN) cr3: 000000005ea9c000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) Xen stack trace from rsp=ffff83007c60fe28: (XEN) 0000000000000282 ffff83007c80e080 0000000000000282 fffffffffffffffd (XEN) ffff83007c80e080 00000000000000ff 00000000000000d0 ffff8800204408b0 (XEN) 00000000000000ff ffff828c80149075 00000008000000ff ffffffff803908d6 (XEN) 828c801a98a0b948 c390ec90d1ffffff ffff83007c80e1f8 ffff83007c60fed8 (XEN) ffff828c8025f900 ffff83007c80d080 000000ff804e7ff0 ffffffffffffffff (XEN) 0000000000000000 0000000000000000 000000010000001a 00a0fb000000001a (XEN) 00000001801fe568 ffff83007d5e6080 00000000000000ff ffff8800204408b0 (XEN) 0000000000000000 ffff8800204408b0 ffff880020440000 ffff828c801a9169 The actually call trace is do_physdev_op->pirq_guest_unbind, which definitely is in unmap_pirq hypercall for MSI interrupt. This will happen when dom0 first unbind guest pirq and then unmap that pirq. I think this little patch will fix the problem. But I am not quite sure whether this will break the intension of cs 18539. diff -r 7a32c2325fdc xen/arch/x86/irq.c --- a/xen/arch/x86/irq.c Thu Sep 25 10:26:08 2008 +0100 +++ b/xen/arch/x86/irq.c Thu Sep 25 21:12:03 2008 +0800 @@ -619,6 +619,8 @@ } action = (irq_guest_action_t *)desc->action; + if ( !action ) + goto out; vector = desc - irq_desc; for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) Signed-off-by: Shan haitao <haitao.shan@xxxxxxxxx> Best Regards Haitao Shan Keir Fraser wrote: > On 24/9/08 10:13, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote: > >>> How about this one? It doesn't exactly follow the path you >>> suggested, i.e. doesn't mess with event channels, but rather marks >>> the irq<->vector mapping as invalid, allowing only a subsequent >>> event channel unbind (i.e. close) to recover from that state (which >>> seems better in terms of requiring proper discipline in the guest, >>> as it prevents re-using the irq or vector without properly cleaning >>> up). >> >> Yeah, this is the kind of thing I had in mind. I will work on this a >> bit more (e.g., need to synchronise on d->evtchn_lock to avoid racing >> EVTCHNOP_bind_pirq; also I'm afraid about leaking MSI vectors on >> domain destruction). Thanks. > > Changeset 18539. I made the locking quite a bit stricter, by getting > rid of 'irq_lock' and instead using 'evtchn_lock' (which may be > better renamed now to 'event_lock' since it isn't protecting just > event channels now). Now the pirq-to-vector mapping is protected by > both evtchn_lock and irq_desc->lock. A user of the mapping can > protect themselves with either lock (whichever is most convenient). > > Some TODOs: > * There is no management of MSI vectors. They always leak! I didn't > fix that here since it isn't a mere synchronisation race; the code > just isn't there. > * I decided to WARN_ON(!spin_is_locked(&d->evtchn_lock)) in > pirq_guest_[un]bind(). The reason is that in any case those functions > do not expect to be re-entered -- they really want to be per-domain > serialised. Furthermore I am pretty certain that the HVM passthrough > code is not synchronising properly with changes to the pirq-to-vector > mapping (it uses domain_irq_to_vector() in many places with no care > for locking) nor is it synchronised with other users of > pirq_guest_bind() --- a reasonable semantics here would be that a > domain pirq can be bound to once, either via an event channel, or > through a virtual PIC in HVM emulation context. I therefore think > that careful locking is required -- it may suffice to get rid of (or > at least make less use of) the hvm_domain.irq_lock and replace its > use with evtchn_lock (only consideration is that the latter is not > IRQ-safe). The WARN_ON() is a nice reminder of work to be done here. > ;-) > > Comments? > > -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.