[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Questions about GPLPV stability tests
On Thu, Dec 8, 2011 at 11:46 PM, Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx> wrote: > On Wed, Dec 07, 2011 at 08:44:19PM +0000, Roderick Colenbrander wrote: >> On Wed, Dec 7, 2011 at 8:38 PM, Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx> >> wrote: >> >> It took about a week, but the systems went down again. Linux is down, >> >> but the hypervisor is still reachable on the serial console. Is there >> >> anything interesting to dump from there? >> >> >> > Just the interrupts information. I think that is Ctrl-A, Ctrl-A, >> > Ctrl-A, and then '*' to capture everything (I don't remember the right >> > one for just interrupts). >> >> Here's the interrupt information: >> (XEN) [2011-12-06 19:13:37] [i: dump interrupt bindings] >> (XEN) [2011-12-06 19:13:37] Guest interrupt information: >> (XEN) [2011-12-06 19:13:37] IRQ: 0 >> affinity:00000000,00000000,00000000,00000001 vec:f0 type=IO-APIC-edge >> status=00000000 mapped, unbound >> (XEN) [2011-12-06 19:13:37] IRQ: 1 >> affinity:00000000,00000000,00000000,00000001 vec:30 type=IO-APIC-edge >> status=00000014 in-flight=0 domain-list=0: 1(-S--), >> (XEN) [2011-12-06 19:13:37] IRQ: 2 >> affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:e2 type=XT-PIC >> status=00000000 mapped, unbound >> (XEN) [2011-12-06 19:13:37] IRQ: 3 >> affinity:00000000,00000000,00000000,00000001 vec:38 type=IO-APIC-edge >> status=00000006 mapped, unbound >> (XEN) [2011-12-06 19:13:37] IRQ: 4 >> affinity:00000000,00000000,00000000,00000001 vec:40 type=IO-APIC-edge >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:37] IRQ: 5 >> affinity:00000000,00000000,00000000,00000001 vec:f1 type=IO-APIC-edge >> status=00000000 mapped, unbound >> (XEN) [2011-12-06 19:13:37] IRQ: 6 >> affinity:00000000,00000000,00000000,00000001 vec:48 type=IO-APIC-edge >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:37] IRQ: 7 >> affinity:00000000,00000000,00000000,00000001 vec:50 type=IO-APIC-edge >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:37] IRQ: 8 >> affinity:00000000,00000000,00000000,00000001 vec:58 type=IO-APIC-edge >> status=00000010 in-flight=0 domain-list=0: 8(-S--), >> (XEN) [2011-12-06 19:13:37] IRQ: 9 >> affinity:00000000,00000000,00000000,00000001 vec:60 type=IO-APIC-level >> status=00000010 in-flight=0 domain-list=0: 9(-S--), >> (XEN) [2011-12-06 19:13:37] IRQ: 10 >> affinity:00000000,00000000,00000000,00000001 vec:68 type=IO-APIC-edge >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:37] IRQ: 11 >> affinity:00000000,00000000,00000000,00000001 vec:70 type=IO-APIC-edge >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:37] IRQ: 12 >> affinity:00000000,00000000,00000000,00000001 vec:78 type=IO-APIC-edge >> status=00000010 in-flight=0 domain-list=0: 12(-S--), >> (XEN) [2011-12-06 19:13:37] IRQ: 13 >> affinity:00000000,00000000,00000000,00000001 vec:88 type=IO-APIC-edge >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:37] IRQ: 14 >> affinity:00000000,00000000,00000000,00000001 vec:90 type=IO-APIC-edge >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:37] IRQ: 15 >> affinity:00000000,00000000,00000000,00000001 vec:98 type=IO-APIC-edge >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:37] IRQ: 16 >> affinity:00000000,00000000,00000000,00000001 vec:c8 type=IO-APIC-level >> status=00000010 in-flight=0 domain-list=0: 16(-S--),1201: 16(--M-), >> (XEN) [2011-12-06 19:13:37] IRQ: 18 >> affinity:00000000,00000000,00000000,00000001 vec:51 type=IO-APIC-level >> status=00000010 in-flight=0 domain-list=0: 18(-S--), >> (XEN) [2011-12-06 19:13:37] IRQ: 19 >> affinity:00000000,00000000,00000000,00000001 vec:d0 type=IO-APIC-level >> status=00000010 in-flight=0 domain-list=0: 19(-S--), >> (XEN) [2011-12-06 19:13:37] IRQ: 21 >> affinity:00000000,00000000,00000000,00000001 vec:61 type=IO-APIC-level >> status=00000010 in-flight=0 domain-list=0: 21(-S--), >> (XEN) [2011-12-06 19:13:37] IRQ: 23 >> affinity:00000000,00000000,00000000,00000001 vec:59 type=IO-APIC-level >> status=00000010 in-flight=1 domain-list=0: 23(PS-M), >> (XEN) [2011-12-06 19:13:37] IRQ: 24 >> affinity:00000000,00000000,00000000,00000001 vec:28 type=DMA_MSI >> status=00000000 mapped, unbound >> (XEN) [2011-12-06 19:13:38] IRQ: 25 >> affinity:00000000,00000000,00000000,00000001 vec:a0 type=PCI-MSI >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:38] IRQ: 26 >> affinity:00000000,00000000,00000000,00000001 vec:a8 type=PCI-MSI >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:38] IRQ: 27 >> affinity:00000000,00000000,00000000,00000001 vec:b0 type=PCI-MSI >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:38] IRQ: 28 >> affinity:00000000,00000000,00000000,00000001 vec:b8 type=PCI-MSI >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:38] IRQ: 29 >> affinity:00000000,00000000,00000000,00000001 vec:c0 type=PCI-MSI >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:38] IRQ: 30 >> affinity:00000000,00000000,00000000,00000001 vec:d8 type=PCI-MSI >> status=00000014 in-flight=0 domain-list=0:274(PS--), >> (XEN) [2011-12-06 19:13:38] IRQ: 31 >> affinity:00000000,00000000,00000000,00000001 vec:21 type=PCI-MSI >> status=00000010 in-flight=0 domain-list=0:273(PS--), >> (XEN) [2011-12-06 19:13:38] IRQ: 32 >> affinity:00000000,00000000,00000000,00000001 vec:29 type=PCI-MSI >> status=00000010 in-flight=0 domain-list=0:272(PS--), >> (XEN) [2011-12-06 19:13:38] IRQ: 33 >> affinity:00000000,00000000,00000000,00000001 vec:31 type=PCI-MSI >> status=00000010 in-flight=0 domain-list=0:271(PS--), >> (XEN) [2011-12-06 19:13:38] IRQ: 34 >> affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:39 type=PCI-MSI >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:38] IRQ: 35 >> affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:41 type=PCI-MSI >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:38] IRQ: 36 >> affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:49 type=PCI-MSI >> status=00000002 mapped, unbound >> (XEN) [2011-12-06 19:13:38] IRQ: 37 >> affinity:00000000,00000000,00000000,00000010 vec:65 type=PCI-MSI >> status=00000010 in-flight=0 domain-list=1201: 55(--M-), >> (XEN) [2011-12-06 19:13:38] IO-APIC interrupt information: >> (XEN) [2011-12-06 19:13:38] IRQ 0 Vec240: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 2: vector=240, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 1 Vec 48: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 1: vector=48, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 3 Vec 56: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 3: vector=56, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 4 Vec 64: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 4: vector=64, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 5 Vec241: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 5: vector=241, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 6 Vec 72: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 6: vector=72, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 7 Vec 80: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 7: vector=80, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 8 Vec 88: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 8: vector=88, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 9 Vec 96: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 9: vector=96, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=level, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 10 Vec104: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 10: vector=104, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 11 Vec112: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 11: vector=112, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 12 Vec120: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 12: vector=120, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 13 Vec136: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 13: vector=136, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 14 Vec144: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 14: vector=144, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 15 Vec152: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 15: vector=152, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=0, >> irr=0, trigger=edge, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 16 Vec200: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 16: vector=200, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=1, >> irr=0, trigger=level, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 18 Vec 81: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 18: vector=81, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=1, >> irr=0, trigger=level, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 19 Vec208: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 19: vector=208, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=1, >> irr=0, trigger=level, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 21 Vec 97: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 21: vector=97, >> delivery_mode=1, dest_mode=logical, delivery_status=0, polarity=1, >> irr=0, trigger=level, mask=0, dest_id:1 >> (XEN) [2011-12-06 19:13:38] IRQ 23 Vec 89: >> (XEN) [2011-12-06 19:13:38] Apic 0x00, Pin 23: vector=89, >> delivery_mode=1, dest_mode=logical, delivery_status=1, polarity=1, >> irr=1, trigger=level, mask=0, dest_id:1 >> >> I noticed some potential interesting things. The system in question is >> using PCI passthrough. On the serial console I remember seeing that >> the PCI device got unmapped from DomU and got mapped again in Dom0. >> The serial console log still had a lot of information about this DomU >> which I guess was going down. I guess it wasn't fully down yet. > > One of the debug informations that gets printed with '*' is the guest > PCI passthrough data. Such as which IRQ, BAR, etc are mapped in. Did any > of those exist? There is not much PCI related information. Just interrupt stuff, no PCI BARs or other interesting PCI stuff: (XEN) [2011-12-06 19:13:20] 03:00.0 - dom 1201 - MSIs < 37 > (XEN) [2011-12-06 19:13:20] MSI 37 vec=65 lowest edge assert log lowest dest=00000010 mask=0/0/-1 > My original thought of what is going on is that the interrupts either > stopped completly working (does not look like - it looks like the > interrupts are firring and the event channels that are bound to it have > the bits set saying - hey I am pending). Oddly there are bunch of MSI > ones that are masked which usually means they are disabled. > > Then there is the > affinity:00000000,00000000,00000000,00000001 vec:c8 type=IO-APIC-level > status=00000010 in-flight=0 domain-list=0: 16(-S--),1201: 16(--M-), > affinity:00000000,00000000,00000000,00000010 vec:65 type=PCI-MSI > status=00000010 in-flight=0 domain-list=1201: 55(--M-), > > The guest has masked both interrupts - so it is off//dying, but somehow > the unmapping has not happend. In other words, what you just analyzed > and found out. > > Sadly, there is no smoking gun.. > > So a couple of things that I would do is to ensure that the Xen > hypervisor boots with 'sync_console console_to_ring' and when this > crash happens see if I can get a stack trace from both the Xen > hypervisor (there are some debug parameters to get that - I think it is > 'r'?), and also from the Linux kernel. I set up some new tests, so that will take some days to get a crash. For what it is worth, I do have a '*' log left from this system and it has some stack trace. If you think it is useful, I could send it gzipped, but I don't want to spam this list if it may not be useful. One thing I did notice was the following: (XEN) [2011-12-06 19:13:39] active vcpus: (XEN) [2011-12-06 19:13:39] 1: [1201.2] pri=-2 flags=0 cpu=6 credit=-2411 [w=256] (XEN) [2011-12-06 19:13:39] 2: [1201.1] pri=-2 flags=0 cpu=5 credit=-2460 [w=256] (XEN) [2011-12-06 19:13:39] 3: [0.2] pri=0 flags=0 cpu=2 credit=142 [w=256] (XEN) [2011-12-06 19:13:39] 4: [0.1] pri=-2 flags=0 cpu=1 credit=-2612 [w=256] (XEN) [2011-12-06 19:13:39] CPU[00] sort=12511062, sibling=00000000,00000000,00000000,00000003, core=00000000,00000000,00000000,000000ff (XEN) [2011-12-06 19:13:39] run: [32767.0] pri=0 flags=0 cpu=0 (XEN) [2011-12-06 19:13:39] 1: [0.0] pri=-1 flags=0 cpu=0 credit=333 [w=256] (XEN) [2011-12-06 19:13:39] CPU[01] sort=12511063, sibling=00000000,00000000,00000000,00000003, core=00000000,00000000,00000000,000000ff (XEN) [2011-12-06 19:13:39] run: [0.1] pri=-2 flags=0 cpu=1 credit=-2914 [w=256] (XEN) [2011-12-06 19:13:39] 1: [32767.1] pri=-64 flags=0 cpu=1 (XEN) [2011-12-06 19:13:39] CPU[02] sort=12511063, sibling=00000000,00000000,00000000,0000000c, core=00000000,00000000,00000000,000000ff (XEN) [2011-12-06 19:13:39] run: [32767.2] pri=-64 flags=0 cpu=2 (XEN) [2011-12-06 19:13:39] CPU[03] sort=12511063, sibling=00000000,00000000,00000000,0000000c, core=00000000,00000000,00000000,000000ff (XEN) [2011-12-06 19:13:39] run: [32767.3] pri=-64 flags=0 cpu=3 (XEN) [2011-12-06 19:13:39] CPU[04] sort=12511063, sibling=00000000,00000000,00000000,00000030, core=00000000,00000000,00000000,000000ff (XEN) [2011-12-06 19:13:39] run: [32767.4] pri=-64 flags=0 cpu=4 (XEN) [2011-12-06 19:13:39] CPU[05] sort=12511063, sibling=00000000,00000000,00000000,00000030, core=00000000,00000000,00000000,000000ff (XEN) [2011-12-06 19:13:39] run: [1201.1] pri=-2 flags=0 cpu=5 credit=-3617 [w=256] (XEN) [2011-12-06 19:13:39] 1: [32767.5] pri=-64 flags=0 cpu=5 (XEN) [2011-12-06 19:13:39] CPU[06] sort=12511063, sibling=00000000,00000000,00000000,000000c0, core=00000000,00000000,00000000,000000ff (XEN) [2011-12-06 19:13:39] run: [1201.2] pri=-2 flags=0 cpu=6 credit=-3918 [w=256] (XEN) [2011-12-06 19:13:39] 1: [32767.6] pri=-64 flags=0 cpu=6 (XEN) [2011-12-06 19:13:39] CPU[07] sort=12511063, sibling=00000000,00000000,00000000,000000c0, core=00000000,00000000,00000000,000000ff (XEN) [2011-12-06 19:13:39] run: [32767.7] pri=-64 flags=0 cpu=7 The system in question has a 4-core i7 CPU where Dom0 is pinned to core 0-3 and DomU is pinned 4-7. Aren't the negative credit numbers quite big? > > But also see if the problem disappears with the latest 4.1.x hypervisor. > And it might also be worth upgrading the dom0 to a 3.0. Hmm, it would be > very interesting to see where the dom0 _is_ stuck at. The hypervisor is > running fine so it all points to dom0 crashing somewhere.. > > Can you make sure that dom0 runs with 'debug loglevel=8' as well. That > should give some wealth of information when/if a crash happens. > Oh, and try to pass in Ctrl-Alt-SysRQ-t (on serial concentrators I think > you just need to type 'break' on the telnet line)and then t. But I am > not entirely sure about it - might want to double check Google and see > how to enable Alt-SysRQ. I tried to get SysRq working. It worked fine on some of my other machines (sysrq= 'shift-control-o'), but not on this broken box. Apparently the kernel is stuck in some place where SysRq doesn't work. I'm going to retest with the relevant logging turned on. I also upgraded to the latest 2.6.32 kernel (.48 I think). I will also upgrade to the latest 4.1.x build from the mercurial repository. It will take a few days before I get crash. Thanks so far! Roderick _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |