RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug
>From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx]
>Sent: 30 January 2007 21:13
>On 30/1/07 1:09 pm, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
>>> I'm sure this will fix the issue. But who knows what real underlying
>>> issue it might be hiding?
>>>
>>>  -- Keir
>>
>> I'm not sure whether it hides something. But the current situation
>> seems like a self-trap to me: the watchdog waits for the timer
>> interrupt to wake it at a 1s interval, while the timer interrupt
>> deliberately schedules a longer interval without considering the
>> watchdog, and then blames the watchdog thread for not running
>> within 10s. :-)
>
>Actually I think you're right -- if this fixes the issue then it points
>to a problem in the next_timer_event code. So it would actually be
>interesting to try clamping the timeout to one second.
>
> -- Keir

By a simple change like this:

@@ -962,7 +962,8 @@ u64 jiffies_to_st(unsigned long j)
 	} else if (((unsigned long)delta >> (BITS_PER_LONG-3)) != 0) {
 		/* Very long timeout means there is no pending timer.
 		 * We indicate this to Xen by passing zero timeout. */
-		st = 0;
+		//st = 0;
+		st = processed_system_time + HZ * (u64)NS_PER_TICK;
 	} else {
 		st = processed_system_time + delta * (u64)NS_PER_TICK;
 	}

I really hoped to call this the root fix, but I can't, though the change
did make things better. I created a domU with 4 VCPUs on a 2-CPU box and
tried to hot-remove/plug vcpus 1, 2 and 3 alternately. After about ten
rounds of testing everything was OK. However, several minutes later I saw
the warning again, though far less frequently than before. So I have to
dig further into this bug. The first thing I plan to do is to determine
whether such a long timeout is actually requested by the guest, or
whether it's Xen that enlarges the timeout underneath... :-(

BTW, do you think it's worthwhile to remove a vcpu from the scheduler
when it's brought down and re-initialize it into the scheduler when it's
brought back up? I don't know whether this would affect the scheduler's
accounting.
Actually domain save/restore doesn't show this bug, and one obvious
difference compared to vcpu-hotplug is that the domain is restored in a
new context...

Thanks,
Kevin

P.S. Some trace logs attached. You can see that the drift in each
warning is just around 1000 ticks.

[root@localhost ~]#
BUG: soft lockup detected on CPU#1!
BUG: drift by 0x41e
 [<c0151301>] softlockup_tick+0xd1/0x100
 [<c01095d4>] timer_interrupt+0x4e4/0x640
 [<c011bbae>] try_to_wake_up+0x24e/0x300
 [<c0151c89>] handle_IRQ_event+0x59/0xa0
 [<c0151d65>] __do_IRQ+0x95/0x120
 [<c010708f>] do_IRQ+0x3f/0xa0
 [<c0103070>] xen_idle+0x0/0x60
 [<c024e355>] evtchn_do_upcall+0xb5/0x120
 [<c0103070>] xen_idle+0x0/0x60
 [<c01057a5>] hypervisor_callback+0x3d/0x48
 [<c0103070>] xen_idle+0x0/0x60
 [<c0109d40>] raw_safe_halt+0x20/0x50
 [<c01030a1>] xen_idle+0x31/0x60
 [<c010316e>] cpu_idle+0x9e/0xf0
BUG: soft lockup detected on CPU#2!
BUG: drift by 0x447
 [<c0151301>] softlockup_tick+0xd1/0x100
 [<c01095d4>] timer_interrupt+0x4e4/0x640
 [<c011bbae>] try_to_wake_up+0x24e/0x300
 [<c0151c89>] handle_IRQ_event+0x59/0xa0
 [<c0151d65>] __do_IRQ+0x95/0x120
 [<c010708f>] do_IRQ+0x3f/0xa0
 [<c0103070>] xen_idle+0x0/0x60
 [<c024e355>] evtchn_do_upcall+0xb5/0x120
 [<c0103070>] xen_idle+0x0/0x60
 [<c01057a5>] hypervisor_callback+0x3d/0x48
 [<c0103070>] xen_idle+0x0/0x60
 [<c0109d40>] raw_safe_halt+0x20/0x50
 [<c01030a1>] xen_idle+0x31/0x60
 [<c010316e>] cpu_idle+0x9e/0xf0
BUG: soft lockup detected on CPU#1!
BUG: drift by 0x43f
 [<c0151301>] softlockup_tick+0xd1/0x100
 [<c01095d4>] timer_interrupt+0x4e4/0x640
 [<c011bbae>] try_to_wake_up+0x24e/0x300
 [<c0151c89>] handle_IRQ_event+0x59/0xa0
 [<c0151d65>] __do_IRQ+0x95/0x120
 [<c010708f>] do_IRQ+0x3f/0xa0
 [<c0103070>] xen_idle+0x0/0x60
 [<c024e355>] evtchn_do_upcall+0xb5/0x120
 [<c0103070>] xen_idle+0x0/0x60
 [<c01057a5>] hypervisor_callback+0x3d/0x48
 [<c0103070>] xen_idle+0x0/0x60
 [<c0109d40>] raw_safe_halt+0x20/0x50
 [<c01030a1>] xen_idle+0x31/0x60
 [<c010316e>] cpu_idle+0x9e/0xf0
BUG: soft lockup detected on CPU#1!
BUG: drift by 0x3ea
 [<c0151301>] softlockup_tick+0xd1/0x100
 [<c01095d4>] timer_interrupt+0x4e4/0x640
 [<c0137699>] __rcu_process_callbacks+0x99/0x100
 [<c0129867>] tasklet_action+0x87/0x130
 [<c0151c89>] handle_IRQ_event+0x59/0xa0
 [<c0151d65>] __do_IRQ+0x95/0x120
 [<c010708f>] do_IRQ+0x3f/0xa0
 [<c0103070>] xen_idle+0x0/0x60
 [<c024e355>] evtchn_do_upcall+0xb5/0x120
 [<c0103070>] xen_idle+0x0/0x60
 [<c01057a5>] hypervisor_callback+0x3d/0x48
 [<c0103070>] xen_idle+0x0/0x60
 [<c0109d40>] raw_safe_halt+0x20/0x50
 [<c01030a1>] xen_idle+0x31/0x60
 [<c010316e>] cpu_idle+0x9e/0xf0

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel