
RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug


  • To: "Keir Fraser" <Keir.Fraser@xxxxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Tian, Kevin" <kevin.tian@xxxxxxxxx>
  • Date: Tue, 30 Jan 2007 20:11:44 +0800
  • Delivery-date: Tue, 30 Jan 2007 04:11:28 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcdESFqDCWsISfq5RGeHgxcxVzRqmQACelaDAAAZiDAAAP4q2wADxf1Q
  • Thread-topic: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug

>From: Keir Fraser [mailto:Keir.Fraser@xxxxxxxxxxxx]
>Sent: January 30, 2007 18:09
>
>On 30/1/07 09:54, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
>> If we don't take into account blocked time, maybe we have to disable
>> softlockup check. Say an idle process gets a timeout value larger than
>> 10s by next_timer_interrupt, and then blocked. If, unfortunately, there's
>> no other events happening before that timeout value, this vcpu will see
>> softlockup warning after that timeout immediately since this period is
>> not categorized into stolen time.
>
>Presumably softlockup threads are killed and re-created when VCPUs are
>offlined and onlined. Perhaps the re-creation is taking a long time? But

That should not be the case, since the softlockup warning keeps
appearing after the cpu has been brought online.

>10s would be a *very* long time. And once it is created and bound to the
>correct VCPU we should never see long timeouts when blocking (since
>softlockup thread timeout is never longer than a few seconds).

Yes, I noticed this point just after sending the mail.

>
>Perhaps there is a bug in our cpu onlining code -- a big timeout like that
>does need investigating. I don't think we can claim this bug is
>root-caused yet so it's premature to be applying patches.
>
>

Agreed. I'll investigate this further. I just quickly compared the
watchdog thread in 2.6.18 and 2.6.16. In 2.6.16, the thread sleeps with
an explicit 1s schedule timeout, while in 2.6.18 it is woken up once per
second from the timer interrupt (softlockup_tick). One notable
consequence of this change is that the watchdog thread in 2.6.16 has a
soft timer registered, while in 2.6.18 it does not. I suspect this may
affect what next_timer_interrupt decides.
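
To make the suspicion above concrete, here is a toy model in Python (not
kernel code). It assumes the idle path blocks until the earliest pending
soft timer; next_timer_expiry is a stand-in for next_timer_interrupt, and
idle_block_timeout and max_sleep are hypothetical names used only for
illustration. The 10s threshold is the figure mentioned earlier in this
thread.

```python
import math

SOFTLOCKUP_THRESHOLD = 10.0  # seconds, the figure discussed in this thread


def next_timer_expiry(now, soft_timers):
    """Stand-in for next_timer_interrupt: earliest pending expiry, or inf."""
    pending = [t for t in soft_timers if t > now]
    return min(pending, default=math.inf)


def idle_block_timeout(now, soft_timers, max_sleep):
    """Hypothetical idle path: block until the next soft timer, capped."""
    return min(next_timer_expiry(now, soft_timers) - now, max_sleep)


# 2.6.16 style: the watchdog thread's own 1s timer is in the timer list,
# so the idle vcpu never asks to block for more than about 1s.
timeout_2_6_16 = idle_block_timeout(0.0, [1.0, 15.0], max_sleep=60.0)

# 2.6.18 style: the watchdog registers no soft timer, so the next expiry
# may be an unrelated timer well beyond the softlockup threshold.
timeout_2_6_18 = idle_block_timeout(0.0, [15.0], max_sleep=60.0)
```

In this model the 2.6.16-style timeout is 1s, while the 2.6.18-style
timeout is 15s and exceeds the threshold. Whether the real kernel behaves
this way after vcpu hotplug is exactly what still needs investigating.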

By the way, do you think the scheduler might do something to penalize
a newly onlined vcpu? From the code I didn't see that, since a newly
woken vcpu is always boosted... In practice, however, I found via
'cat /proc/interrupts' that the virtual timer interrupt count for that
cpu increases slowly. Sometimes it even freezes for a dozen or so
seconds. But yes, this may be a symptom rather than the cause. :-)
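
For anyone who wants to repeat the observation, a rough sketch of how two
'cat /proc/interrupts' snapshots could be compared per cpu. It assumes the
usual layout (a header row of CPU names, then rows of "irq: counts...
description"); the function names and the "timer" label match are my own
choices, not anything taken from the kernel.

```python
def timer_counts(snapshot, label="timer"):
    """Sum per-cpu counts over rows whose description contains `label`."""
    lines = snapshot.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= len(cpus):
            continue  # row too short to carry one count per cpu
        counts = fields[1:1 + len(cpus)]
        description = " ".join(fields[1 + len(cpus):])
        if label not in description:
            continue
        if not all(c.isdigit() for c in counts):
            continue
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals


def timer_deltas(before, after, label="timer"):
    """Per-cpu increase in matching interrupt counts between snapshots."""
    old = timer_counts(before, label)
    new = timer_counts(after, label)
    return {cpu: new[cpu] - old.get(cpu, 0) for cpu in new}
```

Taking two snapshots a few seconds apart and passing them to timer_deltas
would show whether the count for the newly onlined cpu lags far behind
the others.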

Thanks,
Kevin

