
RE: [Xen-devel] [PATCH] Fix softlockup issue after vcpu hotplug



>From: Graham, Simon [mailto:Simon.Graham@xxxxxxxxxxx]
>Sent: January 31, 2007 3:29
>> On 30/1/07 09:54, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>>
>> > Another simple approach to trigger such a warning is to let
>> > __xen_suspend() jump to smp_resume immediately after
>> > smp_suspend, as a test case for suspend cancel. People can
>> > observe all vcpus except vcpu0 falling into that warning frequently.
>>
>> Do you know if this problem has been observed across many versions
>> of Xen or e.g., only after the upgrade to 2.6.18?
>>
>
>I'm not sure but I think that we've been seeing something very similar
>when live migrating domains (with 3.0.3/2.6.16.29) -- my understanding is
>that the live migration code takes the domain down to UP, does the
>migration and then restores SMP -- we VERY often see soft lockup
>messages following this (several times per night in our regression
>testing) with stack traces identical to those posted by Kevin.
>
>I also added some instrumentation and in every single case, the 'stolen'
>time is > 5s when we see the soft lockup.
>
>Simon

Hi, Simon,
        Your case is probably different from what I saw, and it may be
fixed by the original patch I posted, which however doesn't apply to the
latest version. In the 2.6.16 version, it's do_timer that calls
softlockup_tick, rather than run_local_timers. So the "stolen > 5s"
check comes too late: the warning can still fire even though the
adjustment is made afterwards. Could you try the attached patch and see
whether it fixes your live migration case?
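
To illustrate the ordering problem, here is a small userspace sketch in C.
It is not the attached patch and not kernel code: the names (reset_sim,
touch_watchdog, lockup_check and the two tick_* functions), the tick rate
and the 10s warning threshold are all assumptions made up for the
illustration. Only the "stolen > 5s" condition comes from the discussion
above. It just shows why running the lockup check before the stolen-time
adjustment lets the warning out, while the opposite order does not.

```c
#include <stdbool.h>

#define HZ                 100UL        /* assumed tick rate */
#define LOCKUP_TICKS       (10UL * HZ)  /* assumed soft lockup threshold (10s) */
#define STOLEN_TICKS_LIMIT (5UL * HZ)   /* the "stolen > 5s" condition */

unsigned long jiffies;          /* simulated tick counter */
unsigned long touch_timestamp;  /* last time the watchdog was touched */

/* Put the simulation back into its initial state. */
void reset_sim(void)
{
    jiffies = 0;
    touch_timestamp = 0;
}

/* Hypothetical stand-in for touching the soft lockup watchdog. */
void touch_watchdog(void)
{
    touch_timestamp = jiffies;
}

/* Hypothetical stand-in for the lockup check: true means "would warn". */
bool lockup_check(void)
{
    return jiffies - touch_timestamp > LOCKUP_TICKS;
}

/* Order 1: account the stolen ticks, check, and only then adjust.
 * The check sees the whole stolen period as a lockup. */
bool tick_check_then_adjust(unsigned long stolen)
{
    bool warned;

    jiffies += stolen;
    warned = lockup_check();
    if (stolen > STOLEN_TICKS_LIMIT)
        touch_watchdog();
    return warned;
}

/* Order 2: account the stolen ticks, adjust first, then check.
 * The watchdog is already touched when the check runs. */
bool tick_adjust_then_check(unsigned long stolen)
{
    jiffies += stolen;
    if (stolen > STOLEN_TICKS_LIMIT)
        touch_watchdog();
    return lockup_check();
}
```

With these assumed numbers, a vcpu that lost 12s of time (12 * HZ ticks)
makes tick_check_then_adjust() report a warning, while
tick_adjust_then_check() stays quiet. A normal one-tick step reports
nothing in either order.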

Thanks,
Kevin

Attachment: fix_softlockup_2616.patch
Description: fix_softlockup_2616.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

