
Re: [Xen-devel] POD: soft lockups in dom0 kernel



On 06/12/13 10:00, Jan Beulich wrote:
>>>> On 05.12.13 at 14:55, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
>> when creating a bigger (> 50 GB) HVM guest with maxmem > memory we get
>> softlockups from time to time.
>>
>> kernel: [  802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]
>>
>> I tracked this down to the call of xc_domain_set_pod_target() and further
>> p2m_pod_set_mem_target().
>>
>> Unfortunately I can check this only with xen-4.2.2 as I don't have a machine
>> with enough memory for current hypervisors. But it seems the code is nearly
>> the same.
>>
>> My suggestion would be to do the 'pod set target' in the function
>> xc_domain_set_pod_target() in chunks of maybe 1GB to give the dom0 scheduler
>> a chance to run.
>> As this is not performance critical it should not be a problem.
> 
> This is a broader problem: There are more long running hypercalls
> than just the one setting the POD target. While a kernel built with
> CONFIG_PREEMPT ought to have no issue with this (as the
> hypervisor internal preemption will always exit back to the guest,
> thus allowing interrupts to be processed) as long as such
> hypercalls aren't invoked with preemption disabled, non-
> preemptable kernels (the suggested default for servers) have -
> afaict - no way to deal with this.
> 
> However, as long as interrupts and softirqs can get serviced by
> the kernel (which they can as long as they weren't disabled upon
> invocation of the hypercall), that may also be a mostly cosmetic
> problem (in that the soft lockup is being reported) as long as no
> real time like guarantees are required (which if they were would
> be sort of contradictory to the kernel being non-preemptable),
> i.e. other tasks may get starved for some time, but OS health
> shouldn't be impacted.
> 
> Hence I wonder whether it wouldn't make sense to simply
> suppress the soft lockup detection at least across privcmd
> invoked hypercalls - Cc-ing upstream Linux maintainers to see if
> they have an opinion or thoughts towards a proper solution.

We do not want to disable the soft lockup detection here as it has found
a bug.  We can't have tasks that are unschedulable for minutes as it
would only take a handful of such tasks to hose the system.

We should put an explicit preemption point in.  This will fix it for the
CONFIG_PREEMPT_VOLUNTARY case which I think is the most common
configuration.  Or perhaps this should even be a cond_resched() call to
fix it for fully non-preemptible kernels as well.
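
A minimal user-space sketch of the chunking idea suggested earlier in
the thread, for illustration only.  It assumes 4 KiB pages (so 1 GB is
2^18 pages).  set_pod_target_chunked(), stub_set_pod_target() and
PAGES_PER_GB are hypothetical names, and the stub stands in for the
real call into the hypervisor rather than reproducing its signature.
sched_yield() is only a user-space stand-in for an explicit preemption
point; inside the kernel the counterpart would be the cond_resched()
call mentioned above.

```c
#include <assert.h>
#include <sched.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so 1 GB corresponds to 2^18 pages. */
#define PAGES_PER_GB (1ULL << 18)

/* Hypothetical stand-in for the per-chunk call into the hypervisor.
 * It only records how often it was called and with which target. */
static uint64_t stub_calls;
static uint64_t stub_last_target;

static int stub_set_pod_target(uint64_t target_pages)
{
    stub_calls++;
    stub_last_target = target_pages;
    return 0;
}

/* Move the target from cur_pages to goal_pages in steps of at most
 * chunk_pages, giving the scheduler a chance to run after each step.
 * Returns 0 on success or the first non-zero code from set_target. */
static int set_pod_target_chunked(uint64_t cur_pages, uint64_t goal_pages,
                                  uint64_t chunk_pages,
                                  int (*set_target)(uint64_t))
{
    assert(chunk_pages > 0);

    while (cur_pages != goal_pages) {
        int growing = cur_pages < goal_pages;
        uint64_t remaining = growing ? goal_pages - cur_pages
                                     : cur_pages - goal_pages;
        uint64_t step = remaining < chunk_pages ? remaining : chunk_pages;

        /* Advance the intermediate target by one bounded step. */
        cur_pages = growing ? cur_pages + step : cur_pages - step;

        int rc = set_target(cur_pages);
        if (rc != 0)
            return rc;

        /* User-space stand-in for an explicit preemption point. */
        sched_yield();
    }
    return 0;
}
```

With a 50 GB target and 1 GB chunks, for example
set_pod_target_chunked(0, 50 * PAGES_PER_GB, PAGES_PER_GB,
stub_set_pod_target), the stub is called 50 times, so each individual
call covers a bounded amount of work.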

David

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel