
Re: [Xen-devel] [PATCH] Xen/timer: Disable watchdog during dumping timer queues

>>> On 20.09.16 at 16:52, <tianyu.lan@xxxxxxxxx> wrote:
> On 9/19/2016 10:46 PM, Jan Beulich wrote:
>>>> Well, without a clear understanding of why the issue occurs (for
>>>> which I need to refer you back to the questionable stack dump)
>>>> I'm hesitant to agree to this step, yet ...
>>>
>>> After some research, I found that do_invalid_op() in the stack dump is
>>> caused by run_in_exception_handler(__ns16550_poll) in ns16550_poll()
>>> rather than by a fatal event. The timeout issue still exists when
>>> running __ns16550_poll() directly in ns16550_poll().
>> Well, I then still don't see why e.g. dump_domains() doesn't also need
>> it.
> After testing, dump_domains() also shows this issue after I create two
> VMs with 128 vcpus.
>> Earlier you did say:
>>   Keyhandler may run in the timer handler and the following log shows
>>   calltrace. The timer subsystem runs all expired timers' handlers
>>   before programming next timer event. If keyhandler runs longer than
>>   timeout, there will be no chance to configure timer before triggering
>>   watchdog and hypervisor rebooting.
>> The fact that using debug keys may adversely affect the rest of the
>> system is known. And the nesting of process_pending_softirqs()
>> inside do_softirq() should, from looking at them, work fine. So I
>> continue to have trouble seeing the specific reason for the problem
>> you say you observe.
> The precondition for process_pending_softirqs() working in the debug key
> handler is that the timer interrupt arrives on time and nmi_timer_fn()
> can run to update nmi_timer_ticks before the watchdog timeout.
>
> When a timer interrupt arrives, timer_softirq_action() will run all
> expired timer handlers before programming the next timer interrupt via
> reprogram_timer(). If a timer handler runs too long, e.g. >5s (the
> watchdog timeout defaults to 5s), no timer interrupt will arrive
> within 5s and nmi_timer_fn() won't be called either.
> Does this make sense to you?

Partly. I continue to think that the sequence

some keyhandler
        timer interrupt
keyhandler continues
keyhandler calls process_pending_softirqs()

should, among other things, result in timer_softirq_action() getting
run. And I don't see the _timer_ handler running for too long here,
only a key handler. Are you perhaps instead suffering from the
nested instance of timer_softirq_action() not being able to acquire
its lock? That would be an entirely different issue than you had
described so far.

And irrespective of this it is of course quite clear that timers aren't
meant to run heavyweight work like key handlers, so the way
ns16550_poll() works right now is probably what we'll want to alter.
Which btw raises another question: Why are you in polling mode in
the first place? Do you have a UART without working interrupt?
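
For illustration only, the scenario described in the quoted text can be
reduced to a toy model. This is a sketch under stated assumptions, not Xen
code: watchdog_fires(), WATCHDOG_TIMEOUT_SECS and the one-second time step
are hypothetical names and simplifications. It only shows that a heartbeat
which is never serviced for 5s trips the watchdog; it does not settle
whether the nested timer_softirq_action() can actually run in the real
hypervisor.

```c
#include <stdbool.h>

/* Hypothetical constant: the 5s default timeout mentioned in the thread. */
#define WATCHDOG_TIMEOUT_SECS 5u

/*
 * Toy model (not Xen code). A job runs for job_secs simulated seconds.
 * If heartbeat_serviced is true, the heartbeat timer gets to run once per
 * second while the job is in progress (standing in for nmi_timer_fn()
 * updating its tick counter). If false, no timer is serviced until the
 * job returns. Returns true if the watchdog would have fired.
 */
bool watchdog_fires(unsigned int job_secs, bool heartbeat_serviced)
{
    unsigned int now = 0;        /* simulated time in seconds */
    unsigned int last_tick = 0;  /* last time the heartbeat ran */

    while (now < job_secs) {
        now++;
        if (heartbeat_serviced)
            last_tick = now;     /* heartbeat timer ran on schedule */
        if (now - last_tick >= WATCHDOG_TIMEOUT_SECS)
            return true;         /* no heartbeat seen for the full timeout */
    }
    return false;
}
```

In this model, watchdog_fires(6, false) reports a timeout, while
watchdog_fires(3, false) and watchdog_fires(60, true) do not: a long job
is harmless as long as the heartbeat timer keeps being serviced.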

