[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86: show remote CPU state upon fatal NMI



>>> On 14.06.16 at 17:03, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 14/06/16 15:33, Jan Beulich wrote:
>> @@ -525,6 +526,35 @@ void vcpu_show_execution_state(struct vc
>>      vcpu_unpause(v);
>>  }
>>  
>> +static cpumask_t nmi_show_state_mask;
>> +static bool_t opt_nmi_show_all;
>> +
>> +static int __init get_nmi_show_all(void)
>> +{
>> +    const char *s = strchr(opt_nmi, ',');
>> +
>> +    if ( s && !strcmp(s + 1, "show-all") )
>> +        opt_nmi_show_all = 1;
>> +
>> +    return 0;
>> +}
>> +presmp_initcall(get_nmi_show_all);
>> +
>> +static int nmi_show_execution_state(const struct cpu_user_regs *regs, int 
>> cpu)
>> +{
>> +    if ( !cpumask_test_cpu(cpu, &nmi_show_state_mask) )
>> +        return 0;
>> +
>> +    if ( opt_nmi_show_all )
>> +        show_execution_state(regs);
>> +    else
>> +        printk(XENLOG_ERR "CPU%d @ %04x:%08lx (%pS)\n", cpu, regs->cs, 
>> regs->rip,
>> +               guest_mode(regs) ? _p(regs->rip) : NULL);
>> +    cpumask_clear_cpu(cpu, &nmi_show_state_mask);
> 
> I would clear the mask before printing state.  Given the nature of this
> handler, it liable to contend sufficiently on the console lock to induce
> the further watchdog timeout.

I had it that way for a brief period of time, but it's wrong: It would
let the master continue its bringing down of the host before all CPUs
managed to print their stuff. And I don't see any issue with spin lock
contention here at all - performance is certainly not of any concern.

>> @@ -570,6 +600,15 @@ void fatal_trap(const struct cpu_user_re
>>              printk("Faulting linear address: %p\n", _p(cr2));
>>              show_page_walk(cr2);
>>          }
>> +        else if ( trapnr == TRAP_nmi )
>> +        {
>> +            cpumask_andnot(&nmi_show_state_mask, &cpu_online_map,
>> +                           cpumask_of(smp_processor_id()));
>> +            set_nmi_callback(nmi_show_execution_state);
>> +            smp_send_nmi_allbutself();
> 
> This would cause far less spinlock contention as
> 
> for_each_cpu( cpu, nmi_show_state_mask )
>     smp_send_nmi(cpu);
> 
> I realise this involves introducing a new smp function, but it would
> substantially reduce contention on the console lock.

Again, I don't see why lock contention would matter here. And then
I also don't see how sending the IPIs individually would make matters
significantly better: The sending will surely finish much faster than
the printing.

> I would recommend moving this clause into nmi_watchdog_tick(), so that
> it doesn't get involved for non-watchdog NMIs.  IOCK/SERR NMIs won't
> have anything interesting to print from here.  I would also recommend
> disabling the watchdog before IPI'ing.

And indeed I would have wanted it there, but I can't see how it can
reasonably be put there: fatal_trap() doesn't return, so we can't put
it after. And we definitely want to get state of the local CPU out
before we try to log state of any of the remote CPUs. So the only
options I see would be to
- somehow specially flag the regs structure, but that feels hackish
  (among other aspects nmi_watchdog_tick() has that parameter
  const qualified for the very reason that it isn't supposed to fiddle
  with it),
- introduce a global flag.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.