
Re: [Xen-devel] irqbalance seg faults with 2.6.38 or later kernels [patch + solution included] if running under Xen hypervisor



>>> On 11.05.11 at 15:10, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> On Wed, May 11, 2011 at 09:16:53AM +0100, Jan Beulich wrote:
>> >>> On 11.05.11 at 02:33, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
>> > The reason behind it is that irqbalance parses the /proc/interrupts
>> > and whenever it hits something it can't understand:
>> > 
>> >  RES:  191614137   73904910    Rescheduling interrupts
>> > 
>> > It will count the number of interrupts towards the IRQ 0. That IRQ does exist
>> > when the kernel boots under baremetal:
>> > 
>> >   0:         46          0       IO-APIC-edge      timer
>> > 
>> > but under Xen, the timer interrupts are initialized much later:
>> > 
>> >  272:   41197188          0        xen-percpu-virq      timer0
>> > 
>> > and the first IRQ that is used is not zero, but rather one:
>> > 
>> >    1:      73037          0          0          0          0          0  xen-pirq-ioapic-edge  i8042
>> > 
>> > so when irqbalance tries to account for the IRQ 'RES' to the IRQ 0
>> > it fails and segfaults. The attached patch fixes it for whoever else is
>> > hitting this problem.
>> 
>> In the svn snapshot I have, I see
>> 
>>              /* lines with letters in front are special, like NMI count. Ignore */
>>              if (!(line[0]==' ' || (line[0]>='0' && line[0]<='9')))
>>                      break;
>> 
>> which I would think should be taking care of your problem (or
>> I mis-read your description), and which was there already before
> 
> Not anymore. In kernels 2.6.37:
> 
>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       
> 
> .. snip.
> NMI:          0          0          0          0   Non-maskable interrupts
> LOC:   12413629   12858323   16296183   11098466   Local timer interrupts
> 
> In 2.6.38 and later:
>             CPU0       CPU1       CPU2       CPU3       CPU4       CPU5      
>  
>  TRM:          0          0          0          0          0          0   Thermal event interrupts
>  THR:          0          0          0          0          0          0   Threshold APIC interrupts
>  MCE:          0          0          0          0          0          0   Machine check exceptions
> 
> They added in a space before the name. The check you mentioned
> above could be augmented for this of course, as another solution
> for this.

Not generally - this depends on your configuration. I just checked on
my laptop, and there is no extra space there. It's presumably
indeed what I wrote here:

>> 0.56. Or are you perhaps having the problem because you have
>> 1000+ interrupts, thus causing even the non-numeric strings to
>> get space padded on their left? In that case I'd rather think above
>> check should be either improved or removed (replaced by your
>> solution).

... and this left padding had been introduced a lot earlier than
.37 iirc.
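
To make the "improved check" option concrete, here is a minimal sketch,
not taken from the irqbalance sources: skip any leading blanks and treat
the line as a numbered IRQ only if the first non-blank character is a
digit. The function name line_starts_with_irq_number is made up for
illustration, and what the caller does with a non-numeric line (skip it
or stop parsing) is left open.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of irqbalance).
 * Returns 1 if the first non-blank character of the line is a digit,
 * e.g. "  0:   46 ..." or " 272:  41197188 ...".
 * Returns 0 for named counters such as "NMI: ..." or " TRM: ...",
 * regardless of how much left padding they carry, and for empty lines.
 */
int line_starts_with_irq_number(const char *line)
{
	if (line == NULL)
		return 0;

	/* Skip the left padding, however wide it is. */
	while (*line == ' ' || *line == '\t')
		line++;

	return isdigit((unsigned char)*line) ? 1 : 0;
}
```

With something like this, a left-padded " TRM:" line is rejected the same
way an unpadded "NMI:" line is, so the result no longer depends on how
wide the IRQ number column happens to be.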

Jan

>> > I am not sure who the upstream maintainer is for this so
>> > I am sending this patch to the different distros as well.
>> 
>> Copying Neil and Arjan.
>> 
>> Jan





 

