
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)



Xiantao,

I'm sorry, I forgot to mention that I did apply your two patches, but they
had no effect (interrupts are still lost after changing smp_affinity, and the
"No handler for irq vector" message still appears). I added a dprintk in
msi_set_mask_bit() and realized that MSI does not have a mask bit (MSI-X
does). My PCI device uses MSI, not MSI-X. I placed my dprintk inside the
condition below and it never triggered.

    switch (entry->msi_attrib.type) {
    case PCI_CAP_ID_MSI:
        if (entry->msi_attrib.maskbit) {

While debugging this problem, I thought about the potential problem of an
interrupt firing between the writes of the MSI message address and the MSI
message data. I noticed that pci_conf_write() uses spin_lock_irqsave() to
disable interrupts before issuing the "out" instruction, but the address and
data are written by two separate pci_conf_write() calls. To me, it would be
safer to write the address and data in a single call, under one
spin_lock_irqsave(). That way, by the time interrupts are re-enabled, the
address and data have both been updated.
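
A minimal user-space sketch of the single-critical-section idea above, not
Xen code. Every name here (fake_msi_regs, fake_lock, msi_write_split,
msi_write_combined, record_first) is hypothetical; fake_lock/fake_unlock only
stand in for spin_lock_irqsave()/spin_unlock_irqrestore(). The "window"
callback runs wherever interrupts would be enabled again, so it shows which
address/data pair is visible at that point. The model covers only the CPU-side
window; it does not model the device itself reading its registers between two
configuration writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device's MSI message registers. */
struct fake_msi_regs {
    uint32_t address;
    uint16_t data;
};

/* Invoked whenever interrupts would be enabled again (after an unlock). */
typedef void (*irq_window_fn)(const struct fake_msi_regs *regs, void *ctx);

/* Models spin_lock_irqsave(): while > 0, "interrupts" are disabled. */
static int lock_depth;

static void fake_lock(void)
{
    lock_depth++;
}

static void fake_unlock(const struct fake_msi_regs *regs,
                        irq_window_fn window, void *ctx)
{
    assert(lock_depth > 0);
    lock_depth--;
    if (lock_depth == 0 && window != NULL)
        window(regs, ctx);   /* an interrupt could be handled here */
}

/* Two separate locked writes: a window opens between address and data. */
void msi_write_split(struct fake_msi_regs *regs, uint32_t addr, uint16_t data,
                     irq_window_fn window, void *ctx)
{
    fake_lock();
    regs->address = addr;
    fake_unlock(regs, window, ctx);

    fake_lock();
    regs->data = data;
    fake_unlock(regs, window, ctx);
}

/* One critical section: the window opens only after both writes. */
void msi_write_combined(struct fake_msi_regs *regs, uint32_t addr,
                        uint16_t data, irq_window_fn window, void *ctx)
{
    fake_lock();
    regs->address = addr;
    regs->data = data;
    fake_unlock(regs, window, ctx);
}

/* Records the first address/data pair seen once interrupts are enabled. */
struct first_seen {
    int seen;
    struct fake_msi_regs snap;
};

void record_first(const struct fake_msi_regs *regs, void *ctx)
{
    struct first_seen *f = ctx;
    if (!f->seen) {
        f->seen = 1;
        f->snap = *regs;
    }
}
```

With msi_write_split(), the first window sees the new address paired with the
old data; with msi_write_combined(), the first window already sees both new
values.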

Dante

-----Original Message-----
From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx] 
Sent: Thursday, October 22, 2009 2:42 AM
To: Zhang, Xiantao; Jan Beulich
Cc: He, Qing; xen-devel@xxxxxxxxxxxxxxxxxxx; Cinco, Dante
Subject: Re: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP 
ProLiant G6 with dual Xeon 5540 (Nehalem)

On 22/10/2009 09:41, "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx> wrote:

>> Hmm, then I don't understand which case your patch was a fix for: I 
>> understood that it addresses an issue when the affinity of an 
>> interrupt gets changed (requiring a re-write of the address/data 
>> pair). If the hypervisor can deal with it without masking, then why 
>> did you add it?
> 
> Hmm, sorry, it seems I misunderstood your question. If the MSI doesn't
> support a mask bit (clearing the MSI enable bit doesn't help in this case),
> the issue may still exist. I just checked the Linux side; it doesn't seem to
> perform a mask operation when programming MSI, but I don't know why Linux
> doesn't have such issues. Actually, we do see inconsistent interrupt
> messages from the device without this patch, and after applying the patch,
> the issue is gone. Why Linux doesn't need the mask operation may need
> further investigation.

Linux is quite careful about when it will reprogram vector/affinity info,
isn't it? Doesn't it mark such an update pending and only flush it through
during the next interrupt delivery, or something like that? Do we need some
of the upstream Linux patches for this?
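
A rough sketch of the "mark pending, flush on next delivery" scheme described
above, as a user-space model only. It is not taken from Linux or Xen; the
names (fake_irq_desc, request_affinity_change, handle_irq) are hypothetical
and only illustrate the ordering of events.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-interrupt state. */
struct fake_irq_desc {
    uint32_t msi_address;      /* value currently programmed in the device */
    uint16_t msi_data;
    bool     move_pending;     /* an affinity change is waiting */
    uint32_t pending_address;  /* values to program at the next delivery */
    uint16_t pending_data;
};

/* Affinity change request: only record it, do not touch the device. */
void request_affinity_change(struct fake_irq_desc *d,
                             uint32_t addr, uint16_t data)
{
    d->pending_address = addr;
    d->pending_data = data;
    d->move_pending = true;
}

/* Interrupt delivery path: flush a pending change while handling the IRQ. */
void handle_irq(struct fake_irq_desc *d)
{
    if (d->move_pending) {
        d->msi_address = d->pending_address;
        d->msi_data = d->pending_data;
        d->move_pending = false;
    }
    /* ... the normal handler work would follow here ... */
}
```

In this model the device registers change only inside handle_irq(), that is,
just after the device has delivered an interrupt with the old address/data
pair.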

 -- Keir





 

