
Re: [Xen-devel] [PATCH v8] interrupts: allow guest to set/clear MSI-X mask bit

On Wed, Sep 18, 2013 at 03:19:51AM +0000, Xu, YongweiX wrote:
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > Sent: Tuesday, September 17, 2013 2:39 PM
> > To: Xu, YongweiX
> > Cc: Joby Poriyath; Sander Eikelenboom; Zhou, Chao; Liu, SongtaoX; xen-devel
> > Subject: RE: [Xen-devel] [PATCH v8] interrupts: allow guest to set/clear 
> > MSI-X
> > mask bit
> > 
> > >>> On 17.09.13 at 05:06, "Xu, YongweiX" <yongweix.xu@xxxxxxxxx> wrote:
> > > I've provided 'i' an 'M' debug key to output the IRQ and MSI-X
> > > information, the 'xl dmesg'log of dom0 as the attachment:xl_dmesg.log.
> > 
> > Nothing odd there, but then again you also still didn't tell us which IRQ 
> > it is that
> > gets switched off by the guest kernel. Even without me saying so 
> > explicitly, it
> > should be pretty clear that sending complete information (matching up
> > hypervisor, host, and guest
> > logs) ideally limited to just one guest instance (the log here has 6
> > guests) would provide the maximum information. Remember that unless you're
> > going to debug the problem yourself, we depend on the information coming
> > from you being complete and consistent.
> > In the case at hand, telling us whether the log was taken with a VF or PF
> > assigned would also be relevant information (which would presumably be
> > deducible from the guest kernel log if you had sent it).
> I've retested this issue many times, but can only get the logs attached. The 
> IRQ #50 issue can be seen in the guest but not in the Dom0 "xl dmesg" log. If 
> you think this still lacks evidence, do you have any method or patch to 
> capture more IRQ information? Thanks!

I did some testing with Xen 4.3, RHEL 6.4, and qemu-traditional.
I used an Intel 82599 VF for pass-through.
I noticed that the guest loses network soon after it boots
(maybe a minute or so). I could recover the network by
     1. stopping irqbalance
     2. reconfiguring the network
After that it stayed up.

Here is what's happening....

irqbalance triggers the IRQ migration. The guest kernel will mask the
MSI-X interrupt. Xen will allow this, and the MSI-X interrupt is masked.
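For reference, the per-vector mask bit lives in bit 0 of the Vector Control word of the 16-byte MSI-X table entry (per the PCI spec). A minimal sketch of the mask/unmask operations being discussed; the struct and helper names here are illustrative, not the actual Xen or guest-kernel code:

```c
#include <stdint.h>

/* One MSI-X table entry is 16 bytes (4 dwords). Bit 0 of the
 * Vector Control dword is the per-vector mask bit. */
#define MSIX_ENTRY_CTRL_MASKBIT 0x1u

struct msix_entry {
    uint32_t addr_lo;   /* Message Address, low dword  */
    uint32_t addr_hi;   /* Message Address, high dword */
    uint32_t data;      /* Message Data                */
    uint32_t ctrl;      /* Vector Control              */
};

/* Mask the vector: the device must not raise this interrupt. */
static void msix_mask(struct msix_entry *e)
{
    e->ctrl |= MSIX_ENTRY_CTRL_MASKBIT;
}

/* Unmask the vector again. */
static void msix_unmask(struct msix_entry *e)
{
    e->ctrl &= ~MSIX_ENTRY_CTRL_MASKBIT;
}

static int msix_is_masked(const struct msix_entry *e)
{
    return e->ctrl & MSIX_ENTRY_CTRL_MASKBIT;
}
```

The point in the flow below is who gets to flip that one bit: the guest writes it, Xen traps the write, and (with the patch) propagates it.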

Then the guest kernel will update the vector. These writes are trapped by Xen.
Xen will make a note that the MSI-X vector has been updated, and then it'll
exit to Qemu.

Qemu makes a note of the updated vector, but it won't inform Xen yet.
This is because the exit to Qemu happens for every 32-bit write, and an
MSI-X table entry is 128 bits. So Qemu will call Xen only when the guest
writes the MSI-X control word.
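The deferral described above can be sketched as follows: the device model buffers the guest's 32-bit writes into the 128-bit entry and only pushes the assembled vector onward once the final dword (the Vector Control word) is written. This is a simplified illustration of the behavior described, not the actual qemu-traditional code; `push_vector_to_xen` is a stand-in for the real hypercall path:

```c
#include <stdint.h>

#define MSIX_ENTRY_SIZE   16   /* 4 dwords per table entry        */
#define MSIX_CTRL_OFFSET  12   /* Vector Control is the last dword */

struct msix_entry_buf {
    uint32_t dword[4];         /* addr_lo, addr_hi, data, ctrl     */
    int dirty;                 /* updated but not yet pushed to Xen */
};

/* Hypothetical stand-in for telling Xen about the new vector
 * (the real flow goes through xc_domain_update_msi_irq). */
static int pushed;
static void push_vector_to_xen(const struct msix_entry_buf *e)
{
    (void)e;
    pushed++;
}

/* Trapped 32-bit guest write at byte offset `off` within one entry. */
static void msix_table_write(struct msix_entry_buf *e,
                             unsigned int off, uint32_t val)
{
    e->dword[off / 4] = val;
    e->dirty = 1;

    /* Writes to the address/data dwords are merely buffered; only
     * the write to the control dword completes the update. */
    if (off == MSIX_CTRL_OFFSET) {
        push_vector_to_xen(e);
        e->dirty = 0;
    }
}
```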

The guest kernel, after having updated the MSI-X vector, will unmask the
MSI-X interrupt. This traps into Xen.

Xen notices that the vector has been updated, so it'll exit to Qemu 
without unmasking the MSI-X vector. 

Qemu will check that the MSI-X vector is indeed masked. If it is not,
the guest's attempt to update the MSI-X vector is ignored.

If the MSI-X vector is masked, Qemu will call Xen to update the MSI-X vector
(xc_domain_update_msi_irq). But xc_domain_update_msi_irq doesn't unmask the
MSI-X vector, so it remains masked, and the guest loses network.
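The failure mode boils down to this: the update path installs the new address and data but never clears the mask bit that the guest's earlier mask operation set. A contrived sketch of that gap (field and function names are illustrative, not the libxc implementation):

```c
#include <stdint.h>

#define MASKBIT 0x1u

struct vec {
    uint32_t addr, data, ctrl;
};

static int vec_is_masked(const struct vec *v)
{
    return v->ctrl & MASKBIT;
}

/* Buggy update path: installs the new address/data but, like the
 * xc_domain_update_msi_irq flow described above, never touches the
 * mask bit, so a previously masked vector stays masked. */
static void update_msi_irq(struct vec *v, uint32_t addr, uint32_t data)
{
    v->addr = addr;
    v->data = data;
    /* Missing (per the report): v->ctrl &= ~MASKBIT; */
}
```

With nothing downstream clearing the bit either, the vector stays masked forever and the interrupt never fires again.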

With a slightly older Qemu (before the relevant git commit), Qemu had write
access to the MSI-X table, so it would go ahead and unmask the MSI-X
vector. I've tested this on Xen 4.1 (XenServer 6.1).

Without the patch, the guest kernel's attempt to migrate the IRQ fails in a
different way: Xen will silently ignore the guest's attempt to mask the
MSI-X vector, so it remains unmasked. Although Xen will exit to Qemu when
the guest kernel tries to update the MSI-X vector, Qemu doesn't call Xen,
since it notices that MSI-X is in the unmasked state.

And finally, without the patch, SR-IOV pass through is broken (which the patch
attempted to fix).

I can't explain the behaviour that you are seeing.

Could you please test with irqbalance turned off?

