
Re: [Xen-devel] MSI message data register configuration in Xen guests



On Thu, 28 Jun 2012, Deep Debroy wrote:
> On Wed, Jun 27, 2012 at 4:18 PM, Deep Debroy <ddebroy@xxxxxxxxx> wrote:
> > On Mon, Jun 25, 2012 at 7:51 PM, Rolu <rolu@xxxxxxxx> wrote:
> >>
> >> On Tue, Jun 26, 2012 at 4:38 AM, Deep Debroy <ddebroy@xxxxxxxxx> wrote:
> >> > Hi, I was playing around with an MSI-capable virtual device (so far
> >> > submitted as patches only) in the upstream qemu tree but having
> >> > trouble getting it to work on a Xen hvm guest. The device happens to
> > > be a QEMU implementation of VMware's pvscsi controller. The device
> > > works fine in a Xen guest when I switch the device's code to force
> >> > usage of legacy interrupts with upstream QEMU. With MSI based
> >> > interrupts, the device works fine on a KVM guest but as stated before,
> >> > not on a Xen guest. After digging a bit, it appears, the reason for
> >> > the failure in Xen guests is that the MSI data register in the Xen
> > > guest ends up with a value of 0x4300, where the Delivery Mode value
> > > of 3 happens to be reserved (per spec) and therefore illegal. The
> >> > vmsi_deliver routine in Xen rejects MSI interrupts with such data as
> >> > illegal (per expectation) causing all commands issued by the guest OS
> >> > on the device to timeout.
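
For reference, the x86 MSI data register packs the vector into bits 0-7 and
the delivery mode into bits 8-10, which is why 0x4300 decodes to vector 0
with the reserved delivery mode 3. A minimal sketch of that decoding (the
mask and shift names below are made up for illustration, not taken from the
Xen or Linux trees):

    #include <stdio.h>
    #include <stdint.h>

    /* Illustrative names for the standard x86 MSI data register layout:
     * bits 0-7 vector, bits 8-10 delivery mode, bit 15 trigger mode. */
    #define SKETCH_MSI_VECTOR_MASK       0x00ffu
    #define SKETCH_MSI_DELIV_MODE_SHIFT       8
    #define SKETCH_MSI_DELIV_MODE_MASK   0x0007u
    #define SKETCH_MSI_TRIGGER_SHIFT         15

    int main(void)
    {
        uint32_t data = 0x4300;  /* value observed in the Xen HVM guest */

        printf("vector        = 0x%02x\n", data & SKETCH_MSI_VECTOR_MASK);
        printf("delivery mode = %u\n",
               (data >> SKETCH_MSI_DELIV_MODE_SHIFT) & SKETCH_MSI_DELIV_MODE_MASK);
        printf("trigger mode  = %u (0 = edge)\n",
               (data >> SKETCH_MSI_TRIGGER_SHIFT) & 1);
        /* Output: vector 0x00, delivery mode 3 (reserved per the PCI spec),
         * edge triggered -- exactly what vmsi_deliver rejects. */
        return 0;
    }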
> >> >
> >> > Given this above scenario, I was wondering if anyone can shed some
> >> > light on how to debug this further for Xen. Something I would
> >> > specifically like to know is where the MSI data register configuration
> >> > actually happens. Is it done by some code specific to Xen and within
> >> > the Xen codebase or it all done within QEMU?
> >> >
> >>
> >> This seems like the same issue I ran into, though in my case it is
> >> with passed-through physical devices. See
> >> http://lists.xen.org/archives/html/xen-devel/2012-06/msg01423.html and
> >> the older messages in that thread for more info on what's going on. No
> >> fix yet but help debugging is very welcome.
> >
> > Thanks Rolu for pointing out the other thread - it was very useful.
> > Some of the symptoms appear to be identical in my case. However, I am
> > not using a pass-through device. Instead, in my case it's a fully
> > virtualized device, pretty much identical to a raw, file-backed disk
> > image where the controller is pvscsi rather than lsi. Therefore I
> > guess some of the later discussion in the other thread around
> > pass-through-specific areas of code in qemu is not relevant? Please
> > correct me if I am wrong. Also note that I am using upstream qemu
> > where neither the #define for PT_PCI_MSITRANSLATE_DEFAULT nor
> > xenstore.c exists (which is where Stefano's suggested change appeared
> > to be).
> >
> > So far, here's what I am observing in the hvm linux guest :
> >
> > On the guest side, as discussed in the other thread,
> > xen_hvm_setup_msi_irqs is invoked for the device, and a value of 0x4300
> > is composed by xen_msi_compose_msg and written into the data register.
> > On the qemu (upstream) side, when the virtualized controller is trying
> > to complete a request, it invokes the following chain of calls:
> > stl_le_phys -> xen_apic_mem_write -> xen_hvm_inject_msi.
> > On the Xen side, this ends up in hvmop_inject_msi -> hvm_inject_msi
> > -> vmsi_deliver. vmsi_deliver, as previously discussed, rejects the
> > delivery mode of 0x3.
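
For context on the QEMU-to-Xen leg of that chain: xen_hvm_inject_msi hands
the guest-programmed MSI address/data pair to the hypervisor through
libxenctrl, and Xen then decodes it in hvm_inject_msi. A rough sketch of
that hand-off, assuming the xc_hvm_inject_msi call that libxc exposed around
that time (the exact signature may differ):

    #include <stdint.h>
    #include <xenctrl.h>  /* libxenctrl; provides xc_interface and, in trees
                             of that era, xc_hvm_inject_msi() */

    /* Sketch of what QEMU's xen_hvm_inject_msi() boils down to: forward the
     * MSI doorbell write to Xen, which runs hvmop_inject_msi ->
     * hvm_inject_msi -> vmsi_deliver on the address/data pair. */
    static int inject_msi_sketch(xc_interface *xch, uint32_t domid,
                                 uint64_t msi_addr, uint32_t msi_data)
    {
        /* With a PV-on-HVM Linux guest, msi_data arrives here as 0x4300,
         * so vmsi_deliver sees the reserved delivery mode 3 and drops it. */
        return xc_hvm_inject_msi(xch, domid, msi_addr, msi_data);
    }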
> >
> > Is the above sequence of interactions the expected path for an HVM
> > guest trying to use a fully virtualized device/controller that uses
> > MSI in upstream qemu? If so, and if a standard linux guest always
> > populates the value of 0x4300 in the MSI data register through
> > xen_hvm_setup_msi_irqs, how are MSI notifications from a device in
> > qemu supposed to work and bypass the vmsi_deliver check, given that
> > the delivery mode of 0x3 is indeed reserved?
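
To make the question concrete: the 0x4300 is not meant to be a real LAPIC
message at all. The PV-on-HVM MSI setup in the guest leaves the vector at 0,
uses delivery mode 3 as a marker, and encodes the pirq number into the MSI
address, expecting the interrupt to be routed over an event channel. Roughly
what that composition looks like (reconstructed from memory of
arch/x86/pci/xen.c, so treat the names and the exact encoding as an
approximation):

    #include <stdint.h>

    /* 0x4000 = level assert, 3 << 8 = "event channel" marker in the delivery
     * mode field, vector left at 0.  Together: 0x4300. */
    #define SKETCH_XEN_PIRQ_MSI_DATA  (0x4000u | (3u << 8) | 0u)

    static void compose_pirq_msi_sketch(unsigned int pirq,
                                        uint32_t *address_lo, uint32_t *data)
    {
        /* Low 8 bits of the pirq go into the destination-ID field of the MSI
         * address (bits 12-19); any upper bits would go into the upper
         * address word.  This matches the decode in Stefano's patch below:
         * pirq = ((addr >> 32) & 0xffffff00) | ((addr >> 12) & 0xff). */
        *address_lo = 0xfee00000u | ((pirq & 0xffu) << 12);
        *data = SKETCH_XEN_PIRQ_MSI_DATA;   /* always 0x4300 */
    }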
> >
> I wanted to see whether the HVM guest can interact with the MSI-capable
> virtualized controller properly without any of the Xen-specific code
> in the linux kernel kicking in (i.e. allowing the regular PCI/MSI code
> in linux to fire). So I rebuilt the kernel with CONFIG_XEN disabled
> such that pci_xen_hvm_init no longer sets x86_msi.*msi_irqs to
> xen-specific routines like xen_hvm_setup_msi_irqs, which is where the
> 0x4300 is getting populated. This seems to work properly. The MSI data
> register for the controller ends up getting a valid value like 0x4049,
> vmsi_deliver no longer complains, all MSI notifications are delivered
> in the expected way to the guest, and the raw, file-backed disks
> attached to the controller show up in fdisk -l.
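
The reason disabling CONFIG_XEN changes the behaviour is the hook described
above: the Xen platform init replaces the arch's default MSI setup routine
with the event-channel one. A minimal illustration of that override, with
structure and function names as recalled from kernels of that era (treat
them as assumptions, not the exact source):

    struct pci_dev;   /* opaque here; real definition lives in the kernel */

    /* Hook table the x86 PCI/MSI core calls through (illustrative subset). */
    struct x86_msi_ops_sketch {
        int  (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
        void (*teardown_msi_irq)(unsigned int irq);
    };

    extern struct x86_msi_ops_sketch x86_msi;
    extern int  xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
    extern void xen_teardown_msi_irq(unsigned int irq);

    /* With CONFIG_XEN enabled, the PV-on-HVM platform init redirects generic
     * MSI setup to the event-channel path -- the code that ends up writing
     * the 0x4300 data value.  Without it, the stock PCI/MSI path programs a
     * normal vector (e.g. the 0x4049 observed above). */
    static int pci_xen_hvm_init_sketch(void)
    {
        x86_msi.setup_msi_irqs   = xen_hvm_setup_msi_irqs;
        x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
        return 0;
    }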
> 
> My conclusion: the linux kernel's xen-specific code, specifically
> routines like xen_hvm_setup_msi_irqs, needs to be tweaked to work with
> fully virtualized qemu devices that use MSI. I will follow up
> regarding that on LKML.

Thanks for your analysis of the problem; I think it is correct: Linux PV
on HVM is trying to set up event channel delivery for the MSI, as it
always does (therefore choosing 0x3 as the delivery mode).
However, emulated devices in QEMU don't support that.
To be honest, emulated devices in QEMU didn't support MSIs at all until
very recently, which is why we are only seeing this issue now.

Could you please try this Xen patch and let me know if it makes things
better?


diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index a90927a..f44f3b9 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -281,6 +281,31 @@ void hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data)
         >> MSI_DATA_TRIGGER_SHIFT;
     uint8_t vector = data & MSI_DATA_VECTOR_MASK;
 
+    if ( !vector )
+    {
+        int pirq = ((addr >> 32) & 0xffffff00) | ((addr >> 12) & 0xff);
+        if ( pirq > 0 )
+        {
+            struct pirq *info = pirq_info(d, pirq);
+
+            /* if it is the first time, allocate the pirq */
+            if (info->arch.hvm.emuirq == IRQ_UNBOUND)
+            {
+                spin_lock(&d->event_lock);
+                map_domain_emuirq_pirq(d, pirq, IRQ_MSI_EMU);
+                spin_unlock(&d->event_lock);
+            } else if (info->arch.hvm.emuirq != IRQ_MSI_EMU)
+            {
+                printk("%s: pirq %d does not correspond to an emulated MSI\n", __func__, pirq);
+                return;
+            }
+            send_guest_pirq(d, info);
+            return;
+        } else {
+            printk("%s: error getting pirq from MSI: pirq = %d\n", __func__, pirq);
+        }
+    }
+
     vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
 }
 
diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
index 40e2245..066f64d 100644
--- a/xen/include/asm-x86/irq.h
+++ b/xen/include/asm-x86/irq.h
@@ -188,6 +188,7 @@ void cleanup_domain_irq_mapping(struct domain *);
 })
 #define IRQ_UNBOUND -1
 #define IRQ_PT -2
+#define IRQ_MSI_EMU -3
 
 bool_t cpu_has_pending_apic_eoi(void);
 
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

