
Re: [Xen-devel] [BUG] Xen vm kernel crash in get_free_entries.



Hello,

Let me bring some new life to this discussion.

I've investigated a bit and found another way to make kernels from 3.8.x onward boot on VMs whose Xen platform PCI device has device ID 0002. Reverting the xen-grant-table-correctly-initialize-grant-table-version-1 patch is not necessary.

We can simply modify struct pci_device_id platform_pci_tbl[] (in drivers/xen/platform-pci.c) so that it also accepts device IDs 0002 and 0000. With that change the kernel (3.8.x and 3.11.6) boots correctly, and disks and network are recognized as well.

IMO, there is no need to add new entries for device IDs 0002 and 0000 to platform_pci_tbl[]. We can modify the existing entry to use PCI_ANY_ID instead of PCI_DEVICE_ID_XEN_PLATFORM (which is 0001), so that any device with PCI_VENDOR_ID_XEN matches regardless of its device ID.
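A userspace sketch of what this change means for ID matching follows. It is not the attached patch and not kernel code. `struct fake_pci_dev` and `match_id()` are hypothetical stand-ins modeled on the matching logic of the kernel's `pci_match_one_device()`. The table layout {vendor, device, subvendor, subdevice, class, class_mask, driver_data} and the value 0x5853 for `PCI_VENDOR_ID_XEN` are assumed from the kernel sources. The two tables differ only in the device field, `PCI_DEVICE_ID_XEN_PLATFORM` versus `PCI_ANY_ID`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; names mirror the kernel's but this is not kernel code. */
#define PCI_ANY_ID (~0u)
#define PCI_VENDOR_ID_XEN 0x5853u          /* assumed value, see lead-in */
#define PCI_DEVICE_ID_XEN_PLATFORM 0x0001u /* "0001", as stated above */

struct pci_device_id {
	uint32_t vendor, device;       /* IDs to match, or PCI_ANY_ID */
	uint32_t subvendor, subdevice; /* subsystem IDs, or PCI_ANY_ID */
	uint32_t class, class_mask;    /* class bits compared under the mask */
	unsigned long driver_data;
};

/* Hypothetical stand-in for the fields of struct pci_dev used in matching. */
struct fake_pci_dev {
	uint32_t vendor, device, subsystem_vendor, subsystem_device, class;
};

/* Before the change: only device ID 0001 from the Xen vendor matches. */
static const struct pci_device_id tbl_before[] = {
	{PCI_VENDOR_ID_XEN, PCI_DEVICE_ID_XEN_PLATFORM,
	 PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
	{0,} /* terminator: vendor, subvendor and class_mask all zero */
};

/* After the change: any device ID from the Xen vendor matches. */
static const struct pci_device_id tbl_after[] = {
	{PCI_VENDOR_ID_XEN, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
	{0,}
};

/* A field matches if the table holds the wildcard or the exact value. */
static int field_ok(uint32_t want, uint32_t have)
{
	return want == PCI_ANY_ID || want == have;
}

/* Walk the table until the all-zero terminator; return the first match. */
static const struct pci_device_id *
match_id(const struct pci_device_id *ids, const struct fake_pci_dev *dev)
{
	assert(ids != NULL && dev != NULL);
	for (; ids->vendor || ids->subvendor || ids->class_mask; ids++) {
		if (field_ok(ids->vendor, dev->vendor) &&
		    field_ok(ids->device, dev->device) &&
		    field_ok(ids->subvendor, dev->subsystem_vendor) &&
		    field_ok(ids->subdevice, dev->subsystem_device) &&
		    !((ids->class ^ dev->class) & ids->class_mask))
			return ids;
	}
	return NULL;
}
```

For example, a device with vendor 0x5853 and device ID 0x0002 gets NULL from `match_id(tbl_before, ...)` but a match from `tbl_after`. The trade-off of the wildcard is that any other PCI device carrying the Xen vendor ID would also match this table.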

So the patch is very simple; see the attachment. I've tested the resulting kernel in my environment (with device IDs 0002, 0001 and 0000) and it seems to work well.


--
Marina

On 10/21/2013 02:55 PM, Matt Wilson wrote:
On Sat, Oct 19, 2013 at 01:58:50PM +0200, Sander Eikelenboom wrote:
Saturday, October 19, 2013, 1:03:17 PM, you wrote:

On Sat, 2013-10-19 at 14:51 +0400, Astarta wrote:
On 10/19/2013 03:14 AM, Sander Eikelenboom wrote:
makes an HVM guest (qemu-xen-traditional) with xen_platform_pci=0 boot again
using xl; I haven't tested it with xend.

Great catch!
I can also confirm that the 3.11.5 kernel boots just fine after reverting the
'correctly initialize grant table version 1' patch.
This could just be down to that patch adding some BUG_ONs to catch bad
things going on, e.g. the one in gnttab_expand which I think is being
hit here.
I have a feeling that it is still wrong (but just more benign) to be
hitting that call chain in a configuration where there is no platform
device driver running. IOW reverting that patch removes the obvious
symptom (blowing up) but not the root cause, i.e. the patch is doing its
job.
That was my suspicion too, but at least it seems like a starting point for
further debugging (and an indication of which kernels are affected, since this
commit went to stable as well).

Since I was still seeing the "Booting PV enabled guest on Xen HVM" message, I was
wondering what is supposed to happen with the following combinations:
This is the enlightenment code noticing that it's running in an HVM
guest under Xen via the hypervisor cpuid leaf (cpuid leaf
0x40000000).
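As a rough illustration of that check: CPUID leaf 0x40000000 returns a 12-byte hypervisor signature in EBX, ECX and EDX, which for Xen reads "XenVMMXenVMM". The sketch below only decodes register values supplied by the caller and does not execute CPUID itself. On x86 with GCC or Clang, the `__cpuid()` macro from `<cpuid.h>` could supply those values. The helper names `regs_to_signature()` and `is_xen_signature()` are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unpack EBX, ECX, EDX (in that order) into the
 * 12-character signature, least significant byte first, then add a NUL. */
static void regs_to_signature(uint32_t ebx, uint32_t ecx, uint32_t edx,
			      char out[13])
{
	const uint32_t regs[3] = { ebx, ecx, edx };

	assert(out != NULL);
	for (int r = 0; r < 3; r++)
		for (int b = 0; b < 4; b++)
			out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xffu);
	out[12] = '\0';
}

/* Hypothetical helper: does this register triple carry the Xen signature? */
static int is_xen_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char sig[13];

	regs_to_signature(ebx, ecx, edx, sig);
	return strcmp(sig, "XenVMMXenVMM") == 0;
}
```

Extracting the bytes with shifts rather than `memcpy()` keeps the decoding independent of the host's byte order.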

Xen HVM xen_platform_pci=0 + guest kernel without PV guest support and without Xen PV drivers (net + block)
This should work.

Xen HVM xen_platform_pci=0 + guest kernel with PV guest support but without Xen PV drivers (net + block)
This should work.

Xen HVM xen_platform_pci=0 + guest kernel with PV guest support and with Xen PV drivers (net + block)
-- This is the configuration that hits the bug described here.
I don't see how this can be expected to work - the PV net and block
devices need the facilities that are initialized by the Xen platform
PCI device to operate. Of course it shouldn't crash either, it should
just use emulated devices instead of xen-netfront/xen-blkfront.

Xen HVM xen_platform_pci=1 + guest kernel without PV guest support and without Xen PV drivers (net + block)
This should work.

Xen HVM xen_platform_pci=1 + guest kernel with PV guest support and without Xen PV drivers (net + block)
This should work.

Xen HVM xen_platform_pci=1 + guest kernel with PV guest support and with Xen PV drivers (net + block)
This should work.
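The expectations in the list above can be condensed into a small decision table. This is only a model of what the thread says should happen. `expected_io_path()` and its parameters are hypothetical names, not kernel code. Every combination should boot, and the guest should use xen-netfront/xen-blkfront only when the platform PCI device is present and the kernel has both PV guest support and the PV drivers. Otherwise it should fall back to the emulated devices.

```c
#include <stdbool.h>

enum io_path { IO_EMULATED, IO_PV };

/* Hypothetical model of the expected behaviour listed above; not kernel code. */
static enum io_path expected_io_path(bool platform_pci, bool pv_guest_support,
				     bool pv_drivers)
{
	/* PV net/block rely on facilities set up via the platform PCI device,
	 * so all three must be present to use xen-netfront/xen-blkfront. */
	if (platform_pci && pv_guest_support && pv_drivers)
		return IO_PV;
	/* Every other combination should still boot, using emulated devices. */
	return IO_EMULATED;
}
```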

Booting a guest kernel with PV support as HVM, but without using PV, doesn't seem
possible with a .cfg option?
(Yes, it's a hypothetical option performance-wise, as is running a guest kernel
that supports PV drivers but not using them with xen_platform_pci=0, but it is
useful for debugging.)
AFAICT the expected behavior would be for the guest kernel to use
basic enlightenment for CPU operations (hotplug, timers) but no PV IO
support (net + block). But perhaps I'm missing something since you
theoretically don't need the PCI device if you have event channel
callback support in the guest kernel and sufficient support in the
hypervisor.

--msw


--
Best regards,
Astarta
ÐÐÐÐÐÐÑÑÑÐÑÐÑ ÐÐÑÑÐÐ "ÐÑÑÑÐÐÑÐ ÐÑÐ"
http://rat.ru/forum/index.php

"The Linux philosophy is 'Laugh in the face of danger'.
Oops. Wrong One. 'Do it yourself'. Yes, that's it."
(c) Linus Torvalds.

Attachment: xen-platform_id.patch
Description: Text Data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

