Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)
On 01/09/2011 16:12, David Vrabel wrote:
> On 01/09/11 15:23, Konrad Rzeszutek Wilk wrote:
>> On Thu, Sep 01, 2011 at 08:42:52AM +0100, Ian Campbell wrote:
>>> On Wed, 2011-08-31 at 18:07 +0100, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Aug 31, 2011 at 05:58:43PM +0100, David Vrabel wrote:
>>>>> On 26/08/11 15:44, Konrad Rzeszutek Wilk wrote:
>>>>>> So while I am still looking at the hypervisor code to figure out why
>>>>>> it would give me [when trying to map a grant page]:
>>>>>>
>>>>>> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
>>>>> It is failing in guest_map_l1e() because the page for the vmalloc'd
>>>>> virtual address PTEs is not present.
>>>>>
>>>>> The test that fails is:
>>>>>
>>>>> (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT
>>>>>
>>>>> I think this is because the GNTTABOP_map_grant_ref hypercall is done
>>>>> when task->active_mm != &init_mm and alloc_vm_area() only adds PTEs into
>>>>> init_mm so when Xen looks in the page tables it doesn't find the entries
>>>>> because they're not there yet.
>>>>>
>>>>> Putting a call to vmalloc_sync_all() after create_vm_area() and before
>>>>> the hypercall makes it work for me. Classic Xen kernels used to have
>>>>> such a call.
>>>> That sounds quite reasonable.
>>> I was wondering why upstream was missing the vmalloc_sync_all() in
>>> alloc_vm_area() since the out-of-tree kernels did have it and the
>>> function was added by us. I found this:
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ef691947d8a3d479e67652312783aedcf629320a
>>>
>>> commit ef691947d8a3d479e67652312783aedcf629320a
>>> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
>>> Date: Wed Dec 1 15:45:48 2010 -0800
>>>
>>>     vmalloc: remove vmalloc_sync_all() from alloc_vm_area()
>>>
>>>     There's no need for it: it will get faulted into the current pagetable
>>>     as needed.
>>>
>>>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
>>>
>>> The flaw in the reasoning here is that you cannot take a kernel fault
>>> while processing a hypercall, so hypercall arguments must have been
>>> faulted in beforehand and that is what the sync_all was for.
>>>
>>> It's probably fair to say that the Xen specific caller should take care
>>> of that Xen-specific requirement rather than pushing it into common
>>> code. On the other hand Xen is the only user and creating a Xen specific
>>> helper/wrapper seems a bit pointless.
>> Perhaps then doing the vmalloc_sync_all() (or are more precise one:
>> vmalloc_sync_one) should be employed in the netback code then?
>>
>> And obviously guarded by the CONFIG_HIGHMEM case?
> Perhaps. But I think the correct thing to do initially is revert the
> change and then look at possible improvements. Particularly as the fix
> needs to be a backported to stable.
>
> David
>

I have implemented a patch which does essentially this, i.e. it calls
vmalloc_sync_all() after every alloc_vm_area() call (all 5 of them).

Now my VMs start correctly, but I still get error messages in the Xen
dmesg output (attached). Is this expected?

Anthony

Attachment:
dmesg.log

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
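For readers following the thread, the ordering problem described above (alloc_vm_area() populating only init_mm, and the hypervisor then walking the active page table, where nothing can be faulted in) can be illustrated with a toy user-space model. This is only a sketch under loud assumptions: every name in it (toy_mm, toy_alloc_vm_area, toy_sync_all, toy_map_grant_ref, toy_demo) is hypothetical, a "page table" is reduced to an array of present flags, and it is not kernel or Xen code and not the patch mentioned in the message.

```c
#include <stdbool.h>

#define TOY_NSLOTS 4 /* toy number of top-level slots covering the vmalloc range */

/* Toy page table: one flag per top-level slot, true when the slot is populated. */
struct toy_mm {
    bool present[TOY_NSLOTS];
};

/* Models alloc_vm_area(): populates an entry in the kernel reference table only. */
static int toy_alloc_vm_area(struct toy_mm *kernel_mm)
{
    for (int i = 0; i < TOY_NSLOTS; i++) {
        if (!kernel_mm->present[i]) {
            kernel_mm->present[i] = true;
            return i; /* slot that now backs the new area */
        }
    }
    return -1; /* no free slot */
}

/* Models vmalloc_sync_all(): copies populated entries into another page table. */
static void toy_sync_all(const struct toy_mm *kernel_mm, struct toy_mm *mm)
{
    for (int i = 0; i < TOY_NSLOTS; i++) {
        if (kernel_mm->present[i])
            mm->present[i] = true;
    }
}

/*
 * Models the hypervisor side of a grant-map hypercall: it walks the ACTIVE
 * page table and cannot fault a missing entry in, so a missing entry is an
 * immediate failure (-1) rather than a recoverable page fault.
 */
static int toy_map_grant_ref(const struct toy_mm *active_mm, int slot)
{
    if (slot < 0 || slot >= TOY_NSLOTS || !active_mm->present[slot])
        return -1;
    return 0;
}

/* Runs the sequence from the thread, with or without the sync step. */
static int toy_demo(bool sync_first)
{
    struct toy_mm kernel_mm = { { false } }; /* stands in for init_mm */
    struct toy_mm task_mm = { { false } };   /* stands in for task->active_mm */

    int slot = toy_alloc_vm_area(&kernel_mm);
    if (sync_first)
        toy_sync_all(&kernel_mm, &task_mm);
    return toy_map_grant_ref(&task_mm, slot);
}
```

In this model toy_demo(false) returns -1, mirroring the "Could not find L1 PTE" failure, and toy_demo(true) returns 0, mirroring the effect of syncing before the hypercall. It says nothing about the remaining messages in the attached dmesg output.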