
Re: [Xen-devel] Kernel bug from 3.0 (was phy disks and vifs timing out in DomU)

On 09/01/2011 12:21 PM, Ian Campbell wrote:
> On Thu, 2011-09-01 at 18:32 +0100, Jeremy Fitzhardinge wrote:
>> On 09/01/2011 12:42 AM, Ian Campbell wrote:
>>> On Wed, 2011-08-31 at 18:07 +0100, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Aug 31, 2011 at 05:58:43PM +0100, David Vrabel wrote:
>>>>> On 26/08/11 15:44, Konrad Rzeszutek Wilk wrote:
>>>>>> So while I am still looking at the hypervisor code to figure out why
>>>>>> it would give me [when trying to map a grant page]:
>>>>>> (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
>>>>> It is failing in guest_map_l1e() because the page for the vmalloc'd
>>>>> virtual address PTEs is not present.
>>>>> The test that fails is:
>>>>> (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT
>>>>> I think this is because the GNTTABOP_map_grant_ref hypercall is done
>>>>> when task->active_mm != &init_mm and alloc_vm_area() only adds PTEs into
>>>>> init_mm so when Xen looks in the page tables it doesn't find the entries
>>>>> because they're not there yet.
>>>>> Putting a call to vmalloc_sync_all() after alloc_vm_area() and before
>>>>> the hypercall makes it work for me.  Classic Xen kernels used to have
>>>>> such a call.
>>>> That sounds quite reasonable.
>>> I was wondering why upstream was missing the vmalloc_sync_all() in
>>> alloc_vm_area() since the out-of-tree kernels did have it and the
>>> function was added by us. I found this:
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ef691947d8a3d479e67652312783aedcf629320a
>>> commit ef691947d8a3d479e67652312783aedcf629320a
>>> Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
>>> Date:   Wed Dec 1 15:45:48 2010 -0800
>>>     vmalloc: remove vmalloc_sync_all() from alloc_vm_area()
>>>     There's no need for it: it will get faulted into the current pagetable
>>>     as needed.
>>>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
>>> The flaw in the reasoning here is that you cannot take a kernel fault
>>> while processing a hypercall, so hypercall arguments must have been
>>> faulted in beforehand and that is what the sync_all was for.
>> That's a good point.  (Maybe Xen should have generated pagefaults when
>> hypercall arg pointers are bad...)
> I think it would be a bit tricky to do in practice: you'd either have to
> support recursive hypercalls in the middle of other hypercalls (because
> the page fault handler is surely going to want to make some), or proper
> hypercall restart (so you can fully return to guest context to handle
> the fault and then retry), or something along those lines, complicating
> the hypervisor one way or another. Probably not impossible if you were
> building something from the ground up, but not trivial.

Well, Xen already has the continuation machinery for dealing with
hypercall restart, so that could be reused.  And accesses to guest
memory are already special events which must be checked so that EFAULT
can be returned.  If, rather than failing with EFAULT, Xen set up a
pagefault exception for the guest CPU with the return set up to retry
the hypercall, it should all work...

Of course, if the guest isn't expecting that - or it's buggy - then it
could end up in an infinite loop.  But maybe a flag (set a high bit in
the hypercall number?), or a feature, or something?  Might be worthwhile
if it saves guests having to do something expensive (like a
vmalloc_sync_all), even if they have to also deal with old hypervisors.

>> There's already a wrapper: xen_alloc_vm_area(), which is just a
>> #define.  But we could easily add a sync_all to it (and use it in
>> netback, like we do in grant-table and xenbus).
> OOI what was the wrapper for originally?

Not sure; I brought it over from 2.6.18-xen.

BTW, vmalloc_sync_all() is much hated, and is slated for removal at some
point - sights are definitely set on it.  So we should think about not
needing it.
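For reference, the workaround David describes upthread boils down to the
following pattern (a sketch using kernel-internal names from the 3.0-era
tree; the call site, error handling, and the gref/otherend_id values are
placeholders, not code from any actual driver):

```c
/*
 * Sketch of the workaround: force the vmalloc-area PTEs into every
 * pagetable before the grant-map hypercall, so Xen can find the L1 PTE
 * even when current->active_mm != &init_mm.
 */
struct gnttab_map_grant_ref op;
struct vm_struct *area;

area = alloc_vm_area(PAGE_SIZE);	/* populates init_mm only */
if (!area)
	return -ENOMEM;

/*
 * Hypercall arguments cannot be faulted in on demand (no kernel
 * faults while Xen processes a hypercall), so sync the new PTEs
 * into all pagetables before Xen dereferences the address.
 */
vmalloc_sync_all();

gnttab_set_map_op(&op, (unsigned long)area->addr,
		  GNTMAP_host_map, gref, otherend_id);
HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &op, 1);
```

Without the vmalloc_sync_all() call, the L2 entry covering area->addr may
be missing from the current pagetable, which is exactly the
"Could not find L1 PTE" failure in guest_map_l1e() quoted above.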


Xen-devel mailing list