
Re: [Xen-devel] Crashing kernel with dom0/libxc gnttab/gntshr



On 07/30/2013 12:58 PM, David Vrabel wrote:
[...]

[  902.729307] BUG: Bad page map in process vchan-node1  pte:12bfff167 pmd:b9b5c067
[  902.729312] page:ffffea0004afffc0 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff

I think this is the test for page_mapcount(page) < 0 in zap_pte_range().
  This has looked up the page using the PTE it is trying to clear.  Has
it found the correct page?  Since the MFN is currently mapped into the
same domain, has the m2p_override stuff confused the look up and it is
checking the grantee page not the granter?

David

I think something like this is happening. While reproducing this on my
test system I found linked list corruption that I believe is the cause
of the problem. On PV, gnttab_map_refs calls m2p_add_override on the
page, which threads page->lru onto an m2p_overrides list. However,
something else is using page->lru while gntdev has the page mapped, as
shown by the following debug patch:

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 3c8803f..198e57e 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -294,6 +294,11 @@ static int map_grant_pages(struct grant_map *map)
        if (err)
                return err;
+       printk("map page0 lru: %p prev=%p:%p next=%p:%p\n",
+               &map->pages[0]->lru,
+               map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
+               map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
+
        for (i = 0; i < map->count; i++) {
                if (map->map_ops[i].status)
                        err = -EINVAL;
@@ -320,6 +325,10 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
                }
        }
+       printk("unmap page0 lru: %p prev=%p:%p next=%p:%p\n",
+               &map->pages[0]->lru,
+               map->pages[0]->lru.prev, map->pages[0]->lru.prev->next,
+               map->pages[0]->lru.next, map->pages[0]->lru.next->prev);
        err = gnttab_unmap_refs(map->unmap_ops + offset,
                        use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
                        pages);

Output:
[   88.610644] map page0 lru: ffffea0001dee160 prev=ffffffff82f2d510:ffffea0001dee160 next=ffffffff82f2d510:ffffea0001dee160
[   88.611515] BUG: Bad page map in process a.out  pte:8000000077b85167 pmd:2541a067
[   88.611525] page:ffffea0001dee140 count:1 mapcount:-1 mapping:          (null) index:0xffffffffffffffff
[   88.611532] page flags: 0x1000000000000814(referenced|dirty|private)
[   88.611541] addr:00007f1adaef3000 vm_flags:140400fb anon_vma:          (null) mapping:ffff8800692974a0 index:0
[   88.611547] vma->vm_ops->fault:           (null)
[   88.611555] vma->vm_file->f_op->mmap: gntalloc_mmap+0x0/0x1d0
[...backtrace cropped...]
[   88.614301] unmap page0 lru: ffffea0001dee160 prev=ffff8800254c9d08:ffff88001ea0b120 next=ffff8800254c9d08:ffff88001ea0b938

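At map time the list contains only that one element, so the address
0xffffffff82f2d510 is the m2p_overrides entry (the list head). By unmap
time the neighbours of page->lru no longer point back at it, so the
links were overwritten in between. This means the page being found by
zap_pte_range is not a valid struct page.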

The struct page* being used by the gntalloc device was 0xffffea0000952740,
for reference; it's not a direct collision between the page used by the
gntdev and gntalloc devices.

Not sure what the best fix is for this at the moment.

--
Daniel De Graaf
National Security Agency
