
Re: [Xen-devel] xc_gntshr_unmap problems (BUG(s) in xen-gntalloc?)



Hi,

On 29 Aug 2014, at 15:05, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:

> On 29/08/14 13:40, Dave Scott wrote:
>> Hi,
>> 
>> On 28 Aug 2014, at 14:50, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:
>> 
>>> On 27/08/14 22:33, Dave Scott wrote:
>>>> I notice xc_gntshr_munmap for Linux simply calls 'munmap'
>>>> 
>>>> static int linux_gntshr_munmap(xc_gntshr *xcg, xc_osdep_handle h,
>>>>                              void *start_address, uint32_t count)
>>>> {
>>>>   return munmap(start_address, count);
>>>> }
>>> 
>>> munmap() needs a byte length, not a page count.
>>> 
>>> When using xc_gntshr_munmap() with multiple pages this results in none
>>> of the grefs being deleted (unshared and freed) since a mapping to some
>>> of the grefs in the set remain.
>> 
>> Aha, good spot. I worked around this in my test program by calling 
>> xc_gntshr_munmap with (count * 4096) and it’s no longer leaking.
>> 
>>> This doesn't appear to explain why they're not deleted when the device is
>>> closed.
>> 
>> After rebuilding xen-gntalloc with your change I couldn’t reproduce this. I 
>> tried all combinations of
>> 
>>  * map 1 or 2 pages
>>  * unmap 0, 1 or 2 pages
>> 
>> and, although some of the iterations did run out of grant references (as 
>> expected), nothing seemed to be leaked over a close.
> 
> Ok. Are you going to submit a libxc patch fixing the munmap() length?
> 
>>>> -- so I guess the problem is with the xen-gntalloc driver?
>>>> 
>>>> If I share pages one at a time then it triggers a BUG:
>>>> $ sudo ./test-gnt 1
>>>> [  148.564281] BUG: unable to handle kernel paging request at ffffc908001bff20
>>>> [  148.564299] IP: [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>>>> [  148.564312] PGD 3d520067 PUD 0
>>>> [  148.564317] Oops: 0000 [#1] SMP
>>>> [  148.564322] CPU 0
>>>> [  148.564325] Modules linked in: xenfs xen_evtchn xen_gntalloc xen_gntdev lp parport
>>>> [  148.564337]
>>>> [  148.564340] Pid: 897, comm: test-gnt Not tainted 3.2.0-67-generic #101-Ubuntu
>>>> [  148.564348] RIP: e030:[<ffffffff813acf93>]  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>>>> [  148.564356] RSP: e02b:ffff88003c655da0  EFLAGS: 00010286
>>>> [  148.564360] RAX: ffffc900001c0000 RBX: ffff88003cdb9e40 RCX: 0000000000000000
>>>> [  148.564365] RDX: 0000000000000000 RSI: 000000000007026e RDI: 00000000ffffffe4
>>>> [  148.564371] RBP: ffff88003c655dd8 R08: 0000000000000000 R09: 000000000003725f
>>>> [  148.564376] R10: ffffea0000ef3680 R11: 0000000000000000 R12: ffff88003cdb9e40
>>>> [  148.564381] R13: 0000000000000000 R14: ffff88003c655e80 R15: 0000000000000000
>>>> [  148.564389] FS:  00007ffe79406740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
>>>> [  148.564394] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>> [  148.564400] CR2: ffffc908001bff20 CR3: 000000003cdc6000 CR4: 0000000000000660
>>>> [  148.564406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [  148.564412] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>> [  148.564418] Process test-gnt (pid: 897, threadinfo ffff88003c654000, task ffff88003cdd4500)
>>>> [  148.564423] Stack:
>>>> [  148.564426]  ffffffffa000d1a5 ffff88003c655dd8 ffffffff813adbdb 00000000ffffffe4
>>>> [  148.564435]  0000000000000000 00000000ffffffe4 ffff88003cdb9e40 ffff88003c655e68
>>>> [  148.564443]  ffffffffa000d848 ffff88003cc47790 ffff88003c5a8dc0 ffff8800041aeba8
>>>> [  148.564452] Call Trace:
>>>> [  148.564459]  [<ffffffffa000d1a5>] ? __del_gref+0x105/0x150 [xen_gntalloc]
>>>> [  148.564465]  [<ffffffff813adbdb>] ? gnttab_grant_foreign_access+0x2b/0x80
>>>> [  148.564471]  [<ffffffffa000d848>] add_grefs+0x1c8/0x2b0 [xen_gntalloc]
>>>> [  148.564478]  [<ffffffffa000da28>] gntalloc_ioctl_alloc+0xf8/0x160 [xen_gntalloc]
>>>> [  148.564485]  [<ffffffffa000dae0>] gntalloc_ioctl+0x50/0x64 [xen_gntalloc]
>>>> [  148.564492]  [<ffffffff8118d45a>] do_vfs_ioctl+0x8a/0x340
>>>> [  148.564498]  [<ffffffff811456b3>] ? do_munmap+0x1f3/0x2f0
>>>> [  148.564504]  [<ffffffff8118d7a1>] sys_ioctl+0x91/0xa0
>>>> [  148.564510]  [<ffffffff8166bd42>] system_call_fastpath+0x16/0x1b
>>>> [  148.564515] Code: f8 48 8b 15 98 89 b6 00 66 89 04 fa 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 05 78 89 b6 00 89 ff 5d <0f> b7 04 f8 83 e0 18 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66
>>>> [  148.564577] RIP  [<ffffffff813acf93>] gnttab_query_foreign_access+0x13/0x20
>>>> [  148.564583]  RSP <ffff88003c655da0>
>>>> [  148.564586] CR2: ffffc908001bff20
>>>> [  148.564591] ---[ end trace 57b3a513f0d79bd6 ]---
>>> 
>>> Does this patch fix the oops?
>> 
>> Yes, I’ve left my test case running for several hours with no sign of 
>> trouble.
> 
> Thanks. I'll take that as a
> 
> Tested-by: Dave Scott <david.scott@xxxxxxxxxx>

I just realised that my tests weren’t as clear-cut as I hoped. When I rebuilt 
my ubuntu kernel I accidentally got a free upgrade from 3.2 to 3.13, so what I 
actually tested was:

3.2: kernel oops triggered easily
3.13 plus your patch: seems solid

I could go back and test 3.13 minus your patch if that would be helpful.

Cheers,
Dave


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

