
[Xen-devel] Race condition in gntdev driver



Hi Daniel and Konrad,

In an email thread at the end of last year ('Questions about GPLPV
stability tests'), I mentioned a certain bug in the gntdev driver. I
think I have now found the cause and would like your opinion. Since
Daniel did a lot of work on gntdev recently, I think he is most
familiar with the code.

The systems on which I see this bug run Xen 4.1.2 and Linux 2.6.32.48
(pvops), but the same bug should also be present in Linux 3.x, since
the code is similar.

Let me summarize the problem. At some point a DomU is shut down.
Apparently the shutdown takes too long according to Xend, which then
sends SIGKILL to qemu-dm:
[2012-01-16 14:54:34 3419] INFO (XendDomainInfo:2078) Domain has shutdown: name=win7-1 id=7581 reason=poweroff.
[2012-01-16 14:54:34 3419] DEBUG (XendDomainInfo:3071) XendDomainInfo.destroy: domid=7581
[2012-01-16 14:54:36 3419] DEBUG (XendDomainInfo:2401) Destroying device model
[2012-01-16 14:54:46 3419] WARNING (image:649) DeviceModel 31742 took more than 10s to terminate: sending SIGKILL

This triggers the following Oops:
[533804.785589] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[533804.793913] IP: [<ffffffff814bdba9>] _spin_lock+0x5/0x15
[533804.799556] PGD 0
[533804.801896] Oops: 0002 [#1] SMP
[533804.805448] last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:03:00.0/class
[533804.813736] CPU 0
[533804.816066] Modules linked in: dm_snapshot dm_mod ipmi_si ipmi_devintf
[533804.822914] Pid: 21632, comm: qemu-dm Not tainted 2.6.32.48 #1 X8STi
[533804.829595] RIP: e030:[<ffffffff814bdba9>]  [<ffffffff814bdba9>] _spin_lock+0x5/0x15
[533804.837873] RSP: e02b:ffff88005f187c50  EFLAGS: 00010202
[533804.843513] RAX: 0000000000000100 RBX: ffff88007d6923c0 RCX: 0000000000000003
[533804.851192] RDX: ffff88007deb3180 RSI: ffff88007d6923c0 RDI: 0000000000000008
[533804.858871] RBP: ffff88007e260128 R08: 0000000000000000 R09: 0000000000000000
[533804.866552] R10: ffff88007121eb40 R11: ffffffff811b862b R12: ffff88007fac1e40
[533804.874541] R13: ffff88007e3c3340 R14: ffff88007e3c3340 R15: ffff88007fdfbc00
[533804.882243] FS:  00007f5cdcefe6f0(0000) GS:ffff880028038000(0000) knlGS:0000000000000000
[533804.890874] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[533804.896948] CR2: 0000000000000008 CR3: 0000000001001000 CR4: 0000000000002660
[533804.904627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[533804.912306] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[533804.919985] Process qemu-dm (pid: 21632, threadinfo ffff88005f186000, task ffff880074f4e270)
[533804.928971] Stack:
[533804.931312]  ffffffff810a9ad0 ffff88007deb3180 ffff88007e260100 ffff88007e260100
[533804.938762] <0> ffffffff8124222d 0000000000000008 0000000000000008 ffff88007deb3180
[533804.946900] <0> ffffffff810b245e ffff88007fac1e40 ffff88007deb3180 0000000000000000
[533804.955465] Call Trace:
[533804.958244]  [<ffffffff810a9ad0>] ? mmu_notifier_unregister+0x27/0xa5
[533804.965019]  [<ffffffff8124222d>] ? gntdev_release+0xc3/0xd1
[533804.971007]  [<ffffffff810b245e>] ? __fput+0x100/0x1af
[533804.976477]  [<ffffffff810af998>] ? filp_close+0x5b/0x62
[533804.982119]  [<ffffffff81042989>] ? put_files_struct+0x64/0xc1
[533804.988280]  [<ffffffff810441f0>] ? do_exit+0x209/0x68d
[533804.993836]  [<ffffffff810441f8>] ? do_exit+0x211/0x68d
[533804.999390]  [<ffffffff810446e9>] ? do_group_exit+0x75/0x9c
[533805.005294]  [<ffffffff8104cf1d>] ? get_signal_to_deliver+0x30d/0x338
[533805.012063]  [<ffffffff81010f7a>] ? do_notify_resume+0x8a/0x6b1
[533805.018310]  [<ffffffff810bdb3a>] ? poll_select_copy_remaining+0xd0/0xf3
[533805.025339]  [<ffffffff81011c8e>] ? int_signal+0x12/0x17
[533805.030980] Code: 00 00 00 01 74 05 e8 67 1c d2 ff 48 89 d0 5e c3 ff 14 25 20 2d 6a 81 f0 81 2f 00 00 00 01 74 05 e8 4d 1c d2 ff c3 b8 00 01 00 00 <f0> 66 0f c1 07 38 e0 74 06 f3 90 8a 07 eb f6 c3 f0 81 2f 00 00
[533805.050454] RIP  [<ffffffff814bdba9>] _spin_lock+0x5/0x15
[533805.056182]  RSP <ffff88005f187c50>
[533805.059997] CR2: 0000000000000008
[533805.063638] ---[ end trace 85ee7cbec9ce72ac ]---
[533805.068584] Fixing recursive fault but reboot is needed!


Below I have attached what I believe is the relevant part of the
gntdev driver:
static int gntdev_open(struct inode *inode, struct file *flip)
{
...
...
        /* get_task_mm() takes a reference on the mm (mm->mm_users++). */
        priv->mm = get_task_mm(current);
        if (!priv->mm) {
                kfree(priv);
                return -ENOMEM;
        }
        priv->mn.ops = &gntdev_mmu_ops;
        mmu_notifier_register(&priv->mn, priv->mm);
        /* The reference taken above is dropped again here (mm->mm_users--). */
        mmput(priv->mm);

        flip->private_data = priv;
...
}

static int gntdev_release(struct inode *inode, struct file *flip)
{
        struct gntdev_priv *priv = flip->private_data;
...
        /* Uses priv->mm, on which the driver no longer holds an
         * mm_users reference at this point. */
        mmu_notifier_unregister(&priv->mn, priv->mm);
        kfree(priv);
        return 0;
}

I think the bug is caused by gntdev not handling the reference
counting of mm->mm_users correctly, in combination with a race
condition. gntdev_open calls get_task_mm (which increments mm_users),
but right after storing 'mm' in 'priv' it calls mmput, which
decrements the reference count again. Since mmu_notifier_register
also takes a reference on the mm, the count is definitely positive
right after gntdev_open. I suspect that, due to a race condition
(after 10 seconds, xend sent a SIGKILL), the mm structure has already
been freed by the time gntdev_release runs, which causes the bug in
mmu_notifier_unregister.

If this is correct, then the 'mmput' call should be moved to
gntdev_release and placed after mmu_notifier_unregister, as sketched
below. What do you think? If you agree, I will send a patch for this.
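
To make the proposal concrete, here is a rough, untested sketch of
the change I have in mind. It only moves the mmput call from
gntdev_open to gntdev_release; the '...' parts again stand for the
code elided in the excerpt above:

static int gntdev_open(struct inode *inode, struct file *flip)
{
...
        priv->mm = get_task_mm(current);
        if (!priv->mm) {
                kfree(priv);
                return -ENOMEM;
        }
        priv->mn.ops = &gntdev_mmu_ops;
        mmu_notifier_register(&priv->mn, priv->mm);
        /* No mmput here: keep the mm_users reference taken by
         * get_task_mm for as long as the file is open. */

        flip->private_data = priv;
...
}

static int gntdev_release(struct inode *inode, struct file *flip)
{
        struct gntdev_priv *priv = flip->private_data;
...
        mmu_notifier_unregister(&priv->mn, priv->mm);
        /* Drop the reference taken in gntdev_open, now that the
         * notifier no longer needs the mm. */
        mmput(priv->mm);
        kfree(priv);
        return 0;
}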

Regards,
Roderick Colenbrander

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel