Re: [Xen-devel] Regression with commit "x86/pv: Drop int80_bounce from struct pv_vcpu" f75b1a5247b3b311d3aa50de4c0e5f2d68085cb1



On 10/03/18 16:27, Andrew Cooper wrote:
> On 10/03/2018 16:14, Sander Eikelenboom wrote:
>> Hi Andrew,
>>
>> It seems commit "x86/pv: Drop int80_bounce from struct pv_vcpu" 
>> (f75b1a5247b3b311d3aa50de4c0e5f2d68085cb1) causes an issue on my machine, 
>> an AMD Phenom X6.
>>
>> When installing a new kernel package, which runs the Debian
>> update-initramfs tool, on xen-unstable with
>> c9bd8a73656d7435b1055ee8825823aee995993e as the last commit, the tool stalls
>> and I get this kernel splat:
>>
>> [  284.910674] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>> [  284.919696] IP:           (null)
>> [  284.928315] PGD 0 P4D 0
>> [  284.943343] Oops: 0010 [#1] SMP NOPTI
>> [  284.957008] Modules linked in:
>> [  284.965521] CPU: 5 PID: 24729 Comm: ld-linux.so.2 Not tainted 4.16.0-rc4-20180305-linus-pvhpatches-doflr+ #1
>> [  284.974154] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>> [  284.983198] RIP: e030:          (null)
>> [  284.992006] RSP: e02b:ffffc90001497ed8 EFLAGS: 00010286
>> [  285.000612] RAX: 0000000000000000 RBX: ffff880074c64500 RCX: ffffffff82f8d1c0
>> [  285.009122] RDX: ffffffff82f8d1c0 RSI: 0000000020020002 RDI: ffffffff82f8d1c0
>> [  285.017598] RBP: ffff880074c64b7c R08: 0000000000000000 R09: 0000000000000000
>> [  285.025999] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82f8d1c0
>> [  285.034400] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880074c64b50
>> [  285.042718] FS:  00007f919fe2eb40(0000) GS:ffff88007d140000(0000) knlGS:0000000000000000
>> [  285.051001] CS:  e033 DS: 002b ES: 002b CR0: 0000000080050033
>> [  285.059458] CR2: 0000000000000000 CR3: 0000000002824000 CR4: 0000000000000660
>> [  285.067813] Call Trace:
>> [  285.075947]  ? task_work_run+0x85/0xa0
>> [  285.084025]  ? exit_to_usermode_loop+0x72/0x80
>> [  285.091980]  ? do_int80_syscall_32+0xfe/0x120
>> [  285.099896]  ? entry_INT80_compat+0x7f/0x90
>> [  285.107688]  ? fpu__drop+0x23/0x40
>> [  285.115362] Code:  Bad RIP value.
>> [  285.123072] RIP:           (null) RSP: ffffc90001497ed8
>> [  285.130714] CR2: 0000000000000000
>> [  285.138219] ---[ end trace 4d3317497f4ba022 ]---
>> [  285.145671] Fixing recursive fault but reboot is needed!
>>
>> After updating xen-unstable to the latest available commit,
>> 185413355fe331cbc926d48568838227234c9a20,
>> the tool doesn't stall anymore, but I still get a kernel splat:
>>
>> [  198.594638] ------------[ cut here ]------------
>> [  198.594641] Invalid address limit on user-mode return
>> [  198.594651] WARNING: CPU: 1 PID: 75 at ./include/linux/syscalls.h:236 do_int80_syscall_32+0xe5/0x120
>> [  198.594652] Modules linked in:
>> [  198.594655] CPU: 1 PID: 75 Comm: kworker/1:1 Not tainted 4.16.0-rc4-20180305-linus-pvhpatches-doflr+ #1
>> [  198.594656] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
>> [  198.594658] Workqueue: events free_work
>> [  198.594660] RIP: e030:do_int80_syscall_32+0xe5/0x120
>> [  198.594661] RSP: e02b:ffffc90000b8ff40 EFLAGS: 00010086
>> [  198.594662] RAX: 0000000000000029 RBX: ffffc90000b8ff58 RCX: ffffffff82868e38
>> [  198.594663] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000001
>> [  198.594664] RBP: ffff880078623980 R08: 0000000000000dfa R09: 000000000000063b
>> [  198.594664] R10: 0000000000000000 R11: 000000000000063b R12: 0000000000000000
>> [  198.594665] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>> [  198.594672] FS:  00007fa252372b40(0000) GS:ffff88007d040000(0000) knlGS:0000000000000000
>> [  198.594673] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  198.594674] CR2: 00000000f7f303e4 CR3: 0000000002824000 CR4: 0000000000000660
>> [  198.594676] Call Trace:
>> [  198.594683]  entry_INT80_compat+0x7f/0x90
>> [  198.594685]  ? vunmap_page_range+0x2a0/0x340
>> [  198.594686] Code: 03 7f 48 8b 75 00 f7 c6 0e 38 00 00 75 2e 83 65 08 f9 5b 5d c3 e8 0c fb ff ff e9 53 ff ff ff 48 c7 c7 58 35 57 82 e8 ab 3e 0c 00 <0f> 0b bf 09 00 00 00 48 89 ee e8 8c 00 0d 00 eb b8 48 89 df e8
>> [  198.594706] ---[ end trace 90bcd2147bc825ef ]---
>>
>> After reverting commit f75b1a5247b3b311d3aa50de4c0e5f2d68085cb1 the issue is 
>> gone.
> :(
>
> This will be the issue which OSSTest is probably bisecting to as well. 
> It is quite odd to see a 64bit process using int80 as opposed to syscall.
>
> I'll see about double checking my assembly code, and will also try to
> identify why my unit tests haven't noticed an issue.

As a progress report, this is proving to be a terrible bug to debug.

I've confirmed your findings.  However, my repro takes 10 minutes, and
I've failed to make it any faster.  It is more complicated than just
using 32bit userspace in a 64bit VM, and putting debugging in the
hypervisor makes the problem go away.
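
For readers unfamiliar with the int80-from-64bit case mentioned earlier in
the thread, here is a minimal sketch of a 64bit Linux process entering the
kernel through `int $0x80` rather than `syscall`. It is an illustration only,
not the reproducer discussed here. The names `int80_getpid` and
`probe_int80` are hypothetical, and the sketch assumes an x86-64 Linux guest
whose kernel has 32bit syscall emulation enabled. The probe runs in a forked
child so that a kernel which rejects the instruction only kills the child.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* getpid's number in the i386 syscall table, which int $0x80 uses
 * even when issued from 64bit code (hypothetical macro name). */
#define I386_NR_GETPID 20

#if defined(__linux__) && defined(__x86_64__)
/* Issue getpid through the legacy int $0x80 gate from 64bit code. */
static int int80_getpid(void)
{
    int ret;
    __asm__ volatile("int $0x80"
                     : "=a"(ret)
                     : "a"(I386_NR_GETPID)
                     /* r8-r11 may be clobbered by the compat entry path. */
                     : "memory", "r8", "r9", "r10", "r11");
    return ret;
}
#else
/* Stub for other platforms: never matches a real pid. */
static int int80_getpid(void) { return -1; }
#endif

/*
 * Returns  1 if getpid via int $0x80 matched the normal getpid(),
 *          0 if it returned something else (an error value or the stub),
 *         -1 if the child could not be run or was killed by a signal.
 */
int probe_int80(void)
{
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(int80_getpid() == (int)getpid() ? 0 : 1);

    if (waitpid(child, &status, 0) != child)
        return -1;
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Calling `probe_int80()` from a small `main` and printing the result shows
whether the int80 path is usable on a given guest. It exercises only the entry
path and is not expected to trigger the splats above on its own.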

~Andrew